[net,v1] octeontx2-pf: Fix page pool cache index corruption.

Message ID 20230906033926.3663659-1-rkannoth@marvell.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net,v1] octeontx2-pf: Fix page pool cache index corruption.

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 175 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ratheesh Kannoth Sept. 6, 2023, 3:39 a.m. UTC
Access to the page pool `cache' array and the `count' variable
is not locked. Page pool cache access is fine as long as there
is only one consumer per pool.

The octeontx2 driver fills rx buffers from the page pool in NAPI
context. If the system is stressed and buffers cannot be allocated,
the refilling work is delegated to a delayed workqueue. This means
there are two consumers of the page pool cache.

The workqueue and the IRQ/NAPI handler can run on different CPUs.
This leads to lockless concurrent access and hence corruption of the
pool cache indexes.
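
For illustration, the unlocked fast path in the page pool allocator is
roughly the following (paraphrased and simplified from
net/core/page_pool.c; the exact code varies by kernel version):

	/* no locking: caller must guarantee non-concurrent access */
	if (likely(pool->alloc.count))
		page = pool->alloc.cache[--pool->alloc.count];
	else
		page = page_pool_refill_alloc_cache(pool);

Two such callers running concurrently can read the same slot or leave
`count' out of sync, which is what happens once both NAPI and the
refill worker allocate from the same pool.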

To fix this issue, NAPI is rescheduled from the workqueue context
to refill the rx buffers.

Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

---
ChangeLogs
v0 -> v1: udelay would waste CPU cycles, so call napi_schedule() from
	  the delayed workqueue context instead.
---
---
 .../ethernet/marvell/octeontx2/nic/cn10k.c    |  4 +-
 .../ethernet/marvell/octeontx2/nic/cn10k.h    |  2 +-
 .../marvell/octeontx2/nic/otx2_common.c       | 40 ++-----------------
 .../marvell/octeontx2/nic/otx2_common.h       |  3 +-
 .../marvell/octeontx2/nic/otx2_txrx.c         | 30 +++++++++++---
 .../marvell/octeontx2/nic/otx2_txrx.h         |  4 +-
 6 files changed, 36 insertions(+), 47 deletions(-)

Comments

Sebastian Andrzej Siewior Sept. 6, 2023, 8:08 a.m. UTC | #1
On 2023-09-06 09:09:26 [+0530], Ratheesh Kannoth wrote:
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index 8511906cb4e2..5bba1f34e4f6 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -1082,38 +1070,16 @@ static int otx2_cq_init(struct otx2_nic *pfvf, u16 qidx)
>  static void otx2_pool_refill_task(struct work_struct *work)
>  {
…
> +	napi_schedule(wrk->napi);

This will delay NAPI until "some random point in the future", for
instance until an interrupt on _this_ CPU fires. You only set the
softirq state and never enforce it here. This works as intended if
invoked from an IRQ, but this here is worker/process context.

You can either put local_bh_disable()/enable() around napi_schedule() or
use it from a timer callback and skip the worker.
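
A rough sketch of the timer variant (hypothetical names, assuming a
struct timer_list pool_refill_timer is added to struct refill_work and
set up with timer_setup(); not part of this patch):

static void otx2_pool_refill_timer(struct timer_list *t)
{
	struct refill_work *wrk = from_timer(wrk, t, pool_refill_timer);

	/* timer callbacks run in softirq context, so the NET_RX_SOFTIRQ
	 * raised by napi_schedule() is executed on softirq exit
	 */
	napi_schedule(wrk->napi);
}

/* armed instead of schedule_delayed_work(): */
mod_timer(&wrk->pool_refill_timer, jiffies + msecs_to_jiffies(100));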

Sebastian
Ratheesh Kannoth Sept. 6, 2023, 8:24 a.m. UTC | #2
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Subject: [EXT] Re: [PATCH net v1] octeontx2-pf: Fix page pool cache index
> corruption.
> 
> > +	napi_schedule(wrk->napi);
> 
> This will delay NAPI until "some random point in the future" for instance if an
> interrupt on _this_ CPU fires. You only set the softirq state and never enforce
> it here. This works as intended if invoked from an IRQ but this here a worker/
> process context.
ACK. Do we need to be so precise here? We are short of rx buffers
anyway and want to schedule NAPI after some time (in the delayed
workqueue) to recheck the rx buffers. The softirq state will be
checked on timer interrupt return as well, right? I was also thinking
about the case where the workqueue never gets a chance to run because
the system is stressed with interrupts.
 
> You can either put local_bh_disable()/enable() around napi_schedule() or
> use it from a timer callback and skip the worker.
Switching to a timer callback makes sense.

-Ratheesh
Sebastian Andrzej Siewior Sept. 6, 2023, 8:39 a.m. UTC | #3
On 2023-09-06 08:24:53 [+0000], Ratheesh Kannoth wrote:
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Subject: [EXT] Re: [PATCH net v1] octeontx2-pf: Fix page pool cache index
> > corruption.
> > 
> > > +	napi_schedule(wrk->napi);
> > 
> > This will delay NAPI until "some random point in the future" for instance if an
> > interrupt on _this_ CPU fires. You only set the softirq state and never enforce
> > it here. This works as intended if invoked from an IRQ but this here a worker/
> > process context.
> ACK.  Do we need to be so precise here ? Anyway we are short of rx buffers and want to schedule NAPI after some time 
> (in delayed workqueue) to recheck on rx buffers.  Softirq state will be checked even on timer interrupt returns, right ?.  I was thinking 
> Case - what will happen if workqueue never got a chance to run if system is stressed with interrupts. 

As of this patch, the timer will wake the worker and the worker will
schedule NAPI. (The timer CPU and the worker CPU can be different.)
If nothing else happens, the CPU which ran the worker will go idle.
With NO_HZ enabled you should see a warning that the CPU is going
idle with pending softirqs.
I prefer to be precise about what is going on so there are no surprises.

If the CPU is stressed with interrupts then it is probably busy enough,
and not running the worker to ask for memory (which it did not have
earlier) is probably the smallest problem.

I'm not against the worker (with the extra steps) but I prefer to have
napi schedule done properly.

> > You can either put local_bh_disable()/enable() around napi_schedule() or
> > use it from a timer callback and skip the worker.
> Switching to  a timer callback makes sense. 

oki.

> -Ratheesh

Sebastian
Ratheesh Kannoth Sept. 6, 2023, 8:46 a.m. UTC | #4
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Subject: Re: RE: [EXT] Re: [PATCH net v1] octeontx2-pf: Fix page pool cache
> index corruption.
> 
> NAPI. (The timer CPU and the worker CPU can be different.) 
ACK. 

> I prefer to be precise what is going on so there are no surprises.
ACK.

 
> I'm not against the worker (with the extra steps) but I prefer to have napi
> schedule done properly.
ACK. Will add local_bh_disable()/enable() around napi_schedule() and
keep the workqueue. The only interrupt that can occur at a high rate
is the rx interrupt, and it is disabled while the workqueue is
scheduled (from NAPI).
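
A minimal sketch of that plan (the otx2_pool_refill_task() body from
this patch plus the local_bh_disable()/enable() pair; not the final
code):

static void otx2_pool_refill_task(struct work_struct *work)
{
	struct otx2_cq_queue *cq;
	struct refill_work *wrk;
	struct otx2_nic *pfvf;
	int qidx;

	wrk = container_of(work, struct refill_work, pool_refill_work.work);
	pfvf = wrk->pf;
	qidx = wrk - pfvf->refill_wrk;
	cq = &pfvf->qset.cq[qidx];

	/* napi_schedule() only marks NET_RX_SOFTIRQ pending; in process
	 * context, local_bh_enable() is what actually runs the softirq.
	 */
	local_bh_disable();
	napi_schedule(wrk->napi);
	local_bh_enable();

	cq->refill_task_sched = false;
}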

Patch

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c
index 826f691de259..211c7d8a0556 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.c
@@ -107,12 +107,13 @@  int cn10k_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura)
 }
 
 #define NPA_MAX_BURST 16
-void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq)
+int cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq)
 {
 	struct otx2_nic *pfvf = dev;
 	u64 ptrs[NPA_MAX_BURST];
 	int num_ptrs = 1;
 	dma_addr_t bufptr;
+	int cnt = cq->pool_ptrs;
 
 	/* Refill pool with new buffers */
 	while (cq->pool_ptrs) {
@@ -131,6 +132,7 @@  void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq)
 			num_ptrs = 1;
 		}
 	}
+	return cnt - cq->pool_ptrs;
 }
 
 void cn10k_sqe_flush(void *dev, struct otx2_snd_queue *sq, int size, int qidx)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h
index 8ae96815865e..c1861f7de254 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k.h
@@ -24,7 +24,7 @@  static inline int mtu_to_dwrr_weight(struct otx2_nic *pfvf, int mtu)
 	return weight;
 }
 
-void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq);
+int cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq);
 void cn10k_sqe_flush(void *dev, struct otx2_snd_queue *sq, int size, int qidx);
 int cn10k_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura);
 int cn10k_lmtst_init(struct otx2_nic *pfvf);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 8511906cb4e2..5bba1f34e4f6 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -574,20 +574,8 @@  int otx2_alloc_rbuf(struct otx2_nic *pfvf, struct otx2_pool *pool,
 int otx2_alloc_buffer(struct otx2_nic *pfvf, struct otx2_cq_queue *cq,
 		      dma_addr_t *dma)
 {
-	if (unlikely(__otx2_alloc_rbuf(pfvf, cq->rbpool, dma))) {
-		struct refill_work *work;
-		struct delayed_work *dwork;
-
-		work = &pfvf->refill_wrk[cq->cq_idx];
-		dwork = &work->pool_refill_work;
-		/* Schedule a task if no other task is running */
-		if (!cq->refill_task_sched) {
-			cq->refill_task_sched = true;
-			schedule_delayed_work(dwork,
-					      msecs_to_jiffies(100));
-		}
+	if (unlikely(__otx2_alloc_rbuf(pfvf, cq->rbpool, dma)))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1082,38 +1070,16 @@  static int otx2_cq_init(struct otx2_nic *pfvf, u16 qidx)
 static void otx2_pool_refill_task(struct work_struct *work)
 {
 	struct otx2_cq_queue *cq;
-	struct otx2_pool *rbpool;
 	struct refill_work *wrk;
-	int qidx, free_ptrs = 0;
 	struct otx2_nic *pfvf;
-	dma_addr_t bufptr;
+	int qidx;
 
 	wrk = container_of(work, struct refill_work, pool_refill_work.work);
 	pfvf = wrk->pf;
 	qidx = wrk - pfvf->refill_wrk;
 	cq = &pfvf->qset.cq[qidx];
-	rbpool = cq->rbpool;
-	free_ptrs = cq->pool_ptrs;
 
-	while (cq->pool_ptrs) {
-		if (otx2_alloc_rbuf(pfvf, rbpool, &bufptr)) {
-			/* Schedule a WQ if we fails to free atleast half of the
-			 * pointers else enable napi for this RQ.
-			 */
-			if (!((free_ptrs - cq->pool_ptrs) > free_ptrs / 2)) {
-				struct delayed_work *dwork;
-
-				dwork = &wrk->pool_refill_work;
-				schedule_delayed_work(dwork,
-						      msecs_to_jiffies(100));
-			} else {
-				cq->refill_task_sched = false;
-			}
-			return;
-		}
-		pfvf->hw_ops->aura_freeptr(pfvf, qidx, bufptr + OTX2_HEAD_ROOM);
-		cq->pool_ptrs--;
-	}
+	napi_schedule(wrk->napi);
 	cq->refill_task_sched = false;
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index 4c6032ee7800..f6a6437fe169 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -301,6 +301,7 @@  struct flr_work {
 
 struct refill_work {
 	struct delayed_work pool_refill_work;
+	struct napi_struct *napi;
 	struct otx2_nic *pf;
 };
 
@@ -370,7 +371,7 @@  struct dev_hw_ops {
 	int	(*sq_aq_init)(void *dev, u16 qidx, u16 sqb_aura);
 	void	(*sqe_flush)(void *dev, struct otx2_snd_queue *sq,
 			     int size, int qidx);
-	void	(*refill_pool_ptrs)(void *dev, struct otx2_cq_queue *cq);
+	int	(*refill_pool_ptrs)(void *dev, struct otx2_cq_queue *cq);
 	void	(*aura_freeptr)(void *dev, int aura, u64 buf);
 };
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
index e369baf11530..e77d43848955 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
@@ -424,9 +424,10 @@  static int otx2_rx_napi_handler(struct otx2_nic *pfvf,
 	return processed_cqe;
 }
 
-void otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq)
+int otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq)
 {
 	struct otx2_nic *pfvf = dev;
+	int cnt = cq->pool_ptrs;
 	dma_addr_t bufptr;
 
 	while (cq->pool_ptrs) {
@@ -435,6 +436,8 @@  void otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq)
 		otx2_aura_freeptr(pfvf, cq->cq_idx, bufptr + OTX2_HEAD_ROOM);
 		cq->pool_ptrs--;
 	}
+
+	return cnt - cq->pool_ptrs;
 }
 
 static int otx2_tx_napi_handler(struct otx2_nic *pfvf,
@@ -521,6 +524,7 @@  int otx2_napi_handler(struct napi_struct *napi, int budget)
 	struct otx2_cq_queue *cq;
 	struct otx2_qset *qset;
 	struct otx2_nic *pfvf;
+	int filled_cnt = -1;
 
 	cq_poll = container_of(napi, struct otx2_cq_poll, napi);
 	pfvf = (struct otx2_nic *)cq_poll->dev;
@@ -541,7 +545,7 @@  int otx2_napi_handler(struct napi_struct *napi, int budget)
 	}
 
 	if (rx_cq && rx_cq->pool_ptrs)
-		pfvf->hw_ops->refill_pool_ptrs(pfvf, rx_cq);
+		filled_cnt = pfvf->hw_ops->refill_pool_ptrs(pfvf, rx_cq);
 	/* Clear the IRQ */
 	otx2_write64(pfvf, NIX_LF_CINTX_INT(cq_poll->cint_idx), BIT_ULL(0));
 
@@ -561,9 +565,25 @@  int otx2_napi_handler(struct napi_struct *napi, int budget)
 				otx2_config_irq_coalescing(pfvf, i);
 		}
 
-		/* Re-enable interrupts */
-		otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
-			     BIT_ULL(0));
+		if (unlikely(!filled_cnt)) {
+			struct refill_work *work;
+			struct delayed_work *dwork;
+
+			work = &pfvf->refill_wrk[cq->cq_idx];
+			dwork = &work->pool_refill_work;
+			/* Schedule a task if no other task is running */
+			if (!cq->refill_task_sched) {
+				work->napi = napi;
+				cq->refill_task_sched = true;
+				schedule_delayed_work(dwork,
+						      msecs_to_jiffies(100));
+			}
+		} else {
+			/* Re-enable interrupts */
+			otx2_write64(pfvf,
+				     NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
+				     BIT_ULL(0));
+		}
 	}
 	return workdone;
 }
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
index 9e3bfbe5c480..a82ffca8ce1b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
@@ -170,6 +170,6 @@  void cn10k_sqe_flush(void *dev, struct otx2_snd_queue *sq,
 		     int size, int qidx);
 void otx2_sqe_flush(void *dev, struct otx2_snd_queue *sq,
 		    int size, int qidx);
-void otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq);
-void cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq);
+int otx2_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq);
+int cn10k_refill_pool_ptrs(void *dev, struct otx2_cq_queue *cq);
 #endif /* OTX2_TXRX_H */