Message ID | 20230412073611.62942-1-kurt@linutronix.de (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] igc: Avoid transmit queue timeout for XDP | expand |
On 4/12/2023 12:36 AM, Kurt Kanzenbach wrote: > High XDP load triggers the netdev watchdog: > > |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out > > The reason is the Tx queue transmission start (txq->trans_start) is not updated > in XDP code path. Therefore, add it for all XDP transmission functions. > > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> For Intel, I only see this being done in igb, as 5337824f4dc4 ("net: annotate accesses to queue->trans_start"). I see a few other drivers also calling this. Is this a gap that other XDP implementations also need to fix? grepping for txq_trans_cond_update I see: > apm/xgene/xgene_enet_main.c > 874: txq_trans_cond_update(txq); > > engleder/tsnep_main.c > 623: txq_trans_cond_update(tx_nq); > 1660: txq_trans_cond_update(nq); > > freescale/dpaa/dpaa_eth.c > 2347: txq_trans_cond_update(txq); > 2553: txq_trans_cond_update(txq); > > ibm/ibmvnic.c > 2485: txq_trans_cond_update(txq); > > intel/igb/igb_main.c > 2980: txq_trans_cond_update(nq); > 3014: txq_trans_cond_update(nq); > > stmicro/stmmac/stmmac_main.c > 2428: txq_trans_cond_update(nq); > 4808: txq_trans_cond_update(nq); > 6436: txq_trans_cond_update(nq); > Is most driver's XDP implementation broken? There's also netif_trans_update but this is called out as a legacy only function. Far more drivers call this but I don't see either call or a direct update to trans_start in many XDP implementations... Am I missing something or are a bunch of other XDP implementations also wrong? The patch seems ok to me, assuming this is the correct way to fix things and not something in the XDP path. > --- > drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c > index ba49728be919..e71e85e3bcc2 100644 > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -2384,6 +2384,8 @@ static int igc_xdp_xmit_back(struct igc_adapter *adapter, struct xdp_buff *xdp) > nq = txring_txq(ring); > > __netif_tx_lock(nq, cpu); > + /* Avoid transmit queue timeout since we share it with the slow path */ > + txq_trans_cond_update(nq); > res = igc_xdp_init_tx_descriptor(ring, xdpf); > __netif_tx_unlock(nq); > return res; > @@ -2786,6 +2788,9 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) > > __netif_tx_lock(nq, cpu); > > + /* Avoid transmit queue timeout since we share it with the slow path */ > + txq_trans_cond_update(nq); > + > budget = igc_desc_unused(ring); > > while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { > @@ -6311,6 +6316,9 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames, > > __netif_tx_lock(nq, cpu); > > + /* Avoid transmit queue timeout since we share it with the slow path */ > + txq_trans_cond_update(nq); > + > drops = 0; > for (i = 0; i < num_frames; i++) { > int err;
On Wed Apr 12 2023, Jacob Keller wrote: > On 4/12/2023 12:36 AM, Kurt Kanzenbach wrote: >> High XDP load triggers the netdev watchdog: >> >> |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out >> >> The reason is the Tx queue transmission start (txq->trans_start) is not updated >> in XDP code path. Therefore, add it for all XDP transmission functions. >> >> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > > For Intel, I only see this being done in igb, as 5337824f4dc4 ("net: > annotate accesses to queue->trans_start"). I see a few other drivers > also calling this. > > Is this a gap that other XDP implementations also need to fix? > > grepping for txq_trans_cond_update I see: > >> apm/xgene/xgene_enet_main.c >> 874: txq_trans_cond_update(txq); >> >> engleder/tsnep_main.c >> 623: txq_trans_cond_update(tx_nq); >> 1660: txq_trans_cond_update(nq); >> >> freescale/dpaa/dpaa_eth.c >> 2347: txq_trans_cond_update(txq); >> 2553: txq_trans_cond_update(txq); >> >> ibm/ibmvnic.c >> 2485: txq_trans_cond_update(txq); >> >> intel/igb/igb_main.c >> 2980: txq_trans_cond_update(nq); >> 3014: txq_trans_cond_update(nq); >> >> stmicro/stmmac/stmmac_main.c >> 2428: txq_trans_cond_update(nq); >> 4808: txq_trans_cond_update(nq); >> 6436: txq_trans_cond_update(nq); >> > > Is most driver's XDP implementation broken? There's also > netif_trans_update but this is called out as a legacy only function. Far > more drivers call this but I don't see either call or a direct update to > trans_start in many XDP implementations... > > Am I missing something or are a bunch of other XDP implementations also > wrong? > > The patch seems ok to me, assuming this is the correct way to fix things > and not something in the XDP path. AFAICT the netdev watchdog is only started when the device exposes ndo_tx_timeout callback (see __netdev_watchdog_up()). For igc this callback was introduced recently in 9b275176270e ("igc: Add ndo_tx_timeout support"). My guess, as soon as the net device has ndo_tx_timeout it needs to maintain trans_start for XDP? Thanks, Kurt
On Wed, 12 Apr 2023 15:30:38 -0700 Jacob Keller wrote: > Is most driver's XDP implementation broken? There's also > netif_trans_update but this is called out as a legacy only function. Far > more drivers call this but I don't see either call or a direct update to > trans_start in many XDP implementations... > > Am I missing something or are a bunch of other XDP implementations also > wrong? Only drivers which use the same Tx queues for the stack and XDP need this.
On 4/13/2023 9:03 AM, Jakub Kicinski wrote: > On Wed, 12 Apr 2023 15:30:38 -0700 Jacob Keller wrote: >> Is most driver's XDP implementation broken? There's also >> netif_trans_update but this is called out as a legacy only function. Far >> more drivers call this but I don't see either call or a direct update to >> trans_start in many XDP implementations... >> >> Am I missing something or are a bunch of other XDP implementations also >> wrong? > > Only drivers which use the same Tx queues for the stack and XDP need > this. Ok that explains it. igc has few enough queues they need to be shared, but other devices have more queues and can use dedicated queues. Then this looks good to me! Thanks, Jake
From: Jacob Keller > Sent: 13 April 2023 17:40 > > On 4/13/2023 9:03 AM, Jakub Kicinski wrote: > > On Wed, 12 Apr 2023 15:30:38 -0700 Jacob Keller wrote: > >> Is most driver's XDP implementation broken? There's also > >> netif_trans_update but this is called out as a legacy only function. Far > >> more drivers call this but I don't see either call or a direct update to > >> trans_start in many XDP implementations... > >> > >> Am I missing something or are a bunch of other XDP implementations also > >> wrong? > > > > Only drivers which use the same Tx queues for the stack and XDP need > > this. > > Ok that explains it. igc has few enough queues they need to be shared, > but other devices have more queues and can use dedicated queues. Aren't there some generic problems with multiple transmit queues? The tg3 driver only uses a single tx queue because sending enough large packets to saturate the network through one queue has the effect of significantly delaying packets on other queues because of the 'round robin' algorithm used by the hardware. Transmit processing is also a lot less demanding than the receive processing. So you may want to split the receive traffic into multiple queues (with the processing happening on multiple cpu) but keep the transmit processing (which is much less critical) only running on a single cpu - so with a single queue. (Trying to process 10000 RTP streams gets 'interesting'.) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 4/12/2023 10:36, Kurt Kanzenbach wrote: > High XDP load triggers the netdev watchdog: > > |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out > > The reason is the Tx queue transmission start (txq->trans_start) is not updated > in XDP code path. Therefore, add it for all XDP transmission functions. > > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > --- > drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++ > 1 file changed, 8 insertions(+) Tested-by: Naama Meir <naamax.meir@linux.intel.com>
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index ba49728be919..e71e85e3bcc2 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -2384,6 +2384,8 @@ static int igc_xdp_xmit_back(struct igc_adapter *adapter, struct xdp_buff *xdp) nq = txring_txq(ring); __netif_tx_lock(nq, cpu); + /* Avoid transmit queue timeout since we share it with the slow path */ + txq_trans_cond_update(nq); res = igc_xdp_init_tx_descriptor(ring, xdpf); __netif_tx_unlock(nq); return res; @@ -2786,6 +2788,9 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) __netif_tx_lock(nq, cpu); + /* Avoid transmit queue timeout since we share it with the slow path */ + txq_trans_cond_update(nq); + budget = igc_desc_unused(ring); while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { @@ -6311,6 +6316,9 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames, __netif_tx_lock(nq, cpu); + /* Avoid transmit queue timeout since we share it with the slow path */ + txq_trans_cond_update(nq); + drops = 0; for (i = 0; i < num_frames; i++) { int err;
High XDP load triggers the netdev watchdog: |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out The reason is the Tx queue transmission start (txq->trans_start) is not updated in XDP code path. Therefore, add it for all XDP transmission functions. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> --- drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++ 1 file changed, 8 insertions(+)