[net-next] igc: Avoid transmit queue timeout for XDP

Message ID 20230412073611.62942-1-kurt@linutronix.de (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Series [net-next] igc: Avoid transmit queue timeout for XDP

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers           success  CCed 8 of 8 maintainers
netdev/build_clang              success  Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 18 this patch: 18
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Kurt Kanzenbach April 12, 2023, 7:36 a.m. UTC
High XDP load triggers the netdev watchdog:

|NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out

The reason is that the Tx queue transmission start (txq->trans_start) is not
updated in the XDP code path. Therefore, update it in all XDP transmission
functions.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jacob Keller April 12, 2023, 10:30 p.m. UTC | #1
On 4/12/2023 12:36 AM, Kurt Kanzenbach wrote:
> High XDP load triggers the netdev watchdog:
> 
> |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
> 
> The reason is the Tx queue transmission start (txq->trans_start) is not updated
> in XDP code path. Therefore, add it for all XDP transmission functions.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>

For Intel, I only see this being done in igb, as 5337824f4dc4 ("net:
annotate accesses to queue->trans_start"). I see a few other drivers
also calling this.

Is this a gap that other XDP implementations also need to fix?

grepping for txq_trans_cond_update I see:

> apm/xgene/xgene_enet_main.c
> 874:            txq_trans_cond_update(txq);
> 
> engleder/tsnep_main.c
> 623:            txq_trans_cond_update(tx_nq);
> 1660:           txq_trans_cond_update(nq);
> 
> freescale/dpaa/dpaa_eth.c
> 2347:   txq_trans_cond_update(txq);
> 2553:   txq_trans_cond_update(txq);
> 
> ibm/ibmvnic.c
> 2485:   txq_trans_cond_update(txq);
> 
> intel/igb/igb_main.c
> 2980:   txq_trans_cond_update(nq);
> 3014:   txq_trans_cond_update(nq);
> 
> stmicro/stmmac/stmmac_main.c
> 2428:   txq_trans_cond_update(nq);
> 4808:   txq_trans_cond_update(nq);
> 6436:   txq_trans_cond_update(nq);
> 

Are most drivers' XDP implementations broken? There's also
netif_trans_update, but it is called out as a legacy-only function. Far
more drivers call it, but I don't see either call, or a direct update to
trans_start, in many XDP implementations...

Am I missing something or are a bunch of other XDP implementations also
wrong?

The patch seems ok to me, assuming this is the correct way to fix things
and not something in the XDP path.

> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index ba49728be919..e71e85e3bcc2 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2384,6 +2384,8 @@ static int igc_xdp_xmit_back(struct igc_adapter *adapter, struct xdp_buff *xdp)
>  	nq = txring_txq(ring);
>  
>  	__netif_tx_lock(nq, cpu);
> +	/* Avoid transmit queue timeout since we share it with the slow path */
> +	txq_trans_cond_update(nq);
>  	res = igc_xdp_init_tx_descriptor(ring, xdpf);
>  	__netif_tx_unlock(nq);
>  	return res;
> @@ -2786,6 +2788,9 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>  
>  	__netif_tx_lock(nq, cpu);
>  
> +	/* Avoid transmit queue timeout since we share it with the slow path */
> +	txq_trans_cond_update(nq);
> +
>  	budget = igc_desc_unused(ring);
>  
>  	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> @@ -6311,6 +6316,9 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
>  
>  	__netif_tx_lock(nq, cpu);
>  
> +	/* Avoid transmit queue timeout since we share it with the slow path */
> +	txq_trans_cond_update(nq);
> +
>  	drops = 0;
>  	for (i = 0; i < num_frames; i++) {
>  		int err;

Kurt Kanzenbach April 13, 2023, 7:20 a.m. UTC | #2
On Wed Apr 12 2023, Jacob Keller wrote:
> On 4/12/2023 12:36 AM, Kurt Kanzenbach wrote:
>> High XDP load triggers the netdev watchdog:
>> 
>> |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
>> 
>> The reason is the Tx queue transmission start (txq->trans_start) is not updated
>> in XDP code path. Therefore, add it for all XDP transmission functions.
>> 
>> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>
> For Intel, I only see this being done in igb, as 5337824f4dc4 ("net:
> annotate accesses to queue->trans_start"). I see a few other drivers
> also calling this.
>
> Is this a gap that other XDP implementations also need to fix?
>
> grepping for txq_trans_cond_update I see:
>
>> apm/xgene/xgene_enet_main.c
>> 874:            txq_trans_cond_update(txq);
>> 
>> engleder/tsnep_main.c
>> 623:            txq_trans_cond_update(tx_nq);
>> 1660:           txq_trans_cond_update(nq);
>> 
>> freescale/dpaa/dpaa_eth.c
>> 2347:   txq_trans_cond_update(txq);
>> 2553:   txq_trans_cond_update(txq);
>> 
>> ibm/ibmvnic.c
>> 2485:   txq_trans_cond_update(txq);
>> 
>> intel/igb/igb_main.c
>> 2980:   txq_trans_cond_update(nq);
>> 3014:   txq_trans_cond_update(nq);
>> 
>> stmicro/stmmac/stmmac_main.c
>> 2428:   txq_trans_cond_update(nq);
>> 4808:   txq_trans_cond_update(nq);
>> 6436:   txq_trans_cond_update(nq);
>> 
>
> Is most driver's XDP implementation broken? There's also
> netif_trans_update but this is called out as a legacy only function. Far
> more drivers call this but I don't see either call or a direct update to
> trans_start in many XDP implementations...
>
> Am I missing something or are a bunch of other XDP implementations also
> wrong?
>
> The patch seems ok to me, assuming this is the correct way to fix things
> and not something in the XDP path.

AFAICT the netdev watchdog is only started when the device exposes an
ndo_tx_timeout callback (see __netdev_watchdog_up()). For igc this
callback was introduced recently in 9b275176270e ("igc: Add
ndo_tx_timeout support"). My guess is that as soon as a net device
implements ndo_tx_timeout, it needs to maintain trans_start for XDP as well?

Thanks,
Kurt
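
The watchdog condition discussed above can be sketched in userspace C. This is
a simplified model under stated assumptions, not kernel code: every model_ name
is hypothetical, plain tick counts stand in for jiffies, and the expiry check
assumes the watchdog fires only when a queue is stopped and its trans_start is
older than the timeout. The in-kernel txq_trans_cond_update() additionally uses
READ_ONCE()/WRITE_ONCE(), which the model omits.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant netdev_queue state. */
struct model_txq {
	unsigned long trans_start; /* tick of the last recorded transmit */
	bool stopped;              /* true while the queue is stopped */
};

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Same logic as txq_trans_cond_update(): store only if the value changes. */
static void model_trans_cond_update(struct model_txq *q, unsigned long now)
{
	if (q->trans_start != now)
		q->trans_start = now;
}

/* Assumed watchdog rule: queue stopped and last transmit older than timeo. */
static bool model_watchdog_expired(const struct model_txq *q,
				   unsigned long now, unsigned long timeo)
{
	return q->stopped && model_time_after(now, q->trans_start + timeo);
}
```

In this model, a stopped queue whose trans_start was never refreshed by the XDP
path is reported as expired once the timeout has passed, while a single call to
model_trans_cond_update() at transmit time clears the condition.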

Jakub Kicinski April 13, 2023, 4:03 p.m. UTC | #3
On Wed, 12 Apr 2023 15:30:38 -0700 Jacob Keller wrote:
> Is most driver's XDP implementation broken? There's also
> netif_trans_update but this is called out as a legacy only function. Far
> more drivers call this but I don't see either call or a direct update to
> trans_start in many XDP implementations...
> 
> Am I missing something or are a bunch of other XDP implementations also
> wrong?

Only drivers which use the same Tx queues for the stack and XDP need
this.

Jacob Keller April 13, 2023, 4:39 p.m. UTC | #4
On 4/13/2023 9:03 AM, Jakub Kicinski wrote:
> On Wed, 12 Apr 2023 15:30:38 -0700 Jacob Keller wrote:
>> Is most driver's XDP implementation broken? There's also
>> netif_trans_update but this is called out as a legacy only function. Far
>> more drivers call this but I don't see either call or a direct update to
>> trans_start in many XDP implementations...
>>
>> Am I missing something or are a bunch of other XDP implementations also
>> wrong?
> 
> Only drivers which use the same Tx queues for the stack and XDP need
> this.

Ok that explains it. igc has few enough queues they need to be shared,
but other devices have more queues and can use dedicated queues.

Then this looks good to me!

Thanks,
Jake
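
The shared-queue case can be illustrated with a small userspace simulation. It
is only a sketch: the sim_ names are hypothetical, ticks stand in for jiffies,
and it assumes the worst case where XDP traffic keeps the shared ring full (so
the stack's queue stays stopped) while the stack itself never transmits.

```c
#include <stdbool.h>

/* Hypothetical model of a Tx queue shared by the stack and XDP. */
struct sim_txq {
	unsigned long trans_start; /* tick of the last recorded transmit */
	bool stopped;              /* ring full, stack queue stopped */
};

/*
 * Run `ticks` ticks in which only XDP transmits on the shared queue.
 * If xdp_updates is true, the XDP path refreshes trans_start on every
 * send, as the patch does. Returns the first tick at which the assumed
 * watchdog condition holds, or -1 if it never does.
 */
static long sim_shared_queue(unsigned long ticks, unsigned long timeo,
			     bool xdp_updates)
{
	struct sim_txq q = { .trans_start = 0, .stopped = true };
	unsigned long now;

	for (now = 1; now <= ticks; now++) {
		if (xdp_updates)
			q.trans_start = now; /* XDP path refreshes the stamp */

		/* Wrap-safe check for: now is after trans_start + timeo */
		if (q.stopped && (long)(q.trans_start + timeo - now) < 0)
			return (long)now;
	}
	return -1;
}
```

With a timeout of 10 ticks, sim_shared_queue(100, 10, false) returns 11, the
first tick past the timeout, whereas sim_shared_queue(100, 10, true) returns -1
because the stamp is refreshed on every XDP send.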

David Laight April 13, 2023, 9:19 p.m. UTC | #5
From: Jacob Keller
> Sent: 13 April 2023 17:40
> 
> On 4/13/2023 9:03 AM, Jakub Kicinski wrote:
> > On Wed, 12 Apr 2023 15:30:38 -0700 Jacob Keller wrote:
> >> Is most driver's XDP implementation broken? There's also
> >> netif_trans_update but this is called out as a legacy only function. Far
> >> more drivers call this but I don't see either call or a direct update to
> >> trans_start in many XDP implementations...
> >>
> >> Am I missing something or are a bunch of other XDP implementations also
> >> wrong?
> >
> > Only drivers which use the same Tx queues for the stack and XDP need
> > this.
> 
> Ok that explains it. igc has few enough queues they need to be shared,
> but other devices have more queues and can use dedicated queues.

Aren't there some generic problems with multiple transmit queues?
The tg3 driver only uses a single tx queue because sending enough
large packets to saturate the network through one queue has the
effect of significantly delaying packets on other queues because
of the 'round robin' algorithm used by the hardware.

Transmit processing is also a lot less demanding than the receive
processing.
So you may want to split the receive traffic into multiple
queues (with the processing happening on multiple CPUs) but
keep the transmit processing (which is much less critical)
running on a single CPU only, and therefore on a single queue.
(Trying to process 10000 RTP streams gets 'interesting'.)

	David


naamax.meir May 1, 2023, 10:01 a.m. UTC | #6
On 4/12/2023 10:36, Kurt Kanzenbach wrote:
> High XDP load triggers the netdev watchdog:
> 
> |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
> 
> The reason is the Tx queue transmission start (txq->trans_start) is not updated
> in XDP code path. Therefore, add it for all XDP transmission functions.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> ---
>   drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++
>   1 file changed, 8 insertions(+)

Tested-by: Naama Meir <naamax.meir@linux.intel.com>

Patch

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ba49728be919..e71e85e3bcc2 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2384,6 +2384,8 @@  static int igc_xdp_xmit_back(struct igc_adapter *adapter, struct xdp_buff *xdp)
 	nq = txring_txq(ring);
 
 	__netif_tx_lock(nq, cpu);
+	/* Avoid transmit queue timeout since we share it with the slow path */
+	txq_trans_cond_update(nq);
 	res = igc_xdp_init_tx_descriptor(ring, xdpf);
 	__netif_tx_unlock(nq);
 	return res;
@@ -2786,6 +2788,9 @@  static void igc_xdp_xmit_zc(struct igc_ring *ring)
 
 	__netif_tx_lock(nq, cpu);
 
+	/* Avoid transmit queue timeout since we share it with the slow path */
+	txq_trans_cond_update(nq);
+
 	budget = igc_desc_unused(ring);
 
 	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
@@ -6311,6 +6316,9 @@  static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 
 	__netif_tx_lock(nq, cpu);
 
+	/* Avoid transmit queue timeout since we share it with the slow path */
+	txq_trans_cond_update(nq);
+
 	drops = 0;
 	for (i = 0; i < num_frames; i++) {
 		int err;