Message ID | 20240929024506.1527828-4-wei.fang@nxp.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | net: enetc: fix some issues of XDP | expand |
On Sun, Sep 29, 2024 at 10:45:06AM +0800, Wei Fang wrote: > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC > on LS1028A, it was found that if the command was re-run multiple times, > Rx could not receive the frames, and the result of xdo-bench showed > that the rx rate was 0. > > root@ls1028ardb:~# ./xdp-bench tx eno0 > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) > Summary 2046 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > > By observing the Rx PIR and CIR registers, we found that CIR is always > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring > is full and can no longer accommodate other Rx frames. Therefore, we > can conclude that the problem is caused by the Rx BD ring not being > cleaned up. > > Further analysis of the code revealed that the Rx BD ring will only > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. > Therefore, some debug logs were added to the driver and the current > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx > BD ring was full. The logs are as follows. > > [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 > [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 > [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 > > From the results, we can see that the max value of xdp_tx_in_flight > has reached 2140. However, the size of the Rx BD ring is only 2048. > This is incredible, so we checked the code again and found that > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled > and it was not reset when the bfp program was installed again. The > root cause is that the IRQ is disabled too early in enetc_stop(), > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore, > xdp_tx_in_flight is not cleared. > > Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing") > Cc: stable@vger.kernel.org > Signed-off-by: Wei Fang <wei.fang@nxp.com> > --- > v2 changes: > 1. Modify the titile and rephrase the commit meesage. > 2. Use the new solution as described in the title > --- I gave this another test under a bit different set of circumstances this time, and I'm confident that there are still problems, which I haven't identified though (yet). With 64 byte frames at 2.5 Gbps, I see this going on: $ xdp-bench tx eno0 & $ while :; do taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 1 && sleep 1 && taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 0 && sleep 1; done current settings: tx_type 0 rx_filter 0 new settings: tx_type 0 rx_filter 1 Summary 1,556,952 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s current settings: tx_type 0 rx_filter 1 Summary 0 rx/s 0 err,drop/s [ 883.780346] fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear (its RX ring has 2072 XDP_TX frames in flight) new settings: tx_type 0 rx_filter 0 Summary 1,027 rx/s 0 err,drop/s current settings: tx_type 0 rx_filter 0 Summary 0 rx/s 0 err,drop/s which looks like the symptoms that the patch tries to solve. My previous testing was with 390 byte frames, and this did not happen. Please do not merge this.
> -----Original Message----- > From: Vladimir Oltean <vladimir.oltean@nxp.com> > Sent: 2024年10月1日 6:03 > To: Wei Fang <wei.fang@nxp.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; Claudiu Manoil <claudiu.manoil@nxp.com>; > ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org; > john.fastabend@gmail.com; linux-kernel@vger.kernel.org; > netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org; > imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com; > sbhatta@marvell.com > Subject: Re: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings > are disabled > > On Sun, Sep 29, 2024 at 10:45:06AM +0800, Wei Fang wrote: > > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC > > on LS1028A, it was found that if the command was re-run multiple times, > > Rx could not receive the frames, and the result of xdo-bench showed > > that the rx rate was 0. > > > > root@ls1028ardb:~# ./xdp-bench tx eno0 > > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) > > Summary 2046 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > > > By observing the Rx PIR and CIR registers, we found that CIR is always > > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring > > is full and can no longer accommodate other Rx frames. Therefore, we > > can conclude that the problem is caused by the Rx BD ring not being > > cleaned up. > > > > Further analysis of the code revealed that the Rx BD ring will only > > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. > > Therefore, some debug logs were added to the driver and the current > > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx > > BD ring was full. The logs are as follows. > > > > [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 > > [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 > > [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 > > > > From the results, we can see that the max value of xdp_tx_in_flight > > has reached 2140. However, the size of the Rx BD ring is only 2048. > > This is incredible, so we checked the code again and found that > > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled > > and it was not reset when the bfp program was installed again. The > > root cause is that the IRQ is disabled too early in enetc_stop(), > > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore, > > xdp_tx_in_flight is not cleared. > > > > Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet > processing") > > Cc: stable@vger.kernel.org > > Signed-off-by: Wei Fang <wei.fang@nxp.com> > > --- > > v2 changes: > > 1. Modify the titile and rephrase the commit meesage. > > 2. Use the new solution as described in the title > > --- > > I gave this another test under a bit different set of circumstances this time, > and I'm confident that there are still problems, which I haven't identified > though (yet). > > With 64 byte frames at 2.5 Gbps, I see this going on: > > $ xdp-bench tx eno0 & > $ while :; do taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 1 && sleep 1 && taskset > $((1 << 0)) hwstamp_ctl -i eno0 -r 0 && sleep 1; done > current settings: > tx_type 0 > rx_filter 0 > new settings: > tx_type 0 > rx_filter 1 > Summary 1,556,952 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > current settings: > tx_type 0 > rx_filter 1 > Summary 0 rx/s 0 err,drop/s > [ 883.780346] fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear (its > RX ring has 2072 XDP_TX frames in flight) > new settings: > tx_type 0 > rx_filter 0 > Summary 1,027 rx/s 0 err,drop/s > current settings: > tx_type 0 > rx_filter 0 > Summary 0 rx/s 0 err,drop/s > > which looks like the symptoms that the patch tries to solve. > > My previous testing was with 390 byte frames, and this did not happen. > > Please do not merge this. Oh, it looks like there are still some issues we don't know about. I did test using 64 bytes but not at that high of a rate. Also I didn't turn on timestamp. Anyway, I will try to reproduce the issue when I'm back to office next Tuesday. It would be nice if you can help find the root cause before next Tuesday, thanks!
Best Regards, Wei Fang Hi Vladimir, > > > > On Sun, Sep 29, 2024 at 10:45:06AM +0800, Wei Fang wrote: > > > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC > > > on LS1028A, it was found that if the command was re-run multiple times, > > > Rx could not receive the frames, and the result of xdo-bench showed > > > that the rx rate was 0. > > > > > > root@ls1028ardb:~# ./xdp-bench tx eno0 > > > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) > > > Summary 2046 rx/s 0 > > err,drop/s > > > Summary 0 rx/s 0 > > err,drop/s > > > Summary 0 rx/s 0 > > err,drop/s > > > Summary 0 rx/s 0 > > err,drop/s > > > > > > By observing the Rx PIR and CIR registers, we found that CIR is always > > > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring > > > is full and can no longer accommodate other Rx frames. Therefore, we > > > can conclude that the problem is caused by the Rx BD ring not being > > > cleaned up. > > > > > > Further analysis of the code revealed that the Rx BD ring will only > > > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. > > > Therefore, some debug logs were added to the driver and the current > > > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx > > > BD ring was full. The logs are as follows. > > > > > > [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 > > > [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 > > > [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 > > > > > > From the results, we can see that the max value of xdp_tx_in_flight > > > has reached 2140. However, the size of the Rx BD ring is only 2048. > > > This is incredible, so we checked the code again and found that > > > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled > > > and it was not reset when the bfp program was installed again. The > > > root cause is that the IRQ is disabled too early in enetc_stop(), > > > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore, > > > xdp_tx_in_flight is not cleared. > > > > > > Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet > > processing") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Wei Fang <wei.fang@nxp.com> > > > --- > > > v2 changes: > > > 1. Modify the titile and rephrase the commit meesage. > > > 2. Use the new solution as described in the title > > > --- > > > > I gave this another test under a bit different set of circumstances this time, > > and I'm confident that there are still problems, which I haven't identified > > though (yet). > > > > With 64 byte frames at 2.5 Gbps, I see this going on: > > > > $ xdp-bench tx eno0 & > > $ while :; do taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 1 && sleep 1 && > taskset > > $((1 << 0)) hwstamp_ctl -i eno0 -r 0 && sleep 1; done > > current settings: > > tx_type 0 > > rx_filter 0 > > new settings: > > tx_type 0 > > rx_filter 1 > > Summary 1,556,952 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > current settings: > > tx_type 0 > > rx_filter 1 > > Summary 0 rx/s 0 > err,drop/s > > [ 883.780346] fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear (its > > RX ring has 2072 XDP_TX frames in flight) > > new settings: > > tx_type 0 > > rx_filter 0 > > Summary 1,027 rx/s 0 > err,drop/s > > current settings: > > tx_type 0 > > rx_filter 0 > > Summary 0 rx/s 0 > err,drop/s > > > > which looks like the symptoms that the patch tries to solve. > > > > My previous testing was with 390 byte frames, and this did not happen. > > > > Please do not merge this. > > Oh, it looks like there are still some issues we don't know about. I did > test using 64 bytes but not at that high of a rate. Also I didn't turn on > timestamp. Anyway, I will try to reproduce the issue when I'm back to > office next Tuesday. It would be nice if you can help find the root cause > before next Tuesday, thanks! I think the reason is that Rx BDRs are disabled when enetc_stop() is called, but there are still many unprocessed frames on Rx BDR. These frames will be processed by XDP program and put into Tx BDR. So enetc_wait_txbdr() will timeout and cause xdp_tx_in_flight will not be cleared. So based on this patch, we should add a separate patch, similar to the patch 2 ("net: enetc: fix the issues of XDP_REDIRECT feature "), which prevents the XDP_TX frames from being put into Tx BDRs when the ENETC_TX_DOWN flag is set. The new patch is shown below. After adding this new patch, I followed your test steps and tested for more than 30 minutes, and the issue cannot be reproduced anymore (without this patch, this problem would be reproduced within seconds). --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -1606,6 +1606,12 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring, break; case XDP_TX: tx_ring = priv->xdp_tx_ring[rx_ring->index]; + if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags))) { + enetc_xdp_drop(rx_ring, orig_i, i); + tx_ring->stats.xdp_tx_drops++; + break; + } + xdp_tx_bd_cnt = enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr, rx_ring, orig_i, i);
On Tue, Oct 08, 2024 at 06:30:49AM +0300, Wei Fang wrote: > I think the reason is that Rx BDRs are disabled when enetc_stop() is called, > but there are still many unprocessed frames on Rx BDR. These frames will > be processed by XDP program and put into Tx BDR. So enetc_wait_txbdr() > will timeout and cause xdp_tx_in_flight will not be cleared. > > So based on this patch, we should add a separate patch, similar to the patch > 2 ("net: enetc: fix the issues of XDP_REDIRECT feature "), which prevents the > XDP_TX frames from being put into Tx BDRs when the ENETC_TX_DOWN flag > is set. The new patch is shown below. After adding this new patch, I followed > your test steps and tested for more than 30 minutes, and the issue cannot be > reproduced anymore (without this patch, this problem would be reproduced > within seconds). > > --- a/drivers/net/ethernet/freescale/enetc/enetc.c > +++ b/drivers/net/ethernet/freescale/enetc/enetc.c > @@ -1606,6 +1606,12 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring, > break; > case XDP_TX: > tx_ring = priv->xdp_tx_ring[rx_ring->index]; > + if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags))) { > + enetc_xdp_drop(rx_ring, orig_i, i); > + tx_ring->stats.xdp_tx_drops++; > + break; > + } > + > xdp_tx_bd_cnt = enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr, > rx_ring, > orig_i, i); Yeah, it works on my side as well. Thanks for following up. I would argue that the above snippet should be a fixup for the "net: enetc: fix the issues of XDP_REDIRECT feature" change, and a rewrite of the commit message is in order. Currently, as a reader, I get the impression that only XDP_REDIRECT needs to check the ENETC_TX_DOWN flag, only for the next patch to come and say "remember what was said about the TX ring not being allowed to actively transmit frames while disabling it? well, that patch wasn't sufficient to ensure this condition, because XDP_TX needs to respect that flag as well!"
> -----Original Message----- > From: Vladimir Oltean <vladimir.oltean@nxp.com> > Sent: 2024年10月9日 6:48 > To: Wei Fang <wei.fang@nxp.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; Claudiu Manoil <claudiu.manoil@nxp.com>; > ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org; > john.fastabend@gmail.com; linux-kernel@vger.kernel.org; > netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org; > imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com; > sbhatta@marvell.com > Subject: Re: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings > are disabled > > On Tue, Oct 08, 2024 at 06:30:49AM +0300, Wei Fang wrote: > > I think the reason is that Rx BDRs are disabled when enetc_stop() is called, > > but there are still many unprocessed frames on Rx BDR. These frames will > > be processed by XDP program and put into Tx BDR. So enetc_wait_txbdr() > > will timeout and cause xdp_tx_in_flight will not be cleared. > > > > So based on this patch, we should add a separate patch, similar to the patch > > 2 ("net: enetc: fix the issues of XDP_REDIRECT feature "), which prevents the > > XDP_TX frames from being put into Tx BDRs when the ENETC_TX_DOWN flag > > is set. The new patch is shown below. After adding this new patch, I followed > > your test steps and tested for more than 30 minutes, and the issue cannot be > > reproduced anymore (without this patch, this problem would be reproduced > > within seconds). > > > > --- a/drivers/net/ethernet/freescale/enetc/enetc.c > > +++ b/drivers/net/ethernet/freescale/enetc/enetc.c > > @@ -1606,6 +1606,12 @@ static int enetc_clean_rx_ring_xdp(struct > enetc_bdr *rx_ring, > > break; > > case XDP_TX: > > tx_ring = priv->xdp_tx_ring[rx_ring->index]; > > + if (unlikely(test_bit(ENETC_TX_DOWN, > &priv->flags))) { > > + enetc_xdp_drop(rx_ring, orig_i, i); > > + tx_ring->stats.xdp_tx_drops++; > > + break; > > + } > > + > > xdp_tx_bd_cnt = > enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr, > > > rx_ring, > > > orig_i, i); > > Yeah, it works on my side as well. Thanks for following up. > > I would argue that the above snippet should be a fixup for the > "net: enetc: fix the issues of XDP_REDIRECT feature" change, and a > rewrite of the commit message is in order. Currently, as a reader, I get > the impression that only XDP_REDIRECT needs to check the ENETC_TX_DOWN > flag, only for the next patch to come and say "remember what was said > about the TX ring not being allowed to actively transmit frames while > disabling it? well, that patch wasn't sufficient to ensure this condition, > because XDP_TX needs to respect that flag as well!" Okay, I will merge this snippet into "net: enetc: fix the issues of XDP_REDIRECT feature" and refine the commit message.
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 138c0a36f033..906f0edbfef8 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -2468,8 +2468,6 @@ void enetc_start(struct net_device *ndev) enetc_setup_interrupts(priv); - enetc_enable_tx_bdrs(priv); - for (i = 0; i < priv->bdr_int_num; i++) { int irq = pci_irq_vector(priv->si->pdev, ENETC_BDR_INT_BASE_IDX + i); @@ -2478,6 +2476,8 @@ void enetc_start(struct net_device *ndev) enable_irq(irq); } + enetc_enable_tx_bdrs(priv); + enetc_enable_rx_bdrs(priv); netif_tx_start_all_queues(ndev); @@ -2546,6 +2546,10 @@ void enetc_stop(struct net_device *ndev) enetc_disable_rx_bdrs(priv); + enetc_wait_bdrs(priv); + + enetc_disable_tx_bdrs(priv); + for (i = 0; i < priv->bdr_int_num; i++) { int irq = pci_irq_vector(priv->si->pdev, ENETC_BDR_INT_BASE_IDX + i); @@ -2555,10 +2559,6 @@ void enetc_stop(struct net_device *ndev) napi_disable(&priv->int_vector[i]->napi); } - enetc_wait_bdrs(priv); - - enetc_disable_tx_bdrs(priv); - enetc_clear_interrupts(priv); } EXPORT_SYMBOL_GPL(enetc_stop);
When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC on LS1028A, it was found that if the command was re-run multiple times, Rx could not receive the frames, and the result of xdo-bench showed that the rx rate was 0. root@ls1028ardb:~# ./xdp-bench tx eno0 Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) Summary 2046 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s Summary 0 rx/s 0 err,drop/s By observing the Rx PIR and CIR registers, we found that CIR is always equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring is full and can no longer accommodate other Rx frames. Therefore, we can conclude that the problem is caused by the Rx BD ring not being cleaned up. Further analysis of the code revealed that the Rx BD ring will only be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. Therefore, some debug logs were added to the driver and the current values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx BD ring was full. The logs are as follows. [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 From the results, we can see that the max value of xdp_tx_in_flight has reached 2140. However, the size of the Rx BD ring is only 2048. This is incredible, so we checked the code again and found that xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled and it was not reset when the bfp program was installed again. The root cause is that the IRQ is disabled too early in enetc_stop(), resulting in enetc_recycle_xdp_tx_buff() not being called, therefore, xdp_tx_in_flight is not cleared. Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> --- v2 changes: 1. Modify the titile and rephrase the commit meesage. 2. Use the new solution as described in the title --- drivers/net/ethernet/freescale/enetc/enetc.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)