diff mbox series

[net-next,v2] net: airoha: Implement BQL support

Message ID 20241012-en7581-bql-v2-1-4deb4efdb60b@kernel.org (mailing list archive)
State Accepted
Commit 1d304174106c93ce05f6088813ad7203b3eb381a
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] net: airoha: Implement BQL support | expand

Commit Message

Lorenzo Bianconi Oct. 12, 2024, 9:01 a.m. UTC
Introduce BQL support in the airoha_eth driver reporting to the kernel
info about tx hw DMA queues in order to avoid bufferbloat and keep the
latency small.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes in v2:
- simplify BQL support and rebase on top of net-next main
- Link to v1: https://lore.kernel.org/r/20240930-en7581-bql-v1-1-064cdd570068@kernel.org
---
 drivers/net/ethernet/mediatek/airoha_eth.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


---
base-commit: c531f2269a53db5cf64b24baf785ccbcda52970f
change-id: 20240930-en7581-bql-552566f2078e

Best regards,

Comments

Jakub Kicinski Oct. 15, 2024, 2:32 p.m. UTC | #1
On Sat, 12 Oct 2024 11:01:11 +0200 Lorenzo Bianconi wrote:
> Introduce BQL support in the airoha_eth driver reporting to the kernel
> info about tx hw DMA queues in order to avoid bufferbloat and keep the
> latency small.

TBH I haven't looked at the code again, but when I looked at v1 I was
surprised you don't have a reset in airoha_qdma_cleanup_tx_queue().
Are you sure it's okay? It's a common bug not to reset the BQL state
when queue is purged while stopping the interface.
Lorenzo Bianconi Oct. 15, 2024, 2:39 p.m. UTC | #2
On Oct 15, Jakub Kicinski wrote:
> On Sat, 12 Oct 2024 11:01:11 +0200 Lorenzo Bianconi wrote:
> > Introduce BQL support in the airoha_eth driver reporting to the kernel
> > info about tx hw DMA queues in order to avoid bufferbloat and keep the
> > latency small.
> 
> TBH I haven't looked at the code again, but when I looked at v1 I was
> surprised you don't have a reset in airoha_qdma_cleanup_tx_queue().
> Are you sure it's okay? It's a common bug not to reset the BQL state
> when queue is purged while stopping the interface.

So far airoha_qdma_cleanup_tx_queue() is called just in airoha_hw_cleanup()
that in turn runs just when the module is removed (airoha_remove()).
Do we need it?

Regards,
Lorenzo
Jakub Kicinski Oct. 15, 2024, 2:52 p.m. UTC | #3
On Tue, 15 Oct 2024 16:39:08 +0200 Lorenzo Bianconi wrote:
> On Oct 15, Jakub Kicinski wrote:
> > On Sat, 12 Oct 2024 11:01:11 +0200 Lorenzo Bianconi wrote:  
> > > Introduce BQL support in the airoha_eth driver reporting to the kernel
> > > info about tx hw DMA queues in order to avoid bufferbloat and keep the
> > > latency small.  
> > 
> > TBH I haven't looked at the code again, but when I looked at v1 I was
> > surprised you don't have a reset in airoha_qdma_cleanup_tx_queue().
> > Are you sure it's okay? It's a common bug not to reset the BQL state
> > when queue is purged while stopping the interface.  
> 
> So far airoha_qdma_cleanup_tx_queue() is called just in airoha_hw_cleanup()
> that in turn runs just when the module is removed (airoha_remove()).
> Do we need it?

Oh, thought its called on stop. In that case we're probably good
from BQL perspective.

But does it mean potentially very stale packets can sit on the Tx
ring when the device is stopped, until it's started again?
Lorenzo Bianconi Oct. 15, 2024, 3:54 p.m. UTC | #4
On Oct 15, Jakub Kicinski wrote:
> On Tue, 15 Oct 2024 16:39:08 +0200 Lorenzo Bianconi wrote:
> > On Oct 15, Jakub Kicinski wrote:
> > > On Sat, 12 Oct 2024 11:01:11 +0200 Lorenzo Bianconi wrote:  
> > > > Introduce BQL support in the airoha_eth driver reporting to the kernel
> > > > info about tx hw DMA queues in order to avoid bufferbloat and keep the
> > > > latency small.  
> > > 
> > > TBH I haven't looked at the code again, but when I looked at v1 I was
> > > surprised you don't have a reset in airoha_qdma_cleanup_tx_queue().
> > > Are you sure it's okay? It's a common bug not to reset the BQL state
> > > when queue is purged while stopping the interface.  
> > 
> > So far airoha_qdma_cleanup_tx_queue() is called just in airoha_hw_cleanup()
> > that in turn runs just when the module is removed (airoha_remove()).
> > Do we need it?
> 
> Oh, thought its called on stop. In that case we're probably good
> from BQL perspective.
> 
> But does it mean potentially very stale packets can sit on the Tx
> ring when the device is stopped, until it's started again?

Do you mean the packets that the stack is transmitting when the .ndo_stop() is
run? 
In airoha_dev_stop() we call netif_tx_disable() to disable the transmission on
new packets and inflight packets will be consumed by the completion napi,
is it not enough? I guess we can even add netdev_tx_reset_subqueue() for all netdev
queues in airoha_dev_stop(), I do not have a strong opinion about it. What
do you prefer?

Regards,
Lorenzo
Jakub Kicinski Oct. 15, 2024, 4:09 p.m. UTC | #5
On Tue, 15 Oct 2024 17:54:59 +0200 Lorenzo Bianconi wrote:
> > Oh, thought its called on stop. In that case we're probably good
> > from BQL perspective.
> > 
> > But does it mean potentially very stale packets can sit on the Tx
> > ring when the device is stopped, until it's started again?  
> 
> Do you mean the packets that the stack is transmitting when the .ndo_stop()
> is run?

Whatever is in the queue at the time ndo_stop() gets called.
Could be the full descriptor ring I presume?

> In airoha_dev_stop() we call netif_tx_disable() to disable the transmission
> on new packets and inflight packets will be consumed by the completion napi,
> is it not enough?

They will only get consumed if the DMA gets to them right?
Stop seems to stop the DMA.

> I guess we can even add netdev_tx_reset_subqueue() for all netdev
> queues in airoha_dev_stop(), I do not have a strong opinion about it. What
> do you prefer?

So to be clear I think this patch is correct as of the current driver
code. I'm just wondering if we should call airoha_qdma_cleanup_tx_queue()
on stop as well, and then that should come with the reset.
I think having a packet stuck in a queue may lead to all sort of oddness
so my recommendation would be to flush the queues.
Lorenzo Bianconi Oct. 15, 2024, 4:35 p.m. UTC | #6
> On Tue, 15 Oct 2024 17:54:59 +0200 Lorenzo Bianconi wrote:
> > > Oh, thought its called on stop. In that case we're probably good
> > > from BQL perspective.
> > > 
> > > But does it mean potentially very stale packets can sit on the Tx
> > > ring when the device is stopped, until it's started again?  
> > 
> > Do you mean the packets that the stack is transmitting when the .ndo_stop()
> > is run?
> 
> Whatever is in the queue at the time ndo_stop() gets called.
> Could be the full descriptor ring I presume?
> 
> > In airoha_dev_stop() we call netif_tx_disable() to disable the transmission
> > on new packets and inflight packets will be consumed by the completion napi,
> > is it not enough?
> 
> They will only get consumed if the DMA gets to them right?
> Stop seems to stop the DMA.
> 
> > I guess we can even add netdev_tx_reset_subqueue() for all netdev
> > queues in airoha_dev_stop(), I do not have a strong opinion about it. What
> > do you prefer?
> 
> So to be clear I think this patch is correct as of the current driver
> code. I'm just wondering if we should call airoha_qdma_cleanup_tx_queue()
> on stop as well, and then that should come with the reset.
> I think having a packet stuck in a queue may lead to all sort of oddness
> so my recommendation would be to flush the queues.

ack, I will post a fix for this. Do you prefer to resend even this patch? Up to you.

Regards,
Lorenzo
Jakub Kicinski Oct. 15, 2024, 4:49 p.m. UTC | #7
On Tue, 15 Oct 2024 18:35:22 +0200 Lorenzo Bianconi wrote:
> > So to be clear I think this patch is correct as of the current driver
> > code. I'm just wondering if we should call airoha_qdma_cleanup_tx_queue()
> > on stop as well, and then that should come with the reset.
> > I think having a packet stuck in a queue may lead to all sort of oddness
> > so my recommendation would be to flush the queues.  
> 
> ack, I will post a fix for this. Do you prefer to resend even this patch? Up to you.

I'll take it. I don't think the stop rework will be a fix.
patchwork-bot+netdevbpf@kernel.org Oct. 15, 2024, 5:10 p.m. UTC | #8
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 12 Oct 2024 11:01:11 +0200 you wrote:
> Introduce BQL support in the airoha_eth driver reporting to the kernel
> info about tx hw DMA queues in order to avoid bufferbloat and keep the
> latency small.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> Changes in v2:
> - simplify BQL support and rebase on top of net-next main
> - Link to v1: https://lore.kernel.org/r/20240930-en7581-bql-v1-1-064cdd570068@kernel.org
> 
> [...]

Here is the summary with links:
  - [net-next,v2] net: airoha: Implement BQL support
    https://git.kernel.org/netdev/net-next/c/1d304174106c

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c
index e037f725f6d3505a8b91815ae26322f5d1b8590c..836a957aad77bec972c536567a4ee7d304ac7b52 100644
--- a/drivers/net/ethernet/mediatek/airoha_eth.c
+++ b/drivers/net/ethernet/mediatek/airoha_eth.c
@@ -1709,9 +1709,11 @@  static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 			WRITE_ONCE(desc->msg1, 0);
 
 			if (skb) {
+				u16 queue = skb_get_queue_mapping(skb);
 				struct netdev_queue *txq;
 
-				txq = netdev_get_tx_queue(skb->dev, qid);
+				txq = netdev_get_tx_queue(skb->dev, queue);
+				netdev_tx_completed_queue(txq, 1, skb->len);
 				if (netif_tx_queue_stopped(txq) &&
 				    q->ndesc - q->queued >= q->free_thr)
 					netif_tx_wake_queue(txq);
@@ -2487,7 +2489,9 @@  static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	q->queued += i;
 
 	skb_tx_timestamp(skb);
-	if (!netdev_xmit_more())
+	netdev_tx_sent_queue(txq, skb->len);
+
+	if (netif_xmit_stopped(txq) || !netdev_xmit_more())
 		airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid),
 				TX_RING_CPU_IDX_MASK,
 				FIELD_PREP(TX_RING_CPU_IDX_MASK, q->head));