
net: broadcom: bcm4908_enet: report queued and transmitted bytes

Message ID 20221026142624.19314-1-zajec5@gmail.com
State Superseded
Delegated to: Netdev Maintainers
Series net: broadcom: bcm4908_enet: report queued and transmitted bytes

Checks

Context                         Check    Description
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           warning  Target tree name not specified in the subject
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 37 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Rafał Miłecki Oct. 26, 2022, 2:26 p.m. UTC
From: Rafał Miłecki <rafal@milecki.pl>

Reporting queued and transmitted bytes allows BQL (Byte Queue Limits) to
operate, avoiding bufferbloat and reducing latency.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
---
 drivers/net/ethernet/broadcom/bcm4908_enet.c | 7 +++++++
 1 file changed, 7 insertions(+)
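BQL works by tracking how many bytes sit in the hardware ring: the driver reports bytes handed to the hardware with netdev_sent_queue() and bytes the hardware has finished with netdev_completed_queue(), and the queue is stopped once too many bytes are in flight. The userspace sketch below models that bookkeeping under stated assumptions: struct toy_bql and the toy_* helpers are hypothetical, the limit is fixed here whereas the kernel adjusts it dynamically, and the stop/wake thresholds only loosely follow the real helpers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of BQL accounting (hypothetical, for illustration only).
 * In the kernel this bookkeeping sits behind netdev_sent_queue(),
 * netdev_completed_queue() and netdev_reset_queue().
 */
struct toy_bql {
	unsigned int limit;     /* bytes allowed in flight (fixed here) */
	unsigned int queued;    /* total bytes reported as handed to hardware */
	unsigned int completed; /* total bytes reported as transmitted */
	bool stopped;           /* models the stack stopping the TX queue */
};

static unsigned int toy_inflight(const struct toy_bql *q)
{
	return q->queued - q->completed;
}

/* xmit path: plays the role of netdev_sent_queue(). */
static void toy_sent(struct toy_bql *q, unsigned int bytes)
{
	q->queued += bytes;
	if (toy_inflight(q) > q->limit)
		q->stopped = true; /* stop feeding the ring until completions arrive */
}

/* completion path: plays the role of netdev_completed_queue(). */
static void toy_completed(struct toy_bql *q, unsigned int bytes)
{
	assert(bytes <= toy_inflight(q)); /* cannot complete more than was queued */
	q->completed += bytes;
	if (q->stopped && toy_inflight(q) <= q->limit)
		q->stopped = false; /* enough room again: wake the queue */
}

/* ndo_stop path: plays the role of netdev_reset_queue(). */
static void toy_reset(struct toy_bql *q)
{
	q->queued = 0;
	q->completed = 0;
	q->stopped = false;
}
```

The reset matters because frames dropped while tearing down the ring never produce a completion; without it, the stale in-flight count could keep the queue stopped after the interface is brought back up.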

Comments

Florian Fainelli Oct. 26, 2022, 2:58 p.m. UTC | #1
On 10/26/2022 7:26 AM, Rafał Miłecki wrote:
> From: Rafał Miłecki <rafal@milecki.pl>
> 
> This allows BQL to operate avoiding buffer bloat and reducing latency.
> 
> Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
> ---
>   drivers/net/ethernet/broadcom/bcm4908_enet.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcm4908_enet.c b/drivers/net/ethernet/broadcom/bcm4908_enet.c
> index 93ccf549e2ed..e672a9ef4444 100644
> --- a/drivers/net/ethernet/broadcom/bcm4908_enet.c
> +++ b/drivers/net/ethernet/broadcom/bcm4908_enet.c
> @@ -495,6 +495,7 @@ static int bcm4908_enet_stop(struct net_device *netdev)
>   	netif_carrier_off(netdev);
>   	napi_disable(&rx_ring->napi);
>   	napi_disable(&tx_ring->napi);
> +	netdev_reset_queue(netdev);
>   
>   	bcm4908_enet_dma_rx_ring_disable(enet, &enet->rx_ring);
>   	bcm4908_enet_dma_tx_ring_disable(enet, &enet->tx_ring);
> @@ -564,6 +565,8 @@ static netdev_tx_t bcm4908_enet_start_xmit(struct sk_buff *skb, struct net_devic
>   	enet->netdev->stats.tx_bytes += skb->len;
>   	enet->netdev->stats.tx_packets++;
>   
> +	netdev_sent_queue(enet->netdev, skb->len);

There is an opportunity for fixing a use-after-free here: after you
call bcm4908_enet_dma_tx_ring_enable(), the hardware can start
transmission right away and also trigger the TX completion handler, so
you could be dereferencing a freed skb at this point. Also, to ensure
that DMA is actually functional, it is recommended to increment the TX
stats in the TX completion handler, since that indicates that you have
a functional completion process.

So long story short, if you record the skb length *before* calling 
bcm4908_enet_dma_tx_ring_enable() and use that for reporting sent bytes, 
you should be good.
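The lifetime problem described above can be reproduced in a small userspace model: once the buffer is handed to "hardware" it may be completed and freed immediately, so the xmit path must read the length beforehand, and the TX stats are only bumped on completion. This is a sketch under assumptions; fake_skb, fake_slot and the fake_* functions are hypothetical stand-ins, not bcm4908_enet code, and it only covers the skb lifetime issue, not where the BQL call should sit relative to the hardware handoff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct sk_buff and a TX ring slot. */
struct fake_skb {
	unsigned int len;
};

struct fake_slot {
	struct fake_skb *skb;
	unsigned int len; /* copy of skb->len, owned by the ring, not the skb */
};

static unsigned long bql_sent_bytes;       /* what would go to netdev_sent_queue() */
static unsigned long tx_bytes, tx_packets; /* TX stats, bumped only on completion */

/* Completion path: frees the skb, so nobody may touch it afterwards. */
static void fake_tx_complete(struct fake_slot *slot)
{
	tx_bytes += slot->len; /* safe: slot->len outlives the skb */
	tx_packets++;
	free(slot->skb);
	slot->skb = NULL;
}

/* Models enabling the ring: the hardware may finish and complete at once. */
static void fake_ring_enable(struct fake_slot *slot)
{
	fake_tx_complete(slot);
}

static void fake_xmit(struct fake_slot *slot, struct fake_skb *skb)
{
	unsigned int len = skb->len; /* read while we still own the skb */

	slot->skb = skb;
	slot->len = len;
	fake_ring_enable(slot);

	/* Use the cached length: skb may already have been freed here. */
	bql_sent_bytes += len;
}
```

Reading skb->len after fake_ring_enable() instead would be exactly the use-after-free described above, and a tool such as AddressSanitizer would flag it in this model.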
Rafał Miłecki Oct. 26, 2022, 3:12 p.m. UTC | #2
On 26.10.2022 16:58, Florian Fainelli wrote:
> There is an opportunity for fixing a use-after-free here: after you call bcm4908_enet_dma_tx_ring_enable(), the hardware can start transmission right away and also trigger the TX completion handler, so you could be dereferencing a freed skb at this point. Also, to ensure that DMA is actually functional, it is recommended to increment the TX stats in the TX completion handler, since that indicates that you have a functional completion process.

I see the problem, thanks!

Actually, the hardware may start transmitting even earlier: right after
the coherent buf_desc struct is filled in.


> So long story short, if you record the skb length *before* calling bcm4908_enet_dma_tx_ring_enable() and use that for reporting sent bytes, you should be good.

I may still end up calling netdev_completed_queue() for data for which
I didn't call netdev_sent_queue() yet. Is that safe?

Maybe I should just call netdev_sent_queue() before updating the buf_desc?
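The question above can be framed as an invariant: BQL expects the bytes reported as completed never to exceed the bytes reported as queued. Doing the accounting before the descriptor becomes visible to the hardware preserves that invariant even if the completion runs immediately. This is a single-threaded userspace sketch under assumptions; queued_bytes, completed_bytes and the fake_* functions are hypothetical, and it does not model the cross-CPU ordering a real driver also needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the BQL state of one TX queue. */
static unsigned int queued_bytes, completed_bytes;

/* Models a TX completion that may fire as soon as the descriptor is visible. */
static void fake_completion(unsigned int len)
{
	/* Invariant: never complete more bytes than were reported as queued. */
	assert(completed_bytes + len <= queued_bytes);
	completed_bytes += len;
}

/* Hypothetical descriptor publish; "fast" hardware completes inline. */
static void fake_publish_desc(unsigned int len, bool hw_completes_immediately)
{
	if (hw_completes_immediately)
		fake_completion(len);
}

/* Account first, then make the buffer visible to the hardware. */
static void fake_xmit(unsigned int len, bool hw_completes_immediately)
{
	queued_bytes += len; /* role of netdev_sent_queue() */
	fake_publish_desc(len, hw_completes_immediately);
}
```

Swapping the two statements in fake_xmit() makes the assertion fire on the first fast completion, which is the situation the question is worried about.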
Florian Fainelli Oct. 26, 2022, 7:53 p.m. UTC | #3
On 10/26/22 08:12, Rafał Miłecki wrote:
> I see the problem, thanks!
> 
> Actually hw may start transmission even earlier - right after filling
> buf_desc coherent struct.

I am not familiar with that hardware, but in principle, yes: I suppose
once you write a proper address and length, the DMA engine can notice
and start transmitting. Also, even though you are using non-coherent
memory, there appears to be a missing dma_wmb() between the store to
buf_desc->ctl and buf_desc->addr. There is no explicit dependency between
those two stores and subsequent loads or stores, so the processor write
buffer could in theory reorder them. This is unlikely to happen because
this is used on a Cortex-A53 IIRC, but better safe than sorry.
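The ordering concern can be sketched with the usual descriptor-ownership pattern: write the address first, issue a write barrier, then set the ownership bit in the control word. Userspace has no dma_wmb(), so this sketch uses a C11 release fence as an analogy, with a second function standing in for the DMA engine. It assumes the driver writes addr before ctl; struct fake_desc, FAKE_CTL_OWN and its bit position are hypothetical, not the BCM4908 descriptor layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor; layout and OWN bit position are made up. */
#define FAKE_CTL_OWN (1u << 31)

struct fake_desc {
	uint32_t addr;
	_Atomic uint32_t ctl;
};

/* Producer (CPU): fill addr, then hand the descriptor over via ctl. */
static void fake_desc_publish(struct fake_desc *d, uint32_t dma_addr, uint32_t len)
{
	d->addr = dma_addr;
	/* Analogy for dma_wmb(): addr must be visible before ctl sets OWN. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->ctl, FAKE_CTL_OWN | len, memory_order_relaxed);
}

/* Consumer (stand-in for the DMA engine): trust addr only once OWN is seen. */
static int fake_desc_consume(struct fake_desc *d, uint32_t *addr, uint32_t *len)
{
	uint32_t ctl = atomic_load_explicit(&d->ctl, memory_order_relaxed);

	if (!(ctl & FAKE_CTL_OWN))
		return 0;
	atomic_thread_fence(memory_order_acquire); /* pairs with the release fence */
	*addr = d->addr;
	*len = ctl & ~FAKE_CTL_OWN;
	return 1;
}
```

Without the fence, nothing prevents the store to ctl from becoming visible before the store to addr, so the consumer could observe OWN together with a stale address.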

> 
> 
>> So long story short, if you record the skb length *before* calling 
>> bcm4908_enet_dma_tx_ring_enable() and use that for reporting sent 
>> bytes, you should be good.
> 
> I may still end up calling netdev_completed_queue() for data for which
> I didn't call netdev_sent_queue() yet. Is that safe?
> 
> Maybe I should just call netdev_sent_queue() before updating the buf_desc?

You would want it to be as close as possible to when you hand the
buffer to the hardware, but I see no locking between
bcm4908_enet_start_xmit() and bcm4908_enet_irq_handler(), so you already
have a race, don't you?
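One straightforward way to close the race pointed out above is to serialize the xmit path and the TX completion path with a lock, so the ring indices and the stopped/woken state always change together. The sketch below models this in userspace with a pthread mutex standing in for a kernel spinlock; struct fake_tx_ring, RING_LEN and the fake_* functions are hypothetical, and a real driver might instead use netif_tx_lock() or a lockless scheme with explicit barriers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define RING_LEN 4 /* hypothetical ring size */

/* Hypothetical TX ring shared by the xmit and completion paths. */
struct fake_tx_ring {
	pthread_mutex_t lock; /* plays the role a spinlock would in the driver */
	unsigned int read_idx, write_idx;
	bool stopped;
};

static unsigned int ring_used(const struct fake_tx_ring *r)
{
	return (r->write_idx + RING_LEN - r->read_idx) % RING_LEN;
}

/* Returns false if the ring is full (a driver would return NETDEV_TX_BUSY). */
static bool fake_xmit(struct fake_tx_ring *r)
{
	bool ok = false;

	pthread_mutex_lock(&r->lock);
	if (ring_used(r) < RING_LEN - 1) { /* one slot kept empty: full != empty */
		r->write_idx = (r->write_idx + 1) % RING_LEN;
		ok = true;
	}
	if (ring_used(r) == RING_LEN - 1)
		r->stopped = true; /* like netif_stop_queue() */
	pthread_mutex_unlock(&r->lock);
	return ok;
}

/* Reclaims up to budget slots, mirroring the shape of a NAPI TX poll. */
static unsigned int fake_poll_tx(struct fake_tx_ring *r, unsigned int budget)
{
	unsigned int handled = 0;

	pthread_mutex_lock(&r->lock);
	while (handled < budget && r->read_idx != r->write_idx) {
		r->read_idx = (r->read_idx + 1) % RING_LEN;
		handled++;
	}
	if (handled && r->stopped)
		r->stopped = false; /* like netif_wake_queue() */
	pthread_mutex_unlock(&r->lock);
	return handled;
}
```

Holding the lock across both the index update and the stop/wake decision is what removes the window in which the completion path could wake a queue that the xmit path is about to stop.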
Rafał Miłecki Oct. 26, 2022, 8:15 p.m. UTC | #4
On 26.10.2022 16:26, Rafał Miłecki wrote:
> From: Rafał Miłecki <rafal@milecki.pl>
> 
> This allows BQL to operate avoiding buffer bloat and reducing latency.
> 
> Signed-off-by: Rafał Miłecki <rafal@milecki.pl>

Please drop this patch; I'll work on a v2.

Patch

diff --git a/drivers/net/ethernet/broadcom/bcm4908_enet.c b/drivers/net/ethernet/broadcom/bcm4908_enet.c
index 93ccf549e2ed..e672a9ef4444 100644
--- a/drivers/net/ethernet/broadcom/bcm4908_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm4908_enet.c
@@ -495,6 +495,7 @@  static int bcm4908_enet_stop(struct net_device *netdev)
 	netif_carrier_off(netdev);
 	napi_disable(&rx_ring->napi);
 	napi_disable(&tx_ring->napi);
+	netdev_reset_queue(netdev);
 
 	bcm4908_enet_dma_rx_ring_disable(enet, &enet->rx_ring);
 	bcm4908_enet_dma_tx_ring_disable(enet, &enet->tx_ring);
@@ -564,6 +565,8 @@  static netdev_tx_t bcm4908_enet_start_xmit(struct sk_buff *skb, struct net_devic
 	enet->netdev->stats.tx_bytes += skb->len;
 	enet->netdev->stats.tx_packets++;
 
+	netdev_sent_queue(enet->netdev, skb->len);
+
 	return NETDEV_TX_OK;
 }
 
@@ -635,6 +638,7 @@  static int bcm4908_enet_poll_tx(struct napi_struct *napi, int weight)
 	struct bcm4908_enet_dma_ring_bd *buf_desc;
 	struct bcm4908_enet_dma_ring_slot *slot;
 	struct device *dev = enet->dev;
+	unsigned int bytes = 0;
 	int handled = 0;
 
 	while (handled < weight && tx_ring->read_idx != tx_ring->write_idx) {
@@ -645,6 +649,7 @@  static int bcm4908_enet_poll_tx(struct napi_struct *napi, int weight)
 
 		dma_unmap_single(dev, slot->dma_addr, slot->len, DMA_TO_DEVICE);
 		dev_kfree_skb(slot->skb);
+		bytes += slot->len;
 		if (++tx_ring->read_idx == tx_ring->length)
 			tx_ring->read_idx = 0;
 
@@ -656,6 +661,8 @@  static int bcm4908_enet_poll_tx(struct napi_struct *napi, int weight)
 		bcm4908_enet_dma_ring_intrs_on(enet, tx_ring);
 	}
 
+	netdev_completed_queue(enet->netdev, handled, bytes);
+
 	if (netif_queue_stopped(enet->netdev))
 		netif_wake_queue(enet->netdev);