diff mbox series

[net,v3,3/3] bnxt_en: handle tpa_info in queue API implementation

Message ID 20241204041022.56512-4-dw@davidwei.uk (mailing list archive)
State Accepted
Commit bd649c5cc958169b8a8a3e77ea926d92d472b02a
Delegated to: Netdev Maintainers
Series bnxt_en: support header page pool in queue API

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 304 this patch: 304
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 80 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-12-04--15-02 (tests: 760)

Commit Message

David Wei Dec. 4, 2024, 4:10 a.m. UTC
Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory, e.g. via
napi_alloc_frag(), meaning their lifetimes were not tied to a page pool.
They could be returned at any time, so the queue API did not allocate or
free rx_tpa.

But now frags come from a separate head_pool, which may be different
from page_pool. Without allocating and freeing rx_tpa, frags allocated
from the old head_pool may be returned to a different, new head_pool,
causing a mismatch between the page pool hold/release counts.

Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 27 +++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Comments

Yunsheng Lin Dec. 10, 2024, 12:25 p.m. UTC | #1
On 2024/12/4 12:10, David Wei wrote:

>  	bnxt_copy_rx_ring(bp, rxr, clone);
> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>  	rxr->rx_next_cons = 0;
>  	page_pool_disable_direct_recycling(rxr->page_pool);
> +	if (bnxt_separate_head_pool())
> +		page_pool_disable_direct_recycling(rxr->head_pool);

Hi, David
As mentioned in [1], is the above page_pool_disable_direct_recycling()
really needed?

Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
From reading the code, it doesn't seem that any NAPI API like napi_enable() &
____napi_schedule() is called in bnxt_queue_start()/bnxt_queue_stop()/
bnxt_queue_mem_alloc()/bnxt_queue_mem_free().

1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
David Wei Dec. 10, 2024, 6:14 p.m. UTC | #2
On 2024-12-10 04:25, Yunsheng Lin wrote:
> On 2024/12/4 12:10, David Wei wrote:
> 
>>  	bnxt_copy_rx_ring(bp, rxr, clone);
>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>  	rxr->rx_next_cons = 0;
>>  	page_pool_disable_direct_recycling(rxr->page_pool);
>> +	if (bnxt_separate_head_pool())
>> +		page_pool_disable_direct_recycling(rxr->head_pool);
> 
> Hi, David
> As mentioned in [1], is the above page_pool_disable_direct_recycling()
> really needed?
> 
> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
> It doesn't seem obvious there is any NAPI API like napi_enable() &
> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
> 
> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/

Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
quirks. I looked back at my discussions w/ Broadcom, and IIUC
bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
that I'm trying to stop. Calling napi_disable() has unintended side
effects on the Tx side.

The intent of the call to page_pool_disable_direct_recycling() is to
prevent pages from the old page pool from being returned into the fast
cache. These pages must be returned via page_pool_return_page() so that
the pool can eventually be freed in page_pool_release_retry().

I'm going to take a look at your discussions in [1] and respond there.
Yunsheng Lin Dec. 11, 2024, 12:32 p.m. UTC | #3
On 2024/12/11 2:14, David Wei wrote:
> On 2024-12-10 04:25, Yunsheng Lin wrote:
>> On 2024/12/4 12:10, David Wei wrote:
>>
>>>  	bnxt_copy_rx_ring(bp, rxr, clone);
>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>>  	rxr->rx_next_cons = 0;
>>>  	page_pool_disable_direct_recycling(rxr->page_pool);
>>> +	if (bnxt_separate_head_pool())
>>> +		page_pool_disable_direct_recycling(rxr->head_pool);
>>
>> Hi, David
>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
>> really needed?
>>
>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
>> It doesn't seem obvious there is any NAPI API like napi_enable() &
>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
>>
>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
> 
> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
> that I'm trying to stop. Calling napi_disable() has unintended side
> effects on the Tx side.

It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
a VNIC, and that a napi_disable() is not needed? Is it possible that
some pending NAPI work is still being processed after
bnxt_hwrm_vnic_update() is called?

> 
> The intent of the call to page_pool_disable_direct_recycling() is to
> prevent pages from the old page pool from being returned into the fast
> cache. These pages must be returned via page_pool_return_page() so that
> the it can eventually be freed in page_pool_release_retry().
> 
> I'm going to take a look at your discussions in [1] and respond there.

Thanks.
David Wei Dec. 11, 2024, 4:10 p.m. UTC | #4
On 2024-12-11 04:32, Yunsheng Lin wrote:
> On 2024/12/11 2:14, David Wei wrote:
>> On 2024-12-10 04:25, Yunsheng Lin wrote:
>>> On 2024/12/4 12:10, David Wei wrote:
>>>
>>>>  	bnxt_copy_rx_ring(bp, rxr, clone);
>>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>>>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>>>  	rxr->rx_next_cons = 0;
>>>>  	page_pool_disable_direct_recycling(rxr->page_pool);
>>>> +	if (bnxt_separate_head_pool())
>>>> +		page_pool_disable_direct_recycling(rxr->head_pool);
>>>
>>> Hi, David
>>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
>>> really needed?
>>>
>>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
>>> It doesn't seem obvious there is any NAPI API like napi_enable() &
>>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
>>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
>>>
>>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
>>
>> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
>> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
>> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
>> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
>> that I'm trying to stop. Calling napi_disable() has unintended side
>> effects on the Tx side.
> 
> It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
> a VNIC? and a napi_disable() is not needed?

Correct.

> Is it possible that there may
> be some pending NAPI work is still being processed after bnxt_hwrm_vnic_update()
> is called?

Possibly, I don't know the details of how the HW works.

At the time I just wanted something to work, and not having
napi_enable/disable() made it work. :) Looking back though it does seem
odd, so I'll try putting it back.

> 
>>
>> The intent of the call to page_pool_disable_direct_recycling() is to
>> prevent pages from the old page pool from being returned into the fast
>> cache. These pages must be returned via page_pool_return_page() so that
>> the it can eventually be freed in page_pool_release_retry().
>>
>> I'm going to take a look at your discussions in [1] and respond there.
> 
> Thanks.
Michael Chan Dec. 11, 2024, 5:11 p.m. UTC | #5
On Wed, Dec 11, 2024 at 8:10 AM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-12-11 04:32, Yunsheng Lin wrote:
> > On 2024/12/11 2:14, David Wei wrote:
> >> On 2024-12-10 04:25, Yunsheng Lin wrote:
> >>> On 2024/12/4 12:10, David Wei wrote:
> >>>
> >>>>    bnxt_copy_rx_ring(bp, rxr, clone);
> >>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> >>>>    bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> >>>>    rxr->rx_next_cons = 0;
> >>>>    page_pool_disable_direct_recycling(rxr->page_pool);
> >>>> +  if (bnxt_separate_head_pool())
> >>>> +          page_pool_disable_direct_recycling(rxr->head_pool);
> >>>
> >>> Hi, David
> >>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
> >>> really needed?
> >>>
> >>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
> >>> It doesn't seem obvious there is any NAPI API like napi_enable() &
> >>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
> >>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
> >>>
> >>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
> >>
> >> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
> >> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
> >> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
> >> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
> >> that I'm trying to stop. Calling napi_disable() has unintended side
> >> effects on the Tx side.
> >
> > It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
> > a VNIC? and a napi_disable() is not needed?
>
> Correct.
>
> > Is it possible that there may
> > be some pending NAPI work is still being processed after bnxt_hwrm_vnic_update()
> > is called?
>
> Possibly, I don't know the details of how the HW works.
>

bnxt_hwrm_vnic_update() only stops the HW from receiving more packets.
The completion ring may already have lots of RX entries and TPA entries.
NAPI may be behind, and since it processes a batch of 64 entries at a
time, it can take a while to go through all the remaining entries.

> At the time I just wanted something to work, and not having
> napi_enable/disable() made it work. :) Looking back though it does seem
> odd, so I'll try putting it back.

Yeah, I think it makes sense to add napi_disable().  Thanks.
Jakub Kicinski Dec. 12, 2024, 12:48 a.m. UTC | #6
On Wed, 11 Dec 2024 09:11:55 -0800 Michael Chan wrote:
> > At the time I just wanted something to work, and not having
> > napi_enable/disable() made it work. :) Looking back though it does seem
> > odd, so I'll try putting it back.  
> 
> Yeah, I think it makes sense to add napi_disable().

+1, TBH I'm not sure how we avoid hitting the warning which checks
if NAPI is "scheduled" in page_pool_disable_direct_recycling().

But, Yunsheng, I hope it is clear that the sync RCU is needed even
if the driver disables NAPI for the reconfiguration. Unless you see an
RCU sync in napi_disable() / napi_enable()...
Yunsheng Lin Dec. 12, 2024, 11:23 a.m. UTC | #7
On 2024/12/12 8:48, Jakub Kicinski wrote:
> But, Yunsheng, I hope it is clear that the sync RCU is needed even
> if driver disables NAPI for the reconfiguration. Unless you see a
> RCU sync in napi_disable() / napi_enable()..

It seems that depends on the calling order of napi_disable()/napi_enable()
and page_pool_destroy() for the old page_pool.
An extra RCU sync does not seem to be really needed if page_pool_destroy()
for the old page_pool is called between napi_disable() and napi_enable(),
as page_pool_destroy() already has an RCU sync.
Jakub Kicinski Dec. 12, 2024, 2:56 p.m. UTC | #8
On Thu, 12 Dec 2024 19:23:52 +0800 Yunsheng Lin wrote:
> It seems an extra RCU sync is not really needed if page_pool_destroy()
> for the old page_pool is called between napi_disable() and napi_enable()
> as page_pool_destroy() already have a RCU sync.

I did my best.

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8031ff31f837..6b963086c1d3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3710,7 +3710,7 @@  static void bnxt_free_rx_rings(struct bnxt *bp)
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
 		page_pool_destroy(rxr->page_pool);
-		if (rxr->page_pool != rxr->head_pool)
+		if (bnxt_separate_head_pool())
 			page_pool_destroy(rxr->head_pool);
 		rxr->page_pool = rxr->head_pool = NULL;
 
@@ -15388,15 +15388,25 @@  static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 			goto err_free_rx_agg_ring;
 	}
 
+	if (bp->flags & BNXT_FLAG_TPA) {
+		rc = bnxt_alloc_one_tpa_info(bp, clone);
+		if (rc)
+			goto err_free_tpa_info;
+	}
+
 	bnxt_init_one_rx_ring_rxbd(bp, clone);
 	bnxt_init_one_rx_agg_ring_rxbd(bp, clone);
 
 	bnxt_alloc_one_rx_ring_skb(bp, clone, idx);
 	if (bp->flags & BNXT_FLAG_AGG_RINGS)
 		bnxt_alloc_one_rx_ring_page(bp, clone, idx);
+	if (bp->flags & BNXT_FLAG_TPA)
+		bnxt_alloc_one_tpa_info_data(bp, clone);
 
 	return 0;
 
+err_free_tpa_info:
+	bnxt_free_one_tpa_info(bp, clone);
 err_free_rx_agg_ring:
 	bnxt_free_ring(bp, &clone->rx_agg_ring_struct.ring_mem);
 err_free_rx_ring:
@@ -15404,9 +15414,11 @@  static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 err_rxq_info_unreg:
 	xdp_rxq_info_unreg(&clone->xdp_rxq);
 err_page_pool_destroy:
-	clone->page_pool->p.napi = NULL;
 	page_pool_destroy(clone->page_pool);
+	if (bnxt_separate_head_pool())
+		page_pool_destroy(clone->head_pool);
 	clone->page_pool = NULL;
+	clone->head_pool = NULL;
 	return rc;
 }
 
@@ -15416,13 +15428,15 @@  static void bnxt_queue_mem_free(struct net_device *dev, void *qmem)
 	struct bnxt *bp = netdev_priv(dev);
 	struct bnxt_ring_struct *ring;
 
-	bnxt_free_one_rx_ring(bp, rxr);
-	bnxt_free_one_rx_agg_ring(bp, rxr);
+	bnxt_free_one_rx_ring_skbs(bp, rxr);
 
 	xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
 	page_pool_destroy(rxr->page_pool);
+	if (bnxt_separate_head_pool())
+		page_pool_destroy(rxr->head_pool);
 	rxr->page_pool = NULL;
+	rxr->head_pool = NULL;
 
 	ring = &rxr->rx_ring_struct;
 	bnxt_free_ring(bp, &ring->ring_mem);
@@ -15504,7 +15518,10 @@  static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 	rxr->rx_agg_prod = clone->rx_agg_prod;
 	rxr->rx_sw_agg_prod = clone->rx_sw_agg_prod;
 	rxr->rx_next_cons = clone->rx_next_cons;
+	rxr->rx_tpa = clone->rx_tpa;
+	rxr->rx_tpa_idx_map = clone->rx_tpa_idx_map;
 	rxr->page_pool = clone->page_pool;
+	rxr->head_pool = clone->head_pool;
 	rxr->xdp_rxq = clone->xdp_rxq;
 
 	bnxt_copy_rx_ring(bp, rxr, clone);
@@ -15563,6 +15580,8 @@  static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
 	rxr->rx_next_cons = 0;
 	page_pool_disable_direct_recycling(rxr->page_pool);
+	if (bnxt_separate_head_pool())
+		page_pool_disable_direct_recycling(rxr->head_pool);
 
 	memcpy(qmem, rxr, sizeof(*rxr));
 	bnxt_init_rx_ring_struct(bp, qmem);