Message ID | 20241204041022.56512-4-dw@davidwei.uk (mailing list archive) |
---|---|
State | Accepted |
Commit | bd649c5cc958169b8a8a3e77ea926d92d472b02a |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | bnxt_en: support header page pool in queue API | expand |
On 2024/12/4 12:10, David Wei wrote: > bnxt_copy_rx_ring(bp, rxr, clone); > @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) > bnxt_hwrm_rx_agg_ring_free(bp, rxr, false); > rxr->rx_next_cons = 0; > page_pool_disable_direct_recycling(rxr->page_pool); > + if (bnxt_separate_head_pool()) > + page_pool_disable_direct_recycling(rxr->head_pool); Hi, David As mentioned in [1], is the above page_pool_disable_direct_recycling() really needed? Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops? It doesn't seem obvious there is any NAPI API like napi_enable() & ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/ bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading. 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
On 2024-12-10 04:25, Yunsheng Lin wrote: > On 2024/12/4 12:10, David Wei wrote: > >> bnxt_copy_rx_ring(bp, rxr, clone); >> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) >> bnxt_hwrm_rx_agg_ring_free(bp, rxr, false); >> rxr->rx_next_cons = 0; >> page_pool_disable_direct_recycling(rxr->page_pool); >> + if (bnxt_separate_head_pool()) >> + page_pool_disable_direct_recycling(rxr->head_pool); > > Hi, David > As mentioned in [1], is the above page_pool_disable_direct_recycling() > really needed? > > Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops? > It doesn't seem obvious there is any NAPI API like napi_enable() & > ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/ > bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading. > > 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/ Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW quirks. I looked back at my discussions w/ Broadcom, and IIU/RC bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq that I'm trying to stop. Calling napi_disable() has unintended side effects on the Tx side. The intent of the call to page_pool_disable_direct_recycling() is to prevent pages from the old page pool from being returned into the fast cache. These pages must be returned via page_pool_return_page() so that the it can eventually be freed in page_pool_release_retry(). I'm going to take a look at your discussions in [1] and respond there.
On 2024/12/11 2:14, David Wei wrote: > On 2024-12-10 04:25, Yunsheng Lin wrote: >> On 2024/12/4 12:10, David Wei wrote: >> >>> bnxt_copy_rx_ring(bp, rxr, clone); >>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) >>> bnxt_hwrm_rx_agg_ring_free(bp, rxr, false); >>> rxr->rx_next_cons = 0; >>> page_pool_disable_direct_recycling(rxr->page_pool); >>> + if (bnxt_separate_head_pool()) >>> + page_pool_disable_direct_recycling(rxr->head_pool); >> >> Hi, David >> As mentioned in [1], is the above page_pool_disable_direct_recycling() >> really needed? >> >> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops? >> It doesn't seem obvious there is any NAPI API like napi_enable() & >> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/ >> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading. >> >> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/ > > Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the > bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW > quirks. I looked back at my discussions w/ Broadcom, and IIU/RC > bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq > that I'm trying to stop. Calling napi_disable() has unintended side > effects on the Tx side. It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable a VNIC? and a napi_disable() is not needed? Is it possible that there may be some pending NAPI work is still being processed after bnxt_hwrm_vnic_update() is called? > > The intent of the call to page_pool_disable_direct_recycling() is to > prevent pages from the old page pool from being returned into the fast > cache. These pages must be returned via page_pool_return_page() so that > the it can eventually be freed in page_pool_release_retry(). > > I'm going to take a look at your discussions in [1] and respond there. Thanks.
On 2024-12-11 04:32, Yunsheng Lin wrote: > On 2024/12/11 2:14, David Wei wrote: >> On 2024-12-10 04:25, Yunsheng Lin wrote: >>> On 2024/12/4 12:10, David Wei wrote: >>> >>>> bnxt_copy_rx_ring(bp, rxr, clone); >>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) >>>> bnxt_hwrm_rx_agg_ring_free(bp, rxr, false); >>>> rxr->rx_next_cons = 0; >>>> page_pool_disable_direct_recycling(rxr->page_pool); >>>> + if (bnxt_separate_head_pool()) >>>> + page_pool_disable_direct_recycling(rxr->head_pool); >>> >>> Hi, David >>> As mentioned in [1], is the above page_pool_disable_direct_recycling() >>> really needed? >>> >>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops? >>> It doesn't seem obvious there is any NAPI API like napi_enable() & >>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/ >>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading. >>> >>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/ >> >> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the >> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW >> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC >> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq >> that I'm trying to stop. Calling napi_disable() has unintended side >> effects on the Tx side. > > It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable > a VNIC? and a napi_disable() is not needed? Correct. > Is it possible that there may > be some pending NAPI work is still being processed after bnxt_hwrm_vnic_update() > is called? Possibly, I don't know the details of how the HW works. At the time I just wanted something to work, and not having napi_enable/disable() made it work. :) Looking back though it does seem odd, so I'll try putting it back. > >> >> The intent of the call to page_pool_disable_direct_recycling() is to >> prevent pages from the old page pool from being returned into the fast >> cache. These pages must be returned via page_pool_return_page() so that >> the it can eventually be freed in page_pool_release_retry(). >> >> I'm going to take a look at your discussions in [1] and respond there. > > Thanks.
On Wed, Dec 11, 2024 at 8:10 AM David Wei <dw@davidwei.uk> wrote: > > On 2024-12-11 04:32, Yunsheng Lin wrote: > > On 2024/12/11 2:14, David Wei wrote: > >> On 2024-12-10 04:25, Yunsheng Lin wrote: > >>> On 2024/12/4 12:10, David Wei wrote: > >>> > >>>> bnxt_copy_rx_ring(bp, rxr, clone); > >>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) > >>>> bnxt_hwrm_rx_agg_ring_free(bp, rxr, false); > >>>> rxr->rx_next_cons = 0; > >>>> page_pool_disable_direct_recycling(rxr->page_pool); > >>>> + if (bnxt_separate_head_pool()) > >>>> + page_pool_disable_direct_recycling(rxr->head_pool); > >>> > >>> Hi, David > >>> As mentioned in [1], is the above page_pool_disable_direct_recycling() > >>> really needed? > >>> > >>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops? > >>> It doesn't seem obvious there is any NAPI API like napi_enable() & > >>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/ > >>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading. > >>> > >>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/ > >> > >> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the > >> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW > >> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC > >> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq > >> that I'm trying to stop. Calling napi_disable() has unintended side > >> effects on the Tx side. > > > > It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable > > a VNIC? and a napi_disable() is not needed? > > Correct. > > > Is it possible that there may > > be some pending NAPI work is still being processed after bnxt_hwrm_vnic_update() > > is called? > > Possibly, I don't know the details of how the HW works. > bnxt_hwrm_vnic_update() only stops the HW from receiving more packets. The completion may already have lots of RX entries and TPA entries. NAPI may be behind and it can take a while to process a batch of 64 entries at a time to go through all the remaining entries. > At the time I just wanted something to work, and not having > napi_enable/disable() made it work. :) Looking back though it does seem > odd, so I'll try putting it back. Yeah, I think it makes sense to add napi_disable(). Thanks.
On Wed, 11 Dec 2024 09:11:55 -0800 Michael Chan wrote: > > At the time I just wanted something to work, and not having > > napi_enable/disable() made it work. :) Looking back though it does seem > > odd, so I'll try putting it back. > > Yeah, I think it makes sense to add napi_disable(). +1, TBH I'm not sure how we avoid hitting the warning which checks if NAPI is "scheduled" in page_pool_disable_direct_recycling(). But, Yunsheng, I hope it is clear that the sync RCU is needed even if driver disables NAPI for the reconfiguration. Unless you see a RCU sync in napi_disable() / napi_enable()..
On 2024/12/12 8:48, Jakub Kicinski wrote: > But, Yunsheng, I hope it is clear that the sync RCU is needed even > if driver disables NAPI for the reconfiguration. Unless you see a > RCU sync in napi_disable() / napi_enable().. It seems that depends on calling order of napi_disable()/napi_enable() and page_pool_destroy() for old page_pool. It seems an extra RCU sync is not really needed if page_pool_destroy() for the old page_pool is called between napi_disable() and napi_enable() as page_pool_destroy() already have a RCU sync.
On Thu, 12 Dec 2024 19:23:52 +0800 Yunsheng Lin wrote: > It seems an extra RCU sync is not really needed if page_pool_destroy() > for the old page_pool is called between napi_disable() and napi_enable() > as page_pool_destroy() already have a RCU sync. I did my best.
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 8031ff31f837..6b963086c1d3 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -3710,7 +3710,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp) xdp_rxq_info_unreg(&rxr->xdp_rxq); page_pool_destroy(rxr->page_pool); - if (rxr->page_pool != rxr->head_pool) + if (bnxt_separate_head_pool()) page_pool_destroy(rxr->head_pool); rxr->page_pool = rxr->head_pool = NULL; @@ -15388,15 +15388,25 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx) goto err_free_rx_agg_ring; } + if (bp->flags & BNXT_FLAG_TPA) { + rc = bnxt_alloc_one_tpa_info(bp, clone); + if (rc) + goto err_free_tpa_info; + } + bnxt_init_one_rx_ring_rxbd(bp, clone); bnxt_init_one_rx_agg_ring_rxbd(bp, clone); bnxt_alloc_one_rx_ring_skb(bp, clone, idx); if (bp->flags & BNXT_FLAG_AGG_RINGS) bnxt_alloc_one_rx_ring_page(bp, clone, idx); + if (bp->flags & BNXT_FLAG_TPA) + bnxt_alloc_one_tpa_info_data(bp, clone); return 0; +err_free_tpa_info: + bnxt_free_one_tpa_info(bp, clone); err_free_rx_agg_ring: bnxt_free_ring(bp, &clone->rx_agg_ring_struct.ring_mem); err_free_rx_ring: @@ -15404,9 +15414,11 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx) err_rxq_info_unreg: xdp_rxq_info_unreg(&clone->xdp_rxq); err_page_pool_destroy: - clone->page_pool->p.napi = NULL; page_pool_destroy(clone->page_pool); + if (bnxt_separate_head_pool()) + page_pool_destroy(clone->head_pool); clone->page_pool = NULL; + clone->head_pool = NULL; return rc; } @@ -15416,13 +15428,15 @@ static void bnxt_queue_mem_free(struct net_device *dev, void *qmem) struct bnxt *bp = netdev_priv(dev); struct bnxt_ring_struct *ring; - bnxt_free_one_rx_ring(bp, rxr); - bnxt_free_one_rx_agg_ring(bp, rxr); + bnxt_free_one_rx_ring_skbs(bp, rxr); xdp_rxq_info_unreg(&rxr->xdp_rxq); page_pool_destroy(rxr->page_pool); + if (bnxt_separate_head_pool()) + page_pool_destroy(rxr->head_pool); rxr->page_pool = NULL; + rxr->head_pool = NULL; ring = &rxr->rx_ring_struct; bnxt_free_ring(bp, &ring->ring_mem); @@ -15504,7 +15518,10 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx) rxr->rx_agg_prod = clone->rx_agg_prod; rxr->rx_sw_agg_prod = clone->rx_sw_agg_prod; rxr->rx_next_cons = clone->rx_next_cons; + rxr->rx_tpa = clone->rx_tpa; + rxr->rx_tpa_idx_map = clone->rx_tpa_idx_map; rxr->page_pool = clone->page_pool; + rxr->head_pool = clone->head_pool; rxr->xdp_rxq = clone->xdp_rxq; bnxt_copy_rx_ring(bp, rxr, clone); @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) bnxt_hwrm_rx_agg_ring_free(bp, rxr, false); rxr->rx_next_cons = 0; page_pool_disable_direct_recycling(rxr->page_pool); + if (bnxt_separate_head_pool()) + page_pool_disable_direct_recycling(rxr->head_pool); memcpy(qmem, rxr, sizeof(*rxr)); bnxt_init_rx_ring_struct(bp, qmem);
Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a page pool for header frags, which may be distinct from the existing pool for the aggregation ring. Prior to this change, frags used in the TPA ring rx_tpa were allocated from system memory e.g. napi_alloc_frag() meaning their lifetimes were not associated with a page pool. They can be returned at any time and so the queue API did not alloc or free rx_tpa. But now frags come from a separate head_pool which may be different to page_pool. Without allocating and freeing rx_tpa, frags allocated from the old head_pool may be returned to a different new head_pool which causes a mismatch between the pp hold/release count. Fix this problem by properly freeing and allocating rx_tpa in the queue API implementation. Signed-off-by: David Wei <dw@davidwei.uk> --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 27 +++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-)