| Message ID | 20250208202916.1391614-10-michael.chan@broadcom.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to | Netdev Maintainers |
| Series | bnxt_en: Add NPAR 1.2 and TPH support |
On Sat, 8 Feb 2025 12:29:15 -0800 Michael Chan wrote:
> +		rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
> +		if (rc)
> +			return rc;
> +
> +		rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
> +		if (rc)
> +			return rc;

Under what circumstances can these alloc calls fail?
"alloc" sounds concerning in a start call.

> +		txr->tx_prod = 0;
> +		txr->tx_cons = 0;
> +		txr->tx_hw_cons = 0;

>  	cpr->sw_stats->rx.rx_resets++;
>
> +	if (bp->flags & BNXT_FLAG_SHARED_RINGS) {
> +		cpr->sw_stats->tx.tx_resets++;

Is there a reason why queue op stop/start cycles are counted as resets?
IIUC previously only faults (~errors) would be counted as resets.
ifdown / ifup or ring reconfig (ethtool -L / -G) would not increment
resets. I think queue reconfig is more like ethtool -L than a fault.
It'd be more consistent with existing code not to increment these
counters.

> +		rc = bnxt_tx_queue_start(bp, idx);
> +		if (rc) {
> +			netdev_warn(bp->dev,
> +				    "tx queue restart failed: rc=%d\n", rc);
> +			bnapi->tx_fault = 1;
> +			goto err_reset;
> +		}
> +	}
> +
> +	napi_enable(&bnapi->napi);

Here you first start the queue then enable NAPI...

> +	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
> +
>  	for (i = 0; i <= BNXT_VNIC_NTUPLE; i++) {
>  		vnic = &bp->vnic_info[i];
>
> @@ -15716,17 +15820,25 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>  	/* Make sure NAPI sees that the VNIC is disabled */
>  	synchronize_net();
>  	rxr = &bp->rx_ring[idx];
> -	cancel_work_sync(&rxr->bnapi->cp_ring.dim.work);
> +	bnapi = rxr->bnapi;
> +	cpr = &bnapi->cp_ring;
> +	cancel_work_sync(&cpr->dim.work);
>  	bnxt_hwrm_rx_ring_free(bp, rxr, false);
>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>  	page_pool_disable_direct_recycling(rxr->page_pool);
>  	if (bnxt_separate_head_pool())
>  		page_pool_disable_direct_recycling(rxr->head_pool);
>
> +	if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> +		bnxt_tx_queue_stop(bp, idx);
> +
> +	napi_disable(&bnapi->napi);

... but here you do the opposite, and require extra synchronization
in bnxt_tx_queue_stop() to set your magic flag, sync the NAPI etc.
Why can't the start and stop paths be the mirror image?
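For readers following the thread, the ordering being questioned can be modeled in a few lines. This is a compilable toy sketch, not driver code; every name below is a stand-in:

```c
#include <stdio.h>

/* Stand-ins for the driver operations discussed above. */
static void tx_queue_start(void) { puts("start: tx queue started"); }
static void napi_enable(void)    { puts("start: napi enabled (after queue start)"); }
static void tx_queue_stop(void)  { puts("stop:  tx queue stopped (napi still running)"); }
static void napi_disable(void)   { puts("stop:  napi disabled (last)"); }

int main(void)
{
	/* bnxt_queue_start() ordering as posted: queue first, NAPI second. */
	tx_queue_start();
	napi_enable();

	/* bnxt_queue_stop() ordering as posted: queue first again, NAPI
	 * last -- a mirror image of start would disable NAPI first,
	 * which is what the review comment asks about. */
	tx_queue_stop();
	napi_disable();
	return 0;
}
```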
On Tue, Feb 11, 2025 at 5:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 8 Feb 2025 12:29:15 -0800 Michael Chan wrote:
> > +		rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
> > +		if (rc)
> > +			return rc;
> > +
> > +		rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
> > +		if (rc)
> > +			return rc;
>
> Under what circumstances can these alloc calls fail?
> "alloc" sounds concerning in a start call.

The ring has been previously reserved with FW, so it normally should
not fail. I'll need to ask the FW team for some possible failure
scenarios.

> > +		txr->tx_prod = 0;
> > +		txr->tx_cons = 0;
> > +		txr->tx_hw_cons = 0;
>
> >  	cpr->sw_stats->rx.rx_resets++;
> >
> > +	if (bp->flags & BNXT_FLAG_SHARED_RINGS) {
> > +		cpr->sw_stats->tx.tx_resets++;
>
> Is there a reason why queue op stop/start cycles are counted as resets?
> IIUC previously only faults (~errors) would be counted as resets.
> ifdown / ifup or ring reconfig (ethtool -L / -G) would not increment
> resets. I think queue reconfig is more like ethtool -L than a fault.
> It'd be more consistent with existing code not to increment these
> counters.

I think David's original code increments the rx_reset counter for
every queue_start. We're just following that. Maybe it came from the
original plan to use HWRM_RING_RESET to do the RX
queue_stop/queue_start. We can remove the reset counters for all
queue_stop/queue_start if that makes more sense.

> > +		rc = bnxt_tx_queue_start(bp, idx);
> > +		if (rc) {
> > +			netdev_warn(bp->dev,
> > +				    "tx queue restart failed: rc=%d\n", rc);
> > +			bnapi->tx_fault = 1;
> > +			goto err_reset;
> > +		}
> > +	}
> > +
> > +	napi_enable(&bnapi->napi);
>
> Here you first start the queue then enable NAPI...
>
> > +	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
> > +
> >  	for (i = 0; i <= BNXT_VNIC_NTUPLE; i++) {
> >  		vnic = &bp->vnic_info[i];
> >
> > @@ -15716,17 +15820,25 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> >  	/* Make sure NAPI sees that the VNIC is disabled */
> >  	synchronize_net();
> >  	rxr = &bp->rx_ring[idx];
> > -	cancel_work_sync(&rxr->bnapi->cp_ring.dim.work);
> > +	bnapi = rxr->bnapi;
> > +	cpr = &bnapi->cp_ring;
> > +	cancel_work_sync(&cpr->dim.work);
> >  	bnxt_hwrm_rx_ring_free(bp, rxr, false);
> >  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> >  	page_pool_disable_direct_recycling(rxr->page_pool);
> >  	if (bnxt_separate_head_pool())
> >  		page_pool_disable_direct_recycling(rxr->head_pool);
> >
> > +	if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> > +		bnxt_tx_queue_stop(bp, idx);
> > +
> > +	napi_disable(&bnapi->napi);
>
> ... but here you do the opposite, and require extra synchronization
> in bnxt_tx_queue_stop() to set your magic flag, sync the NAPI etc.
> Why can't the start and stop paths be the mirror image?

The ring free operation requires interrupt/NAPI to be working. FW
signals the completion of the ring free command on the completion ring
associated with the ring we're freeing. When we see this completion
during NAPI, it guarantees that this is the last DMA on that ring.
Only ring free FW commands are handled this way, requiring NAPI.
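The constraint described here is what forces the asymmetry: the FW's ring-free acknowledgment arrives as a completion on the very ring NAPI polls, so NAPI must stay enabled until after the free. A minimal toy model of that idea (all types and names below are illustrative, not the bnxt HWRM API):

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative completion-ring entry; real bnxt descriptors differ. */
struct cmpl_entry {
	bool hwrm_done;    /* FW's "ring freed" acknowledgment */
};

/* NAPI-poll stand-in: the ring-free command only completes once its
 * "done" entry shows up on the completion ring being polled.  Seeing
 * it guarantees the last DMA on the freed ring has landed, which is
 * why napi_disable() must come after the ring free, not before. */
static bool poll_sees_ring_free(const struct cmpl_entry *e)
{
	return e->hwrm_done;
}

int main(void)
{
	struct cmpl_entry e = { .hwrm_done = true };

	if (poll_sees_ring_free(&e))
		puts("ring free acked: no further DMA on this ring");
	return 0;
}
```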
On Tue, 11 Feb 2025 18:31:21 -0800 Michael Chan wrote:
> On Tue, Feb 11, 2025 at 5:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Sat, 8 Feb 2025 12:29:15 -0800 Michael Chan wrote:
> > > +		rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
> > > +		if (rc)
> > > +			return rc;
> > > +
> > > +		rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
> > > +		if (rc)
> > > +			return rc;
> >
> > Under what circumstances can these alloc calls fail?
> > "alloc" sounds concerning in a start call.
>
> The ring has been previously reserved with FW, so it normally should
> not fail. I'll need to ask the FW team for some possible failure
> scenarios.

Thanks, expectation is that start never fails.
If the FW team comes back with "should never happen if rings
are reserved" please add a comment to that effect here. Since
this is one of very few implementations people may read it
and incorrectly assume that allocating is okay.
If the FW team comes back with a list of possible but unlikely
scenarios I'm afraid a rework will be needed.

> > >  	cpr->sw_stats->rx.rx_resets++;
> > >
> > > +	if (bp->flags & BNXT_FLAG_SHARED_RINGS) {
> > > +		cpr->sw_stats->tx.tx_resets++;
> >
> > Is there a reason why queue op stop/start cycles are counted as resets?
> > IIUC previously only faults (~errors) would be counted as resets.
> > ifdown / ifup or ring reconfig (ethtool -L / -G) would not increment
> > resets. I think queue reconfig is more like ethtool -L than a fault.
> > It'd be more consistent with existing code not to increment these
> > counters.
>
> I think David's original code increments the rx_reset counter for
> every queue_start. We're just following that. Maybe it came from the
> original plan to use HWRM_RING_RESET to do the RX
> queue_stop/queue_start. We can remove the reset counters for all
> queue_stop/queue_start if that makes more sense.

I vote remove, just to be crystal clear.

> > > @@ -15716,17 +15820,25 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> > >  	/* Make sure NAPI sees that the VNIC is disabled */
> > >  	synchronize_net();
> > >  	rxr = &bp->rx_ring[idx];
> > > -	cancel_work_sync(&rxr->bnapi->cp_ring.dim.work);
> > > +	bnapi = rxr->bnapi;
> > > +	cpr = &bnapi->cp_ring;
> > > +	cancel_work_sync(&cpr->dim.work);
> > >  	bnxt_hwrm_rx_ring_free(bp, rxr, false);
> > >  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> > >  	page_pool_disable_direct_recycling(rxr->page_pool);
> > >  	if (bnxt_separate_head_pool())
> > >  		page_pool_disable_direct_recycling(rxr->head_pool);
> > >
> > > +	if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> > > +		bnxt_tx_queue_stop(bp, idx);
> > > +
> > > +	napi_disable(&bnapi->napi);
> >
> > ... but here you do the opposite, and require extra synchronization
> > in bnxt_tx_queue_stop() to set your magic flag, sync the NAPI etc.
> > Why can't the start and stop paths be the mirror image?
>
> The ring free operation requires interrupt/NAPI to be working. FW
> signals the completion of the ring free command on the completion ring
> associated with the ring we're freeing. When we see this completion
> during NAPI, it guarantees that this is the last DMA on that ring.
> Only ring free FW commands are handled this way, requiring NAPI.

Ugh, I feel like this was explained to me before, sorry.
Again, a comment in the code would go a long way for non-Broadcom
readers.
On Tue, Feb 11, 2025 at 6:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 11 Feb 2025 18:31:21 -0800 Michael Chan wrote:
> > The ring has been previously reserved with FW, so it normally should
> > not fail. I'll need to ask the FW team for some possible failure
> > scenarios.
>
> Thanks, expectation is that start never fails.
> If the FW team comes back with "should never happen if rings
> are reserved" please add a comment to that effect here. Since
> this is one of very few implementations people may read it
> and incorrectly assume that allocating is okay.
> If the FW team comes back with a list of possible but unlikely
> scenarios I'm afraid a rework will be needed.
>

The FW team has confirmed that it should never fail in this case. The
ring has been previously reserved and allocated. Re-allocating it with
essentially the same parameters should never fail.

I will be sending V5 soon. Thanks.
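Given that confirmation, the comment Jakub asked for might read something like the sketch below in V5. The wording is illustrative only; the surrounding lines are quoted from the patch:

```c
		/* The TX completion and TX rings were reserved and
		 * allocated with the FW when the interface was brought
		 * up; per the FW team, re-allocating a previously
		 * reserved ring with the same parameters never fails,
		 * so a non-zero rc here would indicate a driver or FW
		 * bug rather than resource exhaustion.
		 */
		rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
		if (rc)
			return rc;

		rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
		if (rc)
			return rc;
```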
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 019a8433b0d6..fee9baff9e5a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7368,6 +7368,22 @@ static int hwrm_ring_free_send_msg(struct bnxt *bp,
 	return 0;
 }
 
+static void bnxt_hwrm_tx_ring_free(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
+				   bool close_path)
+{
+	struct bnxt_ring_struct *ring = &txr->tx_ring_struct;
+	u32 cmpl_ring_id;
+
+	if (ring->fw_ring_id == INVALID_HW_RING_ID)
+		return;
+
+	cmpl_ring_id = close_path ? bnxt_cp_ring_for_tx(bp, txr) :
+		       INVALID_HW_RING_ID;
+	hwrm_ring_free_send_msg(bp, ring, RING_FREE_REQ_RING_TYPE_TX,
+				cmpl_ring_id);
+	ring->fw_ring_id = INVALID_HW_RING_ID;
+}
+
 static void bnxt_hwrm_rx_ring_free(struct bnxt *bp,
 				   struct bnxt_rx_ring_info *rxr,
 				   bool close_path)
@@ -11274,6 +11290,75 @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
 	return 0;
 }
 
+static void bnxt_tx_queue_stop(struct bnxt *bp, int idx)
+{
+	struct bnxt_tx_ring_info *txr;
+	struct netdev_queue *txq;
+	struct bnxt_napi *bnapi;
+	int i;
+
+	bnapi = bp->bnapi[idx];
+	bnxt_for_each_napi_tx(i, bnapi, txr) {
+		WRITE_ONCE(txr->dev_state, BNXT_DEV_STATE_CLOSING);
+		synchronize_net();
+
+		if (!(bnapi->flags & BNXT_NAPI_FLAG_XDP)) {
+			txq = netdev_get_tx_queue(bp->dev, txr->txq_index);
+			if (txq) {
+				__netif_tx_lock_bh(txq);
+				netif_tx_stop_queue(txq);
+				__netif_tx_unlock_bh(txq);
+			}
+		}
+
+		if (!bp->tph_mode)
+			continue;
+
+		bnxt_hwrm_tx_ring_free(bp, txr, true);
+		bnxt_hwrm_cp_ring_free(bp, txr->tx_cpr);
+		bnxt_free_one_tx_ring_skbs(bp, txr, txr->txq_index);
+		bnxt_clear_one_cp_ring(bp, txr->tx_cpr);
+	}
+}
+
+static int bnxt_tx_queue_start(struct bnxt *bp, int idx)
+{
+	struct bnxt_tx_ring_info *txr;
+	struct netdev_queue *txq;
+	struct bnxt_napi *bnapi;
+	int rc, i;
+
+	bnapi = bp->bnapi[idx];
+	bnxt_for_each_napi_tx(i, bnapi, txr) {
+		if (!bp->tph_mode)
+			goto start_tx;
+
+		rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
+		if (rc)
+			return rc;
+
+		rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
+		if (rc)
+			return rc;
+
+		txr->tx_prod = 0;
+		txr->tx_cons = 0;
+		txr->tx_hw_cons = 0;
+start_tx:
+		WRITE_ONCE(txr->dev_state, 0);
+		synchronize_net();
+
+		if (bnapi->flags & BNXT_NAPI_FLAG_XDP)
+			continue;
+
+		txq = netdev_get_tx_queue(bp->dev, txr->txq_index);
+		if (txq)
+			netif_tx_start_queue(txq);
+	}
+
+	return 0;
+}
+
 static void bnxt_free_irq(struct bnxt *bp)
 {
 	struct bnxt_irq *irq;
@@ -15638,6 +15723,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 	struct bnxt_rx_ring_info *rxr, *clone;
 	struct bnxt_cp_ring_info *cpr;
 	struct bnxt_vnic_info *vnic;
+	struct bnxt_napi *bnapi;
 	int i, rc;
 
 	rxr = &bp->rx_ring[idx];
@@ -15655,27 +15741,42 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 
 	bnxt_copy_rx_ring(bp, rxr, clone);
 
+	bnapi = rxr->bnapi;
+	cpr = &bnapi->cp_ring;
 	rc = bnxt_hwrm_rx_ring_alloc(bp, rxr);
 	if (rc)
-		return rc;
+		goto err_reset_rx;
 
 	if (bp->tph_mode) {
 		rc = bnxt_hwrm_cp_ring_alloc_p5(bp, rxr->rx_cpr);
 		if (rc)
-			goto err_free_hwrm_rx_ring;
+			goto err_reset_rx;
 	}
 
 	rc = bnxt_hwrm_rx_agg_ring_alloc(bp, rxr);
 	if (rc)
-		goto err_free_hwrm_cp_ring;
+		goto err_reset_rx;
 
 	bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
 	if (bp->flags & BNXT_FLAG_AGG_RINGS)
 		bnxt_db_write(bp, &rxr->rx_agg_db, rxr->rx_agg_prod);
 
-	cpr = &rxr->bnapi->cp_ring;
 	cpr->sw_stats->rx.rx_resets++;
 
+	if (bp->flags & BNXT_FLAG_SHARED_RINGS) {
+		cpr->sw_stats->tx.tx_resets++;
+		rc = bnxt_tx_queue_start(bp, idx);
+		if (rc) {
+			netdev_warn(bp->dev,
+				    "tx queue restart failed: rc=%d\n", rc);
+			bnapi->tx_fault = 1;
+			goto err_reset;
+		}
+	}
+
+	napi_enable(&bnapi->napi);
+	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
+
 	for (i = 0; i <= BNXT_VNIC_NTUPLE; i++) {
 		vnic = &bp->vnic_info[i];
 
@@ -15692,11 +15793,12 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 
 	return 0;
 
-err_free_hwrm_cp_ring:
-	if (bp->tph_mode)
-		bnxt_hwrm_cp_ring_free(bp, rxr->rx_cpr);
-err_free_hwrm_rx_ring:
-	bnxt_hwrm_rx_ring_free(bp, rxr, false);
+err_reset_rx:
+	rxr->bnapi->in_reset = true;
+err_reset:
+	napi_enable(&bnapi->napi);
+	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
+	bnxt_reset_task(bp, true);
 	return rc;
 }
 
@@ -15704,7 +15806,9 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 {
 	struct bnxt *bp = netdev_priv(dev);
 	struct bnxt_rx_ring_info *rxr;
+	struct bnxt_cp_ring_info *cpr;
 	struct bnxt_vnic_info *vnic;
+	struct bnxt_napi *bnapi;
 	int i;
 
 	for (i = 0; i <= BNXT_VNIC_NTUPLE; i++) {
@@ -15716,17 +15820,25 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	/* Make sure NAPI sees that the VNIC is disabled */
 	synchronize_net();
 	rxr = &bp->rx_ring[idx];
-	cancel_work_sync(&rxr->bnapi->cp_ring.dim.work);
+	bnapi = rxr->bnapi;
+	cpr = &bnapi->cp_ring;
+	cancel_work_sync(&cpr->dim.work);
 	bnxt_hwrm_rx_ring_free(bp, rxr, false);
 	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
 	page_pool_disable_direct_recycling(rxr->page_pool);
 	if (bnxt_separate_head_pool())
 		page_pool_disable_direct_recycling(rxr->head_pool);
 
+	if (bp->flags & BNXT_FLAG_SHARED_RINGS)
+		bnxt_tx_queue_stop(bp, idx);
+
+	napi_disable(&bnapi->napi);
+
 	if (bp->tph_mode) {
 		bnxt_hwrm_cp_ring_free(bp, rxr->rx_cpr);
 		bnxt_clear_one_cp_ring(bp, rxr->rx_cpr);
 	}
+	bnxt_db_nq(bp, &cpr->cp_db, cpr->cp_raw_cons);
 
 	memcpy(qmem, rxr, sizeof(*rxr));
 	bnxt_init_rx_ring_struct(bp, qmem);
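For context on where these callbacks plug in: bnxt_queue_start() and bnxt_queue_stop() are the driver's per-queue restart hooks. In the mainline driver they are wired up roughly as below via struct netdev_queue_mgmt_ops (include/net/netdev_queues.h); this initializer is a sketch and may not match the patched tree exactly:

```c
/* Core networking calls these ops to stop and restart a single RX
 * queue (e.g. for queue API users) without a full ifdown/ifup. */
static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
	.ndo_queue_mem_size	= sizeof(struct bnxt_rx_ring_info),
	.ndo_queue_mem_alloc	= bnxt_queue_mem_alloc,
	.ndo_queue_mem_free	= bnxt_queue_mem_free,
	.ndo_queue_start	= bnxt_queue_start,
	.ndo_queue_stop		= bnxt_queue_stop,
};
```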