| Message ID | 20231214213138.98095-1-michael.chan@broadcom.com (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | 23c93c3b6275a59f2a685f4a693944b53c31df4e |
| Delegated to | Netdev Maintainers |
| Series | [net] bnxt_en: do not map packet buffers twice |
On 2023-12-14 13:31, Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
>
> Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 96f5ca778c67..8cb9a99154aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  	for (i = 0; i < num_frags ; i++) {
>  		skb_frag_t *frag = &sinfo->frags[i];
>  		struct bnxt_sw_tx_bd *frag_tx_buf;
> -		struct pci_dev *pdev = bp->pdev;
>  		dma_addr_t frag_mapping;
>  		int frag_len;
>
> @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
>
>  		frag_len = skb_frag_size(frag);
> -		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> -						frag_len, DMA_TO_DEVICE);
> -
> -		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> -			return NULL;
> -
> -		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);

If this is no longer set, what would happen to dma_unmap_single() in
bnxt_tx_int_xdp() that is then reading `mapping` via dma_unmap_addr()?

> -
>  		flags = frag_len << TX_BD_LEN_SHIFT;
>  		txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
> +		frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
> +			       skb_frag_off(frag);
>  		txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
>
>  		len = frag_len;
On Thu, Dec 14, 2023 at 3:18 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2023-12-14 13:31, Michael Chan wrote:
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> >
> > Remove double-mapping of DMA buffers as it can prevent page pool entries
> > from being freed. Mapping is managed by page pool infrastructure and
> > was previously managed by the driver in __bnxt_alloc_rx_page before
> > allowing the page pool infrastructure to manage it.
> >
> > Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> > ---
> >  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > index 96f5ca778c67..8cb9a99154aa 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> >  	for (i = 0; i < num_frags ; i++) {
> >  		skb_frag_t *frag = &sinfo->frags[i];
> >  		struct bnxt_sw_tx_bd *frag_tx_buf;
> > -		struct pci_dev *pdev = bp->pdev;
> >  		dma_addr_t frag_mapping;
> >  		int frag_len;
> >
> > @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> >  		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
> >
> >  		frag_len = skb_frag_size(frag);
> > -		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> > -						frag_len, DMA_TO_DEVICE);
> > -
> > -		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> > -			return NULL;
> > -
> > -		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
>
> If this is no longer set, what would happen to dma_unmap_single() in
> bnxt_tx_int_xdp() that is then reading `mapping` via dma_unmap_addr()?

The dma_unmap_addr() call in bnxt_tx_int_xdp() is for XDP_REDIRECT
packets only. Our current XDP_REDIRECT implementation supports only a
single buffer per packet. Here, this code path is for XDP_TX
multi-buffers. The DMA mapping for the frag pages is always handled by
the page pool. Thanks.
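To make that distinction concrete, here is a minimal sketch of the two transmit paths (illustrative only; the variable and field names are simplified stand-ins, not a verbatim copy of the driver): the single-buffer XDP_REDIRECT path creates its own mapping and records it so the completion handler can later unmap it, while the XDP_TX frag path only looks up the mapping the page pool already owns.

```c
/* XDP_REDIRECT, single buffer: the driver maps the data itself, so the
 * address must be saved for the dma_unmap_single() on TX completion.
 */
mapping = dma_map_single(&pdev->dev, data, len, DMA_TO_DEVICE);
if (dma_mapping_error(&pdev->dev, mapping))
	return -ENOMEM;
dma_unmap_addr_set(tx_buf, mapping, mapping);	/* read back via dma_unmap_addr() */

/* XDP_TX frag (the path changed by this patch): the page belongs to the
 * page pool, which already holds a long-lived mapping, so the driver only
 * computes the DMA address and never unmaps it itself.
 */
frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
	       skb_frag_off(frag);
txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
```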
On 2023-12-14 13:31, Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
>
> Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

Reviewed-by: David Wei <dw@davidwei.uk>

> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 96f5ca778c67..8cb9a99154aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -59,7 -59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  	for (i = 0; i < num_frags ; i++) {
>  		skb_frag_t *frag = &sinfo->frags[i];
>  		struct bnxt_sw_tx_bd *frag_tx_buf;
> -		struct pci_dev *pdev = bp->pdev;
>  		dma_addr_t frag_mapping;
>  		int frag_len;
>
> @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
>
>  		frag_len = skb_frag_size(frag);
> -		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> -						frag_len, DMA_TO_DEVICE);

I checked that skb_frag_dma_map() calls dma_map_page() with page set to
skb_frag_page(frag) and offset set to skb_frag_off(frag) + offset where
offset is 0. This is thus equivalent to the line added below:

page_pool_get_dma_addr(skb_frag_page(frag)) + skb_frag_off(frag)

> -
> -		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> -			return NULL;

I checked that page_pool_get_dma_addr() cannot fail or return an invalid
mapping. The DMA mapping happens when bulk allocating the pp alloc cache
during __page_pool_alloc_pages_slow(). If DMA mapping fails during
page_pool_dma_map() then the page is not stored in the cache. Therefore
any pages allocated from the pp will have a valid DMA addr.

> -
> -		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);

As discussed with Michael Chan, only XDP_TX will have multiple page
frags. Presumably only XDP_TX will have num_frags > 0 and enter this for
loop. Even though XDP_REDIRECT also calls bnxt_xmit_bd() from
__bnxt_xmit_xdp_redirect(), I assume xdp_buff_has_frags() returns false.

> -
>  		flags = frag_len << TX_BD_LEN_SHIFT;
>  		txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
> +		frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
> +			       skb_frag_off(frag);

I trust that the page pool DMA mapping management is correct. Both
skb_frag_dma_map() and page_pool_dma_map() call into
dma_map_page_attrs(), but page_pool_dma_map() has flags
DMA_ATTR_SKIP_CPU_SYNC and DMA_ATTR_WEAK_ORDERING set whereas
skb_frag_dma_map() has no flags. DMA_ATTR_WEAK_ORDERING is optional and
ignored for platforms that do not support it, therefore safe to use.
DMA_ATTR_SKIP_CPU_SYNC is used since presumably there is no sharing of
pages between multiple devices. IIRC there is a single page pool per Rx
queue/NAPI context.

>  		txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
>
>  		len = frag_len;
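The equivalence checked in the review above can also be written out directly. A small sketch (illustrative only, assuming the frag's page was allocated from a page pool created with PP_FLAG_DMA_MAP, so the pool holds the page's mapping):

```c
/* Old code: map the frag again, creating a second mapping of a page the
 * page pool has already mapped.
 */
dma_addr_t old_addr = skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag),
					DMA_TO_DEVICE);
/* skb_frag_dma_map(dev, frag, 0, size, dir) boils down to
 * dma_map_page(dev, skb_frag_page(frag), skb_frag_off(frag) + 0, size, dir).
 */

/* New code: reuse the pool's existing mapping and add the frag's offset
 * within the page.  No dma_mapping_error() check is needed: a page handed
 * out by the pool always carries a valid DMA address.
 */
dma_addr_t new_addr = page_pool_get_dma_addr(skb_frag_page(frag)) +
		      skb_frag_off(frag);
```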
On Thu, 14 Dec 2023 13:31:38 -0800 Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.

This patch is all good, but I'm confused by the handling of head.
Do you recycle it immediately and hope that the Tx happens before
the Rx gets around to using the recycled page again? Am I misreading?
On Fri, Dec 15, 2023 at 08:37:59AM -0800, Jakub Kicinski wrote:
> On Thu, 14 Dec 2023 13:31:38 -0800 Michael Chan wrote:
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> >
> > Remove double-mapping of DMA buffers as it can prevent page pool entries
> > from being freed. Mapping is managed by page pool infrastructure and
> > was previously managed by the driver in __bnxt_alloc_rx_page before
> > allowing the page pool infrastructure to manage it.
>
> This patch is all good, but I'm confused by the handling of head.
> Do you recycle it immediately and hope that the Tx happens before
> the Rx gets around to using the recycled page again? Am I misreading?

Your description is correct, but we use a better strategy than just
hoping it works out. :)

The design is that we do not update the rx ring with the producer value
that was present when the packet was received until after getting the tx
completion indicating that the packet sent via XDP_TX action has been
sent.
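Roughly, the ordering described above looks like the sketch below. This is a paraphrased illustration, not the driver's exact code; the bookkeeping field and the completion-side call shown here are simplified stand-ins, the point being only that the RX doorbell for that producer value is deferred until the XDP_TX completion arrives.

```c
/* RX path, XDP program returned XDP_TX: queue the same page for transmit
 * and remember the RX producer index, but do not ring the RX doorbell yet,
 * so the NIC cannot hand this page out again while it is in flight on TX.
 */
tx_buf->action = XDP_TX;
tx_buf->rx_prod = rxr->rx_prod;		/* illustrative bookkeeping */

/* TX completion path: the packet has been transmitted, so the page may be
 * reused; only now is the saved producer value written to the RX doorbell.
 */
if (tx_buf->action == XDP_TX)
	bnxt_db_write(bp, &rxr->rx_db, tx_buf->rx_prod);
```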
On Fri, 15 Dec 2023 11:57:14 -0500 Andy Gospodarek wrote:
> > This patch is all good, but I'm confused by the handling of head.
> > Do you recycle it immediately and hope that the Tx happens before
> > the Rx gets around to using the recycled page again? Am I misreading?
>
> Your description is correct, but we use a better strategy than just
> hoping it works out. :)
>
> The design is that we do not update the rx ring with the producer value
> that was present when the packet was received until after getting the tx
> completion indicating that the packet sent via XDP_TX action has been
> sent.

Ah, I see it, interesting! In that case - next question.. :)

Are the XDP_REDIRECT (target) and XDP_TX going to the same rings?
The locking seems to be missing, and bnxt_tx_int_xdp() does not
seem to be able to handle the optimization you described if
a ring contains a mix of XDP_REDIRECT and XDP_TX.

If I'm reading the assignment in bnxt_alloc_mem() and indexing right -
XDP_REDIRECT and XDP_TX do seem to go to the same rings.
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 14 Dec 2023 13:31:38 -0800 you wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
>
> [...]

Here is the summary with links:
  - [net] bnxt_en: do not map packet buffers twice
    https://git.kernel.org/netdev/net/c/23c93c3b6275

You are awesome, thank you!
On Fri, Dec 15, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Are the XDP_REDIRECT (target) and XDP_TX going to the same rings?
> The locking seems to be missing, and bnxt_tx_int_xdp() does not
> seem to be able to handle the optimization you described if
> a ring contains a mix of XDP_REDIRECT and XDP_TX.

XDP_REDIRECT mixed with XDP_TX won't work well currently. It was briefly
mentioned on the list a few months ago:

https://lore.kernel.org/netdev/CACKFLin+1whPs0qeM5xBb1yXx8FkFS_vGrW6PaGy41_XVH=SGg@mail.gmail.com/

Yes, they share the same set of TX rings in the current code. My plan is
to have a set of dedicated TX rings for XDP_REDIRECT. Adding locking to
properly support XDP_REDIRECT and XDP_TX seems not ideal for performance.
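As a schematic of the hazard being discussed (call chains and code abbreviated; this is an illustration of the unlocked shared-ring situation, not a quote of the driver): the XDP_TX path, which runs in the RX NAPI context, and the XDP_REDIRECT target path, entered via .ndo_xdp_xmit, both produce onto the same TX ring through bnxt_xmit_bd(), and each advances the producer index without a lock.

```c
/*
 *   XDP_TX:        NAPI poll -> bnxt_rx_xdp() ........ -> bnxt_xmit_bd(txr)
 *   XDP_REDIRECT:  .ndo_xdp_xmit -> bnxt_xdp_xmit() .. -> bnxt_xmit_bd(txr)
 *
 * Each producer does, with no lock around it:
 */
prod = txr->tx_prod;
txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
/* ... fill in the buffer descriptor ... */
prod = NEXT_TX(prod);
txr->tx_prod = prod;	/* racy if both paths hit the same ring concurrently */
```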
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 96f5ca778c67..8cb9a99154aa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 	for (i = 0; i < num_frags ; i++) {
 		skb_frag_t *frag = &sinfo->frags[i];
 		struct bnxt_sw_tx_bd *frag_tx_buf;
-		struct pci_dev *pdev = bp->pdev;
 		dma_addr_t frag_mapping;
 		int frag_len;

@@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];

 		frag_len = skb_frag_size(frag);
-		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
-						frag_len, DMA_TO_DEVICE);
-
-		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
-			return NULL;
-
-		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
-
 		flags = frag_len << TX_BD_LEN_SHIFT;
 		txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+		frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
+			       skb_frag_off(frag);
 		txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);

 		len = frag_len;