[net] bnxt_en: do not map packet buffers twice

Message ID 20231214213138.98095-1-michael.chan@broadcom.com (mailing list archive)
State Accepted
Commit 23c93c3b6275a59f2a685f4a693944b53c31df4e
Delegated to: Netdev Maintainers
Series [net] bnxt_en: do not map packet buffers twice

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1115 this patch: 1115
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 1142 this patch: 1142
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1142 this patch: 1142
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 25 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Michael Chan Dec. 14, 2023, 9:31 p.m. UTC
From: Andy Gospodarek <andrew.gospodarek@broadcom.com>

Remove double-mapping of DMA buffers as it can prevent page pool entries
from being freed.  Mapping is managed by page pool infrastructure and
was previously managed by the driver in __bnxt_alloc_rx_page before
allowing the page pool infrastructure to manage it.

Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

Comments

David Wei Dec. 14, 2023, 11:18 p.m. UTC | #1
On 2023-12-14 13:31, Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> 
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed.  Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
> 
> Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 96f5ca778c67..8cb9a99154aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  	for (i = 0; i < num_frags ; i++) {
>  		skb_frag_t *frag = &sinfo->frags[i];
>  		struct bnxt_sw_tx_bd *frag_tx_buf;
> -		struct pci_dev *pdev = bp->pdev;
>  		dma_addr_t frag_mapping;
>  		int frag_len;
>  
> @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
>  
>  		frag_len = skb_frag_size(frag);
> -		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> -						frag_len, DMA_TO_DEVICE);
> -
> -		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> -			return NULL;
> -
> -		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);

If this is no longer set, what would happen to dma_unmap_single() in
bnxt_tx_int_xdp() that is then reading `mapping` via dma_unmap_addr()?

> -
>  		flags = frag_len << TX_BD_LEN_SHIFT;
>  		txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
> +		frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
> +			       skb_frag_off(frag);
>  		txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
>  
>  		len = frag_len;
Michael Chan Dec. 15, 2023, 12:27 a.m. UTC | #2
On Thu, Dec 14, 2023 at 3:18 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2023-12-14 13:31, Michael Chan wrote:
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> >
> > Remove double-mapping of DMA buffers as it can prevent page pool entries
> > from being freed.  Mapping is managed by page pool infrastructure and
> > was previously managed by the driver in __bnxt_alloc_rx_page before
> > allowing the page pool infrastructure to manage it.
> >
> > Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> > ---
> >  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > index 96f5ca778c67..8cb9a99154aa 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> >       for (i = 0; i < num_frags ; i++) {
> >               skb_frag_t *frag = &sinfo->frags[i];
> >               struct bnxt_sw_tx_bd *frag_tx_buf;
> > -             struct pci_dev *pdev = bp->pdev;
> >               dma_addr_t frag_mapping;
> >               int frag_len;
> >
> > @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> >               txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
> >
> >               frag_len = skb_frag_size(frag);
> > -             frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> > -                                             frag_len, DMA_TO_DEVICE);
> > -
> > -             if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> > -                     return NULL;
> > -
> > -             dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
>
> If this is no longer set, what would happen to dma_unmap_single() in
> bnxt_tx_int_xdp() that is then reading `mapping` via dma_unmap_addr()?

The dma_unmap_addr() call in bnxt_tx_int_xdp() is for XDP_REDIRECT
packets only.  Our current XDP_REDIRECT implementation supports only a
single buffer per packet.  Here, this code path is for XDP_TX
multi-buffers.  The DMA mapping for the frag pages is always handled
by the page pool.  Thanks.
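
In code terms, the split described above is roughly the following.  This is
a simplified sketch of an XDP TX-completion handler, not the driver code
itself; the field names (tx_buf->action, tx_buf->xdpf) are assumptions
about the driver's per-buffer bookkeeping:

/* Sketch: only XDP_REDIRECT buffers own a driver-created DMA mapping
 * and need an unmap on completion.  XDP_TX frags are backed by the
 * page pool, which holds the mapping for the lifetime of the page.
 */
static void xdp_tx_complete_one(struct bnxt *bp, struct bnxt_sw_tx_bd *tx_buf)
{
	if (tx_buf->action == XDP_REDIRECT) {
		dma_unmap_single(&bp->pdev->dev,
				 dma_unmap_addr(tx_buf, mapping),
				 dma_unmap_len(tx_buf, len),
				 DMA_TO_DEVICE);
		xdp_return_frame(tx_buf->xdpf);
	}
	/* XDP_TX: nothing to unmap; the RX producer is only advanced
	 * after this completion (see the discussion further down).
	 */
}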
David Wei Dec. 15, 2023, 5:54 a.m. UTC | #3
On 2023-12-14 13:31, Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> 
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed.  Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
> 
> Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

Reviewed-by: David Wei <dw@davidwei.uk>

> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 96f5ca778c67..8cb9a99154aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  	for (i = 0; i < num_frags ; i++) {
>  		skb_frag_t *frag = &sinfo->frags[i];
>  		struct bnxt_sw_tx_bd *frag_tx_buf;
> -		struct pci_dev *pdev = bp->pdev;
>  		dma_addr_t frag_mapping;
>  		int frag_len;
>  
> @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
>  
>  		frag_len = skb_frag_size(frag);
> -		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> -						frag_len, DMA_TO_DEVICE);

I checked that skb_frag_dma_map() calls dma_map_page() with page set to
skb_frag_page(frag) and offset set to skb_frag_off(frag) + offset where
offset is 0. This is thus equivalent to the line added below:

page_pool_get_dma_addr(skb_frag_page(frag)) + skb_frag_off(frag)
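
Spelled out, the equivalence being checked is roughly this (a sketch for
illustration only; frag is assumed to point into a page-pool-backed page):

/* Old code: skb_frag_dma_map() creates a fresh streaming mapping of the
 * same page and offset.
 */
dma_addr_t old_addr = dma_map_page(&pdev->dev, skb_frag_page(frag),
				   skb_frag_off(frag) + 0, frag_len,
				   DMA_TO_DEVICE);

/* New code: reuse the mapping the page pool created when it allocated
 * the page, plus the frag's offset within that page.
 */
dma_addr_t new_addr = page_pool_get_dma_addr(skb_frag_page(frag)) +
		      skb_frag_off(frag);

/* Both are valid device addresses for the same bytes; the new form
 * simply avoids creating (and leaking) a second mapping.
 */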

> -
> -		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> -			return NULL;

I checked that page_pool_get_dma_addr() cannot fail or return an invalid
mapping. The DMA mapping happens when bulk allocating the pp alloc cache
during __page_pool_alloc_pages_slow(). If DMA mapping fails during
page_pool_dma_map() then the page is not stored in the cache. Therefore
any pages allocated from the pp will have a valid DMA addr.
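
The guarantee boils down to logic along these lines (a condensed
paraphrase of the page pool slow-path allocation, not a verbatim copy):

static struct page *pool_alloc_slow_sketch(struct page_pool *pool, gfp_t gfp)
{
	struct page *page = alloc_pages(gfp, pool->p.order);

	if (!page)
		return NULL;

	/* If the pool is responsible for mapping (PP_FLAG_DMA_MAP) and
	 * the mapping fails, the page is freed and never handed out, so
	 * every page returned by the pool has a valid DMA address.
	 */
	if ((pool->p.flags & PP_FLAG_DMA_MAP) &&
	    unlikely(!page_pool_dma_map(pool, page))) {
		put_page(page);
		return NULL;
	}
	return page;
}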

> -
> -		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);

As discussed with Michael Chan, only XDP_TX will have multiple page
frags. Presumably only XDP_TX will have num_frags > 0 and enter this for
loop. Even though XDP_REDIRECT also calls bnxt_xmit_bd() from
__bnxt_xmit_xdp_redirect(), I assume xdp_buff_has_frags() returns false.

> -
>  		flags = frag_len << TX_BD_LEN_SHIFT;
>  		txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
> +		frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
> +			       skb_frag_off(frag);

I trust that the page pool DMA mapping management is correct.

Both skb_frag_dma_map() and page_pool_dma_map() call into
dma_map_page_attrs(), but page_pool_dma_map() has flags
DMA_ATTR_SKIP_CPU_SYNC and DMA_ATTR_WEAK_ORDERING set whereas
skb_frag_dma_map() has no flags.

DMA_ATTR_WEAK_ORDERING is optional and is ignored on platforms that do
not support it, so it is safe to use.

DMA_ATTR_SKIP_CPU_SYNC is used since presumably there is no sharing of
pages between multiple devices. IIRC there is a single page pool per Rx
queue/NAPI context.
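
For reference, the two calls being compared reduce to roughly the
following (paraphrased from the helpers, with the page pool parameters
written out):

/* skb_frag_dma_map(): a plain per-frag streaming mapping, no attrs. */
dma_map_page_attrs(dev, skb_frag_page(frag),
		   skb_frag_off(frag) + offset, size,
		   DMA_TO_DEVICE, 0);

/* page_pool_dma_map(): maps the whole page once at allocation time with
 * relaxed attributes; users sync only the regions they actually touch.
 */
dma_map_page_attrs(pool->p.dev, page, 0,
		   PAGE_SIZE << pool->p.order,
		   pool->p.dma_dir,
		   DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);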

>  		txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
>  
>  		len = frag_len;
Jakub Kicinski Dec. 15, 2023, 4:37 p.m. UTC | #4
On Thu, 14 Dec 2023 13:31:38 -0800 Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> 
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed.  Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.

This patch is all good, but I'm confused by the handling of head.
Do you recycle it immediately and hope that the Tx happens before
the Rx gets around to using the recycled page again? Am I misreading?
Andy Gospodarek Dec. 15, 2023, 4:57 p.m. UTC | #5
On Fri, Dec 15, 2023 at 08:37:59AM -0800, Jakub Kicinski wrote:
> On Thu, 14 Dec 2023 13:31:38 -0800 Michael Chan wrote:
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > 
> > Remove double-mapping of DMA buffers as it can prevent page pool entries
> > from being freed.  Mapping is managed by page pool infrastructure and
> > was previously managed by the driver in __bnxt_alloc_rx_page before
> > allowing the page pool infrastructure to manage it.
> 
> This patch is all good, but I'm confused by the handling of head.
> Do you recycle it immediately and hope that the Tx happens before
> the Rx gets around to using the recycled page again? Am I misreading?

Your description is correct, but we use a better strategy than just
hoping it works out. :)

The design is that we do not update the rx ring with the producer value
that was present when the packet was received until after getting the tx
completion indicating that the packet sent via the XDP_TX action has
actually been transmitted.
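
In code terms, the scheme is roughly the following (a sketch; the field
and helper names are assumptions, only the ordering is the point):

/* RX side, on XDP_TX: remember the RX producer value that would
 * re-expose this page, but do not ring the RX doorbell yet.
 */
tx_buf->action = XDP_TX;
tx_buf->rx_prod = rxr->rx_prod;

/* TX-completion side: only once the NIC reports the XDP_TX send as done
 * is the saved producer written back, making the page visible to RX
 * (and thus reusable) again.
 */
if (tx_buf->action == XDP_TX)
	bnxt_db_write(bp, &rxr->rx_db, tx_buf->rx_prod);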
Jakub Kicinski Dec. 15, 2023, 5:21 p.m. UTC | #6
On Fri, 15 Dec 2023 11:57:14 -0500 Andy Gospodarek wrote:
> > This patch is all good, but I'm confused by the handling of head.
> > Do you recycle it immediately and hope that the Tx happens before
> > the Rx gets around to using the recycled page again? Am I misreading?  
> 
> Your description is correct, but we use a better strategy than just
> hoping it works out. :)
> 
> The design is that we do not update the rx ring with the producer value
> that was present when the packet was received until after getting the tx
> completion indicating that the packet sent via the XDP_TX action has
> actually been transmitted.

Ah, I see it, interesting! In that case - next question.. :)

Are the XDP_REDIRECT (target) and XDP_TX going to the same rings?
The locking seems to be missing, and bnxt_tx_int_xdp() does not
seem to be able to handle the optimization you described if
a ring contains a mix of XDP_REDIRECT and XDP_TX.

If I'm reading the assignment in bnxt_alloc_mem() and indexing
right - XDP_REDIRECT and XDP_TX do seem to go to the same rings.
patchwork-bot+netdevbpf@kernel.org Dec. 15, 2023, 6:20 p.m. UTC | #7
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 14 Dec 2023 13:31:38 -0800 you wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> 
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed.  Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
> 
> [...]

Here is the summary with links:
  - [net] bnxt_en: do not map packet buffers twice
    https://git.kernel.org/netdev/net/c/23c93c3b6275

You are awesome, thank you!
Michael Chan Dec. 15, 2023, 8:45 p.m. UTC | #8
On Fri, Dec 15, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Are the XDP_REDIRECT (target) and XDP_TX going to the same rings?
> The locking seems to be missing, and bnxt_tx_int_xdp() does not
> seem to be able to handle the optimization you described if
> a ring contains a mix of XDP_REDIRECT and XDP_TX.

XDP_REDIRECT mixed with XDP_TX won't work well currently.  It was
briefly mentioned on the list a few months ago:

https://lore.kernel.org/netdev/CACKFLin+1whPs0qeM5xBb1yXx8FkFS_vGrW6PaGy41_XVH=SGg@mail.gmail.com/

Yes, they share the same set of TX rings in the current code.  My plan
is to have a set of dedicated TX rings for XDP_REDIRECT.  Adding
locking to properly support XDP_REDIRECT and XDP_TX seems not ideal
for performance.

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 96f5ca778c67..8cb9a99154aa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -59,7 +59,6 @@  struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 	for (i = 0; i < num_frags ; i++) {
 		skb_frag_t *frag = &sinfo->frags[i];
 		struct bnxt_sw_tx_bd *frag_tx_buf;
-		struct pci_dev *pdev = bp->pdev;
 		dma_addr_t frag_mapping;
 		int frag_len;
 
@@ -73,16 +72,10 @@  struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 		txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
 
 		frag_len = skb_frag_size(frag);
-		frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
-						frag_len, DMA_TO_DEVICE);
-
-		if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
-			return NULL;
-
-		dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
-
 		flags = frag_len << TX_BD_LEN_SHIFT;
 		txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+		frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
+			       skb_frag_off(frag);
 		txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
 
 		len = frag_len;