diff mbox series

[net,v1] net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data

Message ID 20241021061023.2162701-1-0x1207@gmail.com (mailing list archive)
State Accepted
Commit 66600fac7a984dea4ae095411f644770b2561ede
Delegated to: Netdev Maintainers
Headers show
Series [net,v1] net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: peppe.cavallaro@st.com; 2 maintainers not CCed: andrew+netdev@lunn.ch peppe.cavallaro@st.com
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 34 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 38 this patch: 38
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-23--12-00 (tests: 777)

Commit Message

Furong Xu Oct. 21, 2024, 6:10 a.m. UTC
In case the non-paged data of a SKB carries protocol header and protocol
payload to be transmitted on a certain platform that the DMA AXI address
width is configured to 40-bit/48-bit, or the size of the non-paged data
is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
address width is configured to 32-bit, then this SKB requires at least
two DMA transmit descriptors to serve it.

For example, three descriptors are allocated to split one DMA buffer
mapped from one piece of non-paged data:
    dma_desc[N + 0],
    dma_desc[N + 1],
    dma_desc[N + 2].
Then three elements of tx_q->tx_skbuff_dma[] will be allocated to hold
extra information to be reused in stmmac_tx_clean():
    tx_q->tx_skbuff_dma[N + 0],
    tx_q->tx_skbuff_dma[N + 1],
    tx_q->tx_skbuff_dma[N + 2].
Now we focus on tx_q->tx_skbuff_dma[entry].buf, which is the DMA buffer
address returned by DMA mapping call. stmmac_tx_clean() will try to
unmap the DMA buffer _ONLY_IF_ tx_q->tx_skbuff_dma[entry].buf
is a valid buffer address.

The expected behavior that saves DMA buffer address of this non-paged
data to tx_q->tx_skbuff_dma[entry].buf is:
    tx_q->tx_skbuff_dma[N + 0].buf = NULL;
    tx_q->tx_skbuff_dma[N + 1].buf = NULL;
    tx_q->tx_skbuff_dma[N + 2].buf = dma_map_single();
Unfortunately, the current code misbehaves like this:
    tx_q->tx_skbuff_dma[N + 0].buf = dma_map_single();
    tx_q->tx_skbuff_dma[N + 1].buf = NULL;
    tx_q->tx_skbuff_dma[N + 2].buf = NULL;

On the stmmac_tx_clean() side, when dma_desc[N + 0] is closed by the
DMA engine, tx_q->tx_skbuff_dma[N + 0].buf is a valid buffer address
obviously, then the DMA buffer will be unmapped immediately.
There may be a rare case that the DMA engine does not finish the
pending dma_desc[N + 1], dma_desc[N + 2] yet. Now things will go
horribly wrong, DMA is going to access a unmapped/unreferenced memory
region, corrupted data will be transmited or iommu fault will be
triggered :(

In contrast, the for-loop that maps SKB fragments behaves perfectly
as expected, and that is how the driver should do for both non-paged
data and paged frags actually.

This patch corrects DMA map/unmap sequences by fixing the array index
for tx_q->tx_skbuff_dma[entry].buf when assigning DMA buffer address.

Tested and verified on DWXGMAC CORE 3.20a

Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
Fixes: f748be531d70 ("stmmac: support new GMAC4")
Signed-off-by: Furong Xu <0x1207@gmail.com>
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

Comments

Hariprasad Kelam Oct. 21, 2024, 9:04 a.m. UTC | #1
On 2024-10-21 at 11:40:23, Furong Xu (0x1207@gmail.com) wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
> 
> For example, three descriptors are allocated to split one DMA buffer
> mapped from one piece of non-paged data:
>     dma_desc[N + 0],
>     dma_desc[N + 1],
>     dma_desc[N + 2].
> Then three elements of tx_q->tx_skbuff_dma[] will be allocated to hold
> extra information to be reused in stmmac_tx_clean():
>     tx_q->tx_skbuff_dma[N + 0],
>     tx_q->tx_skbuff_dma[N + 1],
>     tx_q->tx_skbuff_dma[N + 2].
> Now we focus on tx_q->tx_skbuff_dma[entry].buf, which is the DMA buffer
> address returned by DMA mapping call. stmmac_tx_clean() will try to
> unmap the DMA buffer _ONLY_IF_ tx_q->tx_skbuff_dma[entry].buf
> is a valid buffer address.
> 
> The expected behavior that saves DMA buffer address of this non-paged
> data to tx_q->tx_skbuff_dma[entry].buf is:
>     tx_q->tx_skbuff_dma[N + 0].buf = NULL;
>     tx_q->tx_skbuff_dma[N + 1].buf = NULL;
>     tx_q->tx_skbuff_dma[N + 2].buf = dma_map_single();
> Unfortunately, the current code misbehaves like this:
>     tx_q->tx_skbuff_dma[N + 0].buf = dma_map_single();
>     tx_q->tx_skbuff_dma[N + 1].buf = NULL;
>     tx_q->tx_skbuff_dma[N + 2].buf = NULL;
> 
> On the stmmac_tx_clean() side, when dma_desc[N + 0] is closed by the
> DMA engine, tx_q->tx_skbuff_dma[N + 0].buf is a valid buffer address
> obviously, then the DMA buffer will be unmapped immediately.
> There may be a rare case that the DMA engine does not finish the
> pending dma_desc[N + 1], dma_desc[N + 2] yet. Now things will go
> horribly wrong, DMA is going to access a unmapped/unreferenced memory
> region, corrupted data will be transmited or iommu fault will be
> triggered :(
> 
> In contrast, the for-loop that maps SKB fragments behaves perfectly
> as expected, and that is how the driver should do for both non-paged
> data and paged frags actually.
> 
> This patch corrects DMA map/unmap sequences by fixing the array index
> for tx_q->tx_skbuff_dma[entry].buf when assigning DMA buffer address.
> 
> Tested and verified on DWXGMAC CORE 3.20a
> 
> Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
> Fixes: f748be531d70 ("stmmac: support new GMAC4")
> Signed-off-by: Furong Xu <0x1207@gmail.com>
> ---
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index d3895d7eecfc..208dbc68aaf9 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4304,11 +4304,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  	if (dma_mapping_error(priv->device, des))
>  		goto dma_map_err;
>  
> -	tx_q->tx_skbuff_dma[first_entry].buf = des;
> -	tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
> -	tx_q->tx_skbuff_dma[first_entry].map_as_page = false;
> -	tx_q->tx_skbuff_dma[first_entry].buf_type = STMMAC_TXBUF_T_SKB;
> -
>  	if (priv->dma_cap.addr64 <= 32) {
>  		first->des0 = cpu_to_le32(des);
>  
> @@ -4327,6 +4322,23 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
>  
> +	/* In case two or more DMA transmit descriptors are allocated for this
> +	 * non-paged SKB data, the DMA buffer address should be saved to
> +	 * tx_q->tx_skbuff_dma[].buf corresponding to the last descriptor,
> +	 * and leave the other tx_q->tx_skbuff_dma[].buf as NULL to guarantee
> +	 * that stmmac_tx_clean() does not unmap the entire DMA buffer too early
> +	 * since the tail areas of the DMA buffer can be accessed by DMA engine
> +	 * sooner or later.
> +	 * By saving the DMA buffer address to tx_q->tx_skbuff_dma[].buf
> +	 * corresponding to the last descriptor, stmmac_tx_clean() will unmap
> +	 * this DMA buffer right after the DMA engine completely finishes the
> +	 * full buffer transmission.
> +	 */
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> +
>  	/* Prepare fragments */
>  	for (i = 0; i < nfrags; i++) {
>  		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> -- 
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Simon Horman Oct. 21, 2024, 12:26 p.m. UTC | #2
On Mon, Oct 21, 2024 at 02:10:23PM +0800, Furong Xu wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
> 
> For example, three descriptors are allocated to split one DMA buffer
> mapped from one piece of non-paged data:
>     dma_desc[N + 0],
>     dma_desc[N + 1],
>     dma_desc[N + 2].
> Then three elements of tx_q->tx_skbuff_dma[] will be allocated to hold
> extra information to be reused in stmmac_tx_clean():
>     tx_q->tx_skbuff_dma[N + 0],
>     tx_q->tx_skbuff_dma[N + 1],
>     tx_q->tx_skbuff_dma[N + 2].
> Now we focus on tx_q->tx_skbuff_dma[entry].buf, which is the DMA buffer
> address returned by DMA mapping call. stmmac_tx_clean() will try to
> unmap the DMA buffer _ONLY_IF_ tx_q->tx_skbuff_dma[entry].buf
> is a valid buffer address.
> 
> The expected behavior that saves DMA buffer address of this non-paged
> data to tx_q->tx_skbuff_dma[entry].buf is:
>     tx_q->tx_skbuff_dma[N + 0].buf = NULL;
>     tx_q->tx_skbuff_dma[N + 1].buf = NULL;
>     tx_q->tx_skbuff_dma[N + 2].buf = dma_map_single();
> Unfortunately, the current code misbehaves like this:
>     tx_q->tx_skbuff_dma[N + 0].buf = dma_map_single();
>     tx_q->tx_skbuff_dma[N + 1].buf = NULL;
>     tx_q->tx_skbuff_dma[N + 2].buf = NULL;
> 
> On the stmmac_tx_clean() side, when dma_desc[N + 0] is closed by the
> DMA engine, tx_q->tx_skbuff_dma[N + 0].buf is a valid buffer address
> obviously, then the DMA buffer will be unmapped immediately.
> There may be a rare case that the DMA engine does not finish the
> pending dma_desc[N + 1], dma_desc[N + 2] yet. Now things will go
> horribly wrong, DMA is going to access a unmapped/unreferenced memory
> region, corrupted data will be transmited or iommu fault will be
> triggered :(
> 
> In contrast, the for-loop that maps SKB fragments behaves perfectly
> as expected, and that is how the driver should do for both non-paged
> data and paged frags actually.
> 
> This patch corrects DMA map/unmap sequences by fixing the array index
> for tx_q->tx_skbuff_dma[entry].buf when assigning DMA buffer address.
> 
> Tested and verified on DWXGMAC CORE 3.20a
> 
> Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
> Fixes: f748be531d70 ("stmmac: support new GMAC4")
> Signed-off-by: Furong Xu <0x1207@gmail.com>

Thanks for the very thorough explanation, much appreciated.

Reviewed-by: Simon Horman <horms@kernel.org>
patchwork-bot+netdevbpf@kernel.org Oct. 29, 2024, 10:40 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 21 Oct 2024 14:10:23 +0800 you wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
> 
> [...]

Here is the summary with links:
  - [net,v1] net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data
    https://git.kernel.org/netdev/net/c/66600fac7a98

You are awesome, thank you!
Jon Hunter Nov. 27, 2024, 6:39 p.m. UTC | #4
Hi Furong,

On 21/10/2024 07:10, Furong Xu wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
> 
> For example, three descriptors are allocated to split one DMA buffer
> mapped from one piece of non-paged data:
>      dma_desc[N + 0],
>      dma_desc[N + 1],
>      dma_desc[N + 2].
> Then three elements of tx_q->tx_skbuff_dma[] will be allocated to hold
> extra information to be reused in stmmac_tx_clean():
>      tx_q->tx_skbuff_dma[N + 0],
>      tx_q->tx_skbuff_dma[N + 1],
>      tx_q->tx_skbuff_dma[N + 2].
> Now we focus on tx_q->tx_skbuff_dma[entry].buf, which is the DMA buffer
> address returned by DMA mapping call. stmmac_tx_clean() will try to
> unmap the DMA buffer _ONLY_IF_ tx_q->tx_skbuff_dma[entry].buf
> is a valid buffer address.
> 
> The expected behavior that saves DMA buffer address of this non-paged
> data to tx_q->tx_skbuff_dma[entry].buf is:
>      tx_q->tx_skbuff_dma[N + 0].buf = NULL;
>      tx_q->tx_skbuff_dma[N + 1].buf = NULL;
>      tx_q->tx_skbuff_dma[N + 2].buf = dma_map_single();
> Unfortunately, the current code misbehaves like this:
>      tx_q->tx_skbuff_dma[N + 0].buf = dma_map_single();
>      tx_q->tx_skbuff_dma[N + 1].buf = NULL;
>      tx_q->tx_skbuff_dma[N + 2].buf = NULL;
> 
> On the stmmac_tx_clean() side, when dma_desc[N + 0] is closed by the
> DMA engine, tx_q->tx_skbuff_dma[N + 0].buf is a valid buffer address
> obviously, then the DMA buffer will be unmapped immediately.
> There may be a rare case that the DMA engine does not finish the
> pending dma_desc[N + 1], dma_desc[N + 2] yet. Now things will go
> horribly wrong, DMA is going to access a unmapped/unreferenced memory
> region, corrupted data will be transmited or iommu fault will be
> triggered :(
> 
> In contrast, the for-loop that maps SKB fragments behaves perfectly
> as expected, and that is how the driver should do for both non-paged
> data and paged frags actually.
> 
> This patch corrects DMA map/unmap sequences by fixing the array index
> for tx_q->tx_skbuff_dma[entry].buf when assigning DMA buffer address.
> 
> Tested and verified on DWXGMAC CORE 3.20a
> 
> Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
> Fixes: f748be531d70 ("stmmac: support new GMAC4")
> Signed-off-by: Furong Xu <0x1207@gmail.com>
> ---
>   .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++++++++++++++-----
>   1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index d3895d7eecfc..208dbc68aaf9 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4304,11 +4304,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>   	if (dma_mapping_error(priv->device, des))
>   		goto dma_map_err;
>   
> -	tx_q->tx_skbuff_dma[first_entry].buf = des;
> -	tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
> -	tx_q->tx_skbuff_dma[first_entry].map_as_page = false;
> -	tx_q->tx_skbuff_dma[first_entry].buf_type = STMMAC_TXBUF_T_SKB;
> -
>   	if (priv->dma_cap.addr64 <= 32) {
>   		first->des0 = cpu_to_le32(des);
>   
> @@ -4327,6 +4322,23 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>   
>   	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
>   
> +	/* In case two or more DMA transmit descriptors are allocated for this
> +	 * non-paged SKB data, the DMA buffer address should be saved to
> +	 * tx_q->tx_skbuff_dma[].buf corresponding to the last descriptor,
> +	 * and leave the other tx_q->tx_skbuff_dma[].buf as NULL to guarantee
> +	 * that stmmac_tx_clean() does not unmap the entire DMA buffer too early
> +	 * since the tail areas of the DMA buffer can be accessed by DMA engine
> +	 * sooner or later.
> +	 * By saving the DMA buffer address to tx_q->tx_skbuff_dma[].buf
> +	 * corresponding to the last descriptor, stmmac_tx_clean() will unmap
> +	 * this DMA buffer right after the DMA engine completely finishes the
> +	 * full buffer transmission.
> +	 */
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
> +	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> +
>   	/* Prepare fragments */
>   	for (i = 0; i < nfrags; i++) {
>   		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];


I have noticed a lot of intermittent failures on a couple of our boards
starting with Linux v6.12. I have finally bisected the issue to this
change and reverting this change fixes the problem.

The boards where I am seeing this issue on are our Tegra186 Jetson TX2
(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
(tegra194-p2972-0000).

Tegra184 has:
  dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x41

Tegra194 has:
  dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x50

Otherwise all the other propreties printed on boot are the same for both ...

  dwc-eth-dwmac 2490000.ethernet: 	DWMAC4/5
  dwc-eth-dwmac 2490000.ethernet: DMA HW capability register supported
  dwc-eth-dwmac 2490000.ethernet: RX Checksum Offload Engine supported
  dwc-eth-dwmac 2490000.ethernet: TX Checksum insertion supported
  dwc-eth-dwmac 2490000.ethernet: Wake-Up On Lan supported
  dwc-eth-dwmac 2490000.ethernet: TSO supported
  dwc-eth-dwmac 2490000.ethernet: Enable RX Mitigation via HW Watchdog Timer
  dwc-eth-dwmac 2490000.ethernet: Enabled L3L4 Flow TC (entries=8)
  dwc-eth-dwmac 2490000.ethernet: Enabled RFS Flow TC (entries=10)
  dwc-eth-dwmac 2490000.ethernet: TSO feature enabled
  dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width


Looking at the console logs, when the problem occurs I see the
following prints ...

[  245.571688] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
[  245.575349] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed

I also caught this crash ...

[  245.576690] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[  245.576715] Mem abort info:
[  245.577009]   ESR = 0x0000000096000004
[  245.577040]   EC = 0x25: DABT (current EL), IL = 32 bits
[  245.577142]   SET = 0, FnV = 0
[  245.577355]   EA = 0, S1PTW = 0
[  245.577439]   FSC = 0x04: level 0 translation fault
[  245.577557] Data abort info:
[  245.577628]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  245.577753]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  245.577878]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  245.578018] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106300000
[  245.578168] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[  245.578390] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  245.578528] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra_pcm tegra_drm snd_soc_tegra186_asrc snd_soc_tegra210_mixer snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic drm_dp_aux_bus snd_soc_tegra210_adx snd_soc_tegra210_amx cec snd_soc_tegra210_sfc drm_display_helper snd_soc_tegra210_i2s drm_kms_helper snd_soc_tegra_audio_graph_card ucsi_ccg typec_ucsi snd_soc_rt5659 snd_soc_audio_graph_card drm backlight tegra210_adma snd_soc_tegra210_ahub crct10dif_ce snd_soc_simple_card_utils pwm_fan snd_soc_rl6231 typec ina3221 snd_hda_codec_hdmi tegra_aconnect pwm_tegra snd_hda_tegra snd_hda_codec snd_hda_core phy_tegra194_p2u tegra_xudc at24 lm90 pcie_tegra194 host1x tegra_bpmp_thermal ip_tables x_tables ipv6
[  245.626942] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.12.0 #5
[  245.635072] Tainted: [W]=WARN
[  245.638220] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[  245.645039] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  245.651870] pc : skb_release_data+0x100/0x1e4
[  245.656334] lr : sk_skb_reason_drop+0x44/0xb0
[  245.660797] sp : ffff800080003c80
[  245.664206] x29: ffff800080003c80 x28: ffff000083d58980 x27: 0000000000000900
[  245.671813] x26: ffff000083d5c980 x25: ffff0000937c03c0 x24: 0000000000000002
[  245.678906] x23: 00000000ffffffff x22: 0000000000000001 x21: ffff000096ae8200
[  245.686258] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000004860
[  245.693605] x17: ffff80037be97000 x16: ffff800080000000 x15: ffff8000827d4968
[  245.700870] x14: fffffffffffe485f x13: 2e656572662d7265 x12: 7466612d65737520
[  245.707957] x11: ffff8000827d49e8 x10: ffff8000827d49e8 x9 : 00000000ffffefff
[  245.715306] x8 : ffff80008282c9e8 x7 : 0000000000017fe8 x6 : 00000000fffff000
[  245.722825] x5 : ffff0003fde07348 x4 : ffff0000937c03c0 x3 : ffff0000937c0280
[  245.729762] x2 : 0000000000000140 x1 : ffff0000937c03c0 x0 : 0000000000000000
[  245.737009] Call trace:
[  245.739459]  skb_release_data+0x100/0x1e4
[  245.743657]  sk_skb_reason_drop+0x44/0xb0
[  245.747684]  dev_kfree_skb_any_reason+0x44/0x50
[  245.752490]  stmmac_tx_clean+0x1ec/0x798
[  245.756177]  stmmac_napi_poll_tx+0x6c/0x144
[  245.760199]  __napi_poll+0x38/0x190
[  245.763868]  net_rx_action+0x140/0x294
[  245.767888]  handle_softirqs+0x120/0x24c
[  245.771574]  __do_softirq+0x14/0x20
[  245.775326]  ____do_softirq+0x10/0x1c
[  245.778748]  call_on_irq_stack+0x24/0x4c
[  245.782510]  do_softirq_own_stack+0x1c/0x2c
[  245.786964]  irq_exit_rcu+0x8c/0xc4
[  245.790463]  el1_interrupt+0x38/0x68
[  245.794139]  el1h_64_irq_handler+0x18/0x24
[  245.798166]  el1h_64_irq+0x64/0x68
[  245.801318]  default_idle_call+0x28/0x3c
[  245.805166]  do_idle+0x208/0x264
[  245.808576]  cpu_startup_entry+0x34/0x3c
[  245.812088]  kernel_init+0x0/0x1d8
[  245.815594]  start_kernel+0x5c0/0x708
[  245.819076]  __primary_switched+0x80/0x88
[  245.823295] Code: 97fff632 72001c1f 54000161 370005b3 (f9400661)
[  245.829151] ---[ end trace 0000000000000000 ]---


And here is another crash ...

[  149.986210] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
[  149.992845] kernel BUG at lib/dynamic_queue_limits.c:99!
[  149.998152] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[  150.004928] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic snd_soc_tegra186_dspk snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_sfc snd_soc_tegra210_i2s tegra_drm drm_dp_aux_bus cec drm_display_helper drm_kms_helper tegra210_adma snd_soc_tegra210_ahub drm backlight snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_simple_card_utils crct10dif_ce tegra_bpmp_thermal at24 tegra_aconnect snd_hda_codec_hdmi tegra_xudc snd_hda_tegra snd_hda_codec snd_hda_core ina3221 host1x ip_tables x_tables ipv6
[  150.061268] CPU: 5 UID: 102 PID: 240 Comm: systemd-resolve Tainted: G S      W          6.12.0-dirty #7
[  150.070654] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[  150.075438] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
[  150.081348] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  150.088303] pc : dql_completed+0x1fc/0x20c
[  150.092401] lr : stmmac_tx_clean+0x3b4/0x798
[  150.096669] sp : ffff800082d73d00
[  150.099979] x29: ffff800082d73d00 x28: ffff000080898980 x27: 0000000000002ce0
[  150.107115] x26: ffff00008089c980 x25: 0000000000000000 x24: ffff000083c88000
[  150.114248] x23: 0000000000000000 x22: 0000000000000001 x21: ffff000080898980
[  150.121380] x20: 0000000000000000 x19: 0000000000000168 x18: 0000000000006540
[  150.128513] x17: ffff800172d32000 x16: ffff800082d70000 x15: ffff8000827d4968
[  150.135646] x14: fffffffffffe653f x13: 2e656572662d7265 x12: 7466612d65737520
[  150.142781] x11: ffff8000827d49e8 x10: ffff8000827d49e8 x9 : 0000000000000000
[  150.149913] x8 : 000000000003ca11 x7 : 0000000000017fe8 x6 : 000000000003ca11
[  150.157046] x5 : ffff000080d09140 x4 : ffff0001f4cb0840 x3 : 000000000010fe05
[  150.164181] x2 : 0000000000000000 x1 : 000000000000004a x0 : ffff000083c88080
[  150.171314] Call trace:
[  150.173757]  dql_completed+0x1fc/0x20c
[  150.177507]  stmmac_napi_poll_tx+0x6c/0x144
[  150.181688]  __napi_poll+0x38/0x190
[  150.185174]  net_rx_action+0x140/0x294
[  150.188921]  handle_softirqs+0x120/0x24c
[  150.192843]  __do_softirq+0x14/0x20
[  150.196328]  ____do_softirq+0x10/0x1c
[  150.199987]  call_on_irq_stack+0x24/0x4c
[  150.203908]  do_softirq_own_stack+0x1c/0x2c
[  150.208088]  do_softirq+0x54/0x6c
[  150.211401]  __local_bh_enable_ip+0x8c/0x98
[  150.215583]  __dev_queue_xmit+0x4e4/0xd6c
[  150.219588]  ip_finish_output2+0x4cc/0x5e8
[  150.223682]  __ip_finish_output+0xac/0x17c
[  150.227776]  ip_finish_output+0x34/0x10c
[  150.231696]  ip_output+0x68/0xfc
[  150.234921]  __ip_queue_xmit+0x16c/0x464
[  150.238840]  ip_queue_xmit+0x14/0x20
[  150.242413]  __tcp_transmit_skb+0x490/0xc4c
[  150.246593]  tcp_connect+0xa08/0xdbc
[  150.250167]  tcp_v4_connect+0x35c/0x494
[  150.253999]  __inet_stream_connect+0xf8/0x3c8
[  150.258354]  inet_stream_connect+0x48/0x70
[  150.262447]  __sys_connect+0xe0/0xfc
[  150.266021]  __arm64_sys_connect+0x20/0x30
[  150.270113]  invoke_syscall+0x48/0x110
[  150.273860]  el0_svc_common.constprop.0+0xc8/0xe8
[  150.278561]  do_el0_svc+0x20/0x2c
[  150.281875]  el0_svc+0x30/0xd0
[  150.284929]  el0t_64_sync_handler+0x13c/0x158
[  150.289282]  el0t_64_sync+0x190/0x194
[  150.292945] Code: 7a401860 5400008b 2a0403e3 17ffff9c (d4210000)
[  150.299033] ---[ end trace 0000000000000000 ]---
[  150.303647] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt

Let me know if you need any more information.

Thanks
Jon
Furong Xu Nov. 28, 2024, 6:45 a.m. UTC | #5
Hi Jon,

On Wed, 27 Nov 2024 18:39:53 +0000, Jon Hunter <jonathanh@nvidia.com> wrote:
> 
> I have noticed a lot of intermittent failures on a couple of our boards
> starting with Linux v6.12. I have finally bisected the issue to this
> change and reverting this change fixes the problem.
> 
> The boards where I am seeing this issue on are our Tegra186 Jetson TX2
> (tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
> (tegra194-p2972-0000).
> 
> Tegra184 has:
>   dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x41
> 
> Tegra194 has:
>   dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x50
> 
> Otherwise all the other propreties printed on boot are the same for both ...
> 
>   dwc-eth-dwmac 2490000.ethernet: 	DWMAC4/5
>   dwc-eth-dwmac 2490000.ethernet: DMA HW capability register supported
>   dwc-eth-dwmac 2490000.ethernet: RX Checksum Offload Engine supported
>   dwc-eth-dwmac 2490000.ethernet: TX Checksum insertion supported
>   dwc-eth-dwmac 2490000.ethernet: Wake-Up On Lan supported
>   dwc-eth-dwmac 2490000.ethernet: TSO supported
>   dwc-eth-dwmac 2490000.ethernet: Enable RX Mitigation via HW Watchdog Timer
>   dwc-eth-dwmac 2490000.ethernet: Enabled L3L4 Flow TC (entries=8)
>   dwc-eth-dwmac 2490000.ethernet: Enabled RFS Flow TC (entries=10)
>   dwc-eth-dwmac 2490000.ethernet: TSO feature enabled
>   dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width
> 
> 
> Looking at the console logs, when the problem occurs I see the
> following prints ...
> 
> [  245.571688] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> [  245.575349] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> 
> I also caught this crash ...
> 
> [  245.576690] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [  245.576715] Mem abort info:
> [  245.577009]   ESR = 0x0000000096000004
> [  245.577040]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  245.577142]   SET = 0, FnV = 0
> [  245.577355]   EA = 0, S1PTW = 0
> [  245.577439]   FSC = 0x04: level 0 translation fault
> [  245.577557] Data abort info:
> [  245.577628]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [  245.577753]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [  245.577878]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  245.578018] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106300000
> [  245.578168] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> [  245.578390] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [  245.578528] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra_pcm tegra_drm snd_soc_tegra186_asrc snd_soc_tegra210_mixer snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic drm_dp_aux_bus snd_soc_tegra210_adx snd_soc_tegra210_amx cec snd_soc_tegra210_sfc drm_display_helper snd_soc_tegra210_i2s drm_kms_helper snd_soc_tegra_audio_graph_card ucsi_ccg typec_ucsi snd_soc_rt5659 snd_soc_audio_graph_card drm backlight tegra210_adma snd_soc_tegra210_ahub crct10dif_ce snd_soc_simple_card_utils pwm_fan snd_soc_rl6231 typec ina3221 snd_hda_codec_hdmi tegra_aconnect pwm_tegra snd_hda_tegra snd_hda_codec snd_hda_core phy_tegra194_p2u tegra_xudc at24 lm90 pcie_tegra194 host1x tegra_bpmp_thermal ip_tables x_tables ipv6
> [  245.626942] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.12.0 #5
> [  245.635072] Tainted: [W]=WARN
> [  245.638220] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
> [  245.645039] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  245.651870] pc : skb_release_data+0x100/0x1e4
> [  245.656334] lr : sk_skb_reason_drop+0x44/0xb0
> [  245.660797] sp : ffff800080003c80
> [  245.664206] x29: ffff800080003c80 x28: ffff000083d58980 x27: 0000000000000900
> [  245.671813] x26: ffff000083d5c980 x25: ffff0000937c03c0 x24: 0000000000000002
> [  245.678906] x23: 00000000ffffffff x22: 0000000000000001 x21: ffff000096ae8200
> [  245.686258] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000004860
> [  245.693605] x17: ffff80037be97000 x16: ffff800080000000 x15: ffff8000827d4968
> [  245.700870] x14: fffffffffffe485f x13: 2e656572662d7265 x12: 7466612d65737520
> [  245.707957] x11: ffff8000827d49e8 x10: ffff8000827d49e8 x9 : 00000000ffffefff
> [  245.715306] x8 : ffff80008282c9e8 x7 : 0000000000017fe8 x6 : 00000000fffff000
> [  245.722825] x5 : ffff0003fde07348 x4 : ffff0000937c03c0 x3 : ffff0000937c0280
> [  245.729762] x2 : 0000000000000140 x1 : ffff0000937c03c0 x0 : 0000000000000000
> [  245.737009] Call trace:
> [  245.739459]  skb_release_data+0x100/0x1e4
> [  245.743657]  sk_skb_reason_drop+0x44/0xb0
> [  245.747684]  dev_kfree_skb_any_reason+0x44/0x50
> [  245.752490]  stmmac_tx_clean+0x1ec/0x798
> [  245.756177]  stmmac_napi_poll_tx+0x6c/0x144
> [  245.760199]  __napi_poll+0x38/0x190
> [  245.763868]  net_rx_action+0x140/0x294
> [  245.767888]  handle_softirqs+0x120/0x24c
> [  245.771574]  __do_softirq+0x14/0x20
> [  245.775326]  ____do_softirq+0x10/0x1c
> [  245.778748]  call_on_irq_stack+0x24/0x4c
> [  245.782510]  do_softirq_own_stack+0x1c/0x2c
> [  245.786964]  irq_exit_rcu+0x8c/0xc4
> [  245.790463]  el1_interrupt+0x38/0x68
> [  245.794139]  el1h_64_irq_handler+0x18/0x24
> [  245.798166]  el1h_64_irq+0x64/0x68
> [  245.801318]  default_idle_call+0x28/0x3c
> [  245.805166]  do_idle+0x208/0x264
> [  245.808576]  cpu_startup_entry+0x34/0x3c
> [  245.812088]  kernel_init+0x0/0x1d8
> [  245.815594]  start_kernel+0x5c0/0x708
> [  245.819076]  __primary_switched+0x80/0x88
> [  245.823295] Code: 97fff632 72001c1f 54000161 370005b3 (f9400661)
> [  245.829151] ---[ end trace 0000000000000000 ]---
> 
> 
> And here is another crash ...
> 
> [  149.986210] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> [  149.992845] kernel BUG at lib/dynamic_queue_limits.c:99!
> [  149.998152] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> [  150.004928] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic snd_soc_tegra186_dspk snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_sfc snd_soc_tegra210_i2s tegra_drm drm_dp_aux_bus cec drm_display_helper drm_kms_helper tegra210_adma snd_soc_tegra210_ahub drm backlight snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_simple_card_utils crct10dif_ce tegra_bpmp_thermal at24 tegra_aconnect snd_hda_codec_hdmi tegra_xudc snd_hda_tegra snd_hda_codec snd_hda_core ina3221 host1x ip_tables x_tables ipv6
> [  150.061268] CPU: 5 UID: 102 PID: 240 Comm: systemd-resolve Tainted: G S      W          6.12.0-dirty #7
> [  150.070654] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
> [  150.075438] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
> [  150.081348] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  150.088303] pc : dql_completed+0x1fc/0x20c
> [  150.092401] lr : stmmac_tx_clean+0x3b4/0x798
> [  150.096669] sp : ffff800082d73d00
> [  150.099979] x29: ffff800082d73d00 x28: ffff000080898980 x27: 0000000000002ce0
> [  150.107115] x26: ffff00008089c980 x25: 0000000000000000 x24: ffff000083c88000
> [  150.114248] x23: 0000000000000000 x22: 0000000000000001 x21: ffff000080898980
> [  150.121380] x20: 0000000000000000 x19: 0000000000000168 x18: 0000000000006540
> [  150.128513] x17: ffff800172d32000 x16: ffff800082d70000 x15: ffff8000827d4968
> [  150.135646] x14: fffffffffffe653f x13: 2e656572662d7265 x12: 7466612d65737520
> [  150.142781] x11: ffff8000827d49e8 x10: ffff8000827d49e8 x9 : 0000000000000000
> [  150.149913] x8 : 000000000003ca11 x7 : 0000000000017fe8 x6 : 000000000003ca11
> [  150.157046] x5 : ffff000080d09140 x4 : ffff0001f4cb0840 x3 : 000000000010fe05
> [  150.164181] x2 : 0000000000000000 x1 : 000000000000004a x0 : ffff000083c88080
> [  150.171314] Call trace:
> [  150.173757]  dql_completed+0x1fc/0x20c
> [  150.177507]  stmmac_napi_poll_tx+0x6c/0x144
> [  150.181688]  __napi_poll+0x38/0x190
> [  150.185174]  net_rx_action+0x140/0x294
> [  150.188921]  handle_softirqs+0x120/0x24c
> [  150.192843]  __do_softirq+0x14/0x20
> [  150.196328]  ____do_softirq+0x10/0x1c
> [  150.199987]  call_on_irq_stack+0x24/0x4c
> [  150.203908]  do_softirq_own_stack+0x1c/0x2c
> [  150.208088]  do_softirq+0x54/0x6c
> [  150.211401]  __local_bh_enable_ip+0x8c/0x98
> [  150.215583]  __dev_queue_xmit+0x4e4/0xd6c
> [  150.219588]  ip_finish_output2+0x4cc/0x5e8
> [  150.223682]  __ip_finish_output+0xac/0x17c
> [  150.227776]  ip_finish_output+0x34/0x10c
> [  150.231696]  ip_output+0x68/0xfc
> [  150.234921]  __ip_queue_xmit+0x16c/0x464
> [  150.238840]  ip_queue_xmit+0x14/0x20
> [  150.242413]  __tcp_transmit_skb+0x490/0xc4c
> [  150.246593]  tcp_connect+0xa08/0xdbc
> [  150.250167]  tcp_v4_connect+0x35c/0x494
> [  150.253999]  __inet_stream_connect+0xf8/0x3c8
> [  150.258354]  inet_stream_connect+0x48/0x70
> [  150.262447]  __sys_connect+0xe0/0xfc
> [  150.266021]  __arm64_sys_connect+0x20/0x30
> [  150.270113]  invoke_syscall+0x48/0x110
> [  150.273860]  el0_svc_common.constprop.0+0xc8/0xe8
> [  150.278561]  do_el0_svc+0x20/0x2c
> [  150.281875]  el0_svc+0x30/0xd0
> [  150.284929]  el0t_64_sync_handler+0x13c/0x158
> [  150.289282]  el0t_64_sync+0x190/0x194
> [  150.292945] Code: 7a401860 5400008b 2a0403e3 17ffff9c (d4210000)
> [  150.299033] ---[ end trace 0000000000000000 ]---
> [  150.303647] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
> 
> Let me know if you need any more information.
> 

[  149.986210] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
and
[  245.571688] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
[  245.575349] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
are reported by stmmac_xmit() obviously, but not stmmac_tso_xmit().

And these crashes are caused by "Tx DMA map failed", as you can see that
current driver code does not handle this kind of failure so well. It is clear
that we need to figure out why Tx DMA map failed.

This patch corrects the sequence and timing of DMA unmap by waiting all
DMA transmit descriptors to be closed by DMA engine for one DMA map in
stmmac_tso_xmit(), it never leaks DMA addresses and never introduces
other side effect.

"Tx DMA map failed" is a weird failure, and I cannot reproduce this failure
on my device with DWMAC CORE 5.10a(Synopsys ID: 0x51) and DWXGMAC CORE 3.20a.
Jakub Kicinski Dec. 3, 2024, 12:33 a.m. UTC | #6
On Thu, 28 Nov 2024 14:45:01 +0800 Furong Xu wrote:
> > Let me know if you need any more information.
> 
> [  149.986210] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> and
> [  245.571688] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> [  245.575349] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> are reported by stmmac_xmit() obviously, but not stmmac_tso_xmit().
> 
> And these crashes are caused by "Tx DMA map failed", as you can see that
> current driver code does not handle this kind of failure so well. It is clear
> that we need to figure out why Tx DMA map failed.
> 
> This patch corrects the sequence and timing of DMA unmap by waiting all
> DMA transmit descriptors to be closed by DMA engine for one DMA map in
> stmmac_tso_xmit(), it never leaks DMA addresses and never introduces
> other side effect.
> 
> "Tx DMA map failed" is a weird failure, and I cannot reproduce this failure
> on my device with DWMAC CORE 5.10a(Synopsys ID: 0x51) and DWXGMAC CORE 3.20a.

Let me repeat Jon's question - is there any info or test you need from
Jon to make progress with a fix?

If Jon's board worked before and doesn't work with this patch we will
need *a* fix, if no fix is provided our only choice is revert.
Furong Xu Dec. 3, 2024, 2:03 a.m. UTC | #7
Hi Jakub,

On Mon, 2 Dec 2024 16:33:09 -0800, Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 28 Nov 2024 14:45:01 +0800 Furong Xu wrote:
> > > Let me know if you need any more information.  
> > 
> > [  149.986210] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> > and
> > [  245.571688] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> > [  245.575349] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
> > are reported by stmmac_xmit() obviously, but not stmmac_tso_xmit().
> > 
> > And these crashes are caused by "Tx DMA map failed", as you can see that
> > current driver code does not handle this kind of failure so well. It is clear
> > that we need to figure out why Tx DMA map failed.
> > 
> > This patch corrects the sequence and timing of DMA unmap by waiting all
> > DMA transmit descriptors to be closed by DMA engine for one DMA map in
> > stmmac_tso_xmit(), it never leaks DMA addresses and never introduces
> > other side effect.
> > 
> > "Tx DMA map failed" is a weird failure, and I cannot reproduce this failure
> > on my device with DWMAC CORE 5.10a(Synopsys ID: 0x51) and DWXGMAC CORE 3.20a.  
> 
> Let me repeat Jon's question - is there any info or test you need from
> Jon to make progress with a fix?
> 
> If Jon's board worked before and doesn't work with this patch we will
> need *a* fix, if no fix is provided our only choice is revert.

Thanks for your attention to this issue.

I requested Jon to provide more info about "Tx DMA map failed" in previous
reply, and he does not respond yet.
This seems to be a device-specific issue, no fix can be provided without
his reply :(
Jakub Kicinski Dec. 3, 2024, 2:34 a.m. UTC | #8
On Tue, 3 Dec 2024 10:03:31 +0800 Furong Xu wrote:
> I requested Jon to provide more info about "Tx DMA map failed" in previous
> reply, and he does not respond yet.

What does it mean to provide "more info" about a print statement from
the driver? Is there a Kconfig which he needs to set to get more info?
Perhaps you should provide a debug patch he can apply on his tree, that
will print info about (1) which buffer mapping failed (head or frags);
(2) what the physical address was of the buffer that couldn't be mapped.
Furong Xu Dec. 3, 2024, 3:16 a.m. UTC | #9
On Mon, 2 Dec 2024 18:34:25 -0800, Jakub Kicinski <kuba@kernel.org> wrote:

> On Tue, 3 Dec 2024 10:03:31 +0800 Furong Xu wrote:
> > I requested Jon to provide more info about "Tx DMA map failed" in previous
> > reply, and he does not respond yet.  
> 
> What does it mean to provide "more info" about a print statement from
> the driver? Is there a Kconfig which he needs to set to get more info?
> Perhaps you should provide a debug patch he can apply on his tree, that
> will print info about (1) which buffer mapping failed (head or frags);
> (2) what the physical address was of the buffer that couldn't be mapped.

A debug patch to print info about buffer makes no sense here.
Both Tegra186 Jetson TX2(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
(tegra194-p2972-0000) enable IOMMU/SMMU for stmmac in its device-tree node,
buffer info should be investigated with detailed IOMMU/SMMU debug info from
drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c together.

I am not an expert in IOMMU, so I cannot help more.

Without the help from Jon, our only choice is revert as you said.

Thanks.
Thierry Reding Dec. 4, 2024, 1:57 p.m. UTC | #10
On Tue, Dec 03, 2024 at 11:16:37AM +0800, Furong Xu wrote:
> On Mon, 2 Dec 2024 18:34:25 -0800, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Tue, 3 Dec 2024 10:03:31 +0800 Furong Xu wrote:
> > > I requested Jon to provide more info about "Tx DMA map failed" in previous
> > > reply, and he does not respond yet.  
> > 
> > What does it mean to provide "more info" about a print statement from
> > the driver? Is there a Kconfig which he needs to set to get more info?
> > Perhaps you should provide a debug patch he can apply on his tree, that
> > will print info about (1) which buffer mapping failed (head or frags);
> > (2) what the physical address was of the buffer that couldn't be mapped.
> 
> A debug patch to print info about buffer makes no sense here.
> Both Tegra186 Jetson TX2(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
> (tegra194-p2972-0000) enable IOMMU/SMMU for stmmac in its device-tree node,
> buffer info should be investigated with detailed IOMMU/SMMU debug info from
> drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c together.
> 
> I am not an expert in IOMMU, so I cannot help more.
> 
> Without the help from Jon, our only choice is revert as you said.

I was able to reproduce this locally and I get this splat:

--- >8 ---
[  228.179234] WARNING: CPU: 0 PID: 0 at drivers/iommu/io-pgtable-arm.c:346 __arm_lpae_map+0x388/0x4e4
[  228.188300] Modules linked in: snd_soc_tegra210_mixer snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra186_asrc snd_soc_tegra210_ope snd_soc_tegra210_adx snd_soc_tegra210_mvc snd_soc_tegra210_dmic snd_soc_tegra186_dspk snd_soc_tegra210_sfc snd_soc_tegra210_amx snd_soc_tegra210_i2s tegra_drm drm_dp_aux_bus cec drm_display_helper drm_client_lib tegra210_adma snd_soc_tegra210_ahub drm_kms_helper snd_hda_codec_hdmi snd_hda_tegra snd_soc_tegra_audio_graph_card at24 snd_hda_codec ina3221 snd_soc_audio_graph_card snd_soc_simple_card_utils tegra_bpmp_thermal tegra_xudc snd_hda_core tegra_aconnect host1x fuse drm backlight ipv6
[  228.243750] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S                 6.13.0-rc1-next-20241203 #30
[  228.253412] Tainted: [S]=CPU_OUT_OF_SPEC
[  228.257336] Hardware name: nvidia NVIDIA P2771-0000-500/NVIDIA P2771-0000-500, BIOS 2025.01-rc3-00040-g36352ae2e68e-dirty 01/01/2025
[  228.269239] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  228.276205] pc : __arm_lpae_map+0x388/0x4e4
[  228.280398] lr : __arm_lpae_map+0x120/0x4e4
[  228.284587] sp : ffff8000800037f0
[  228.287901] x29: ffff800080003800 x28: 0000000000000002 x27: 0000000000000001
[  228.295050] x26: 0000000000000001 x25: 0000000111580000 x24: 0000000000001000
[  228.302197] x23: 000000ffffc72000 x22: 0000000000000ec0 x21: 0000000000000003
[  228.309342] x20: 0000000000000001 x19: ffff00008574b000 x18: 0000000000000001
[  228.316486] x17: 0000000000000000 x16: 0000000000000001 x15: ffff800080003ad0
[  228.323631] x14: ffff00008574d000 x13: 0000000000000000 x12: 0000000000000001
[  228.330775] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000001000
[  228.337921] x8 : ffff00008674c390 x7 : ffff00008674c000 x6 : 0000000000000003
[  228.345066] x5 : 0000000000000003 x4 : 0000000000000001 x3 : 0000000000000002
[  228.352209] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff00008574b000
[  228.359356] Call trace:
[  228.361807]  __arm_lpae_map+0x388/0x4e4 (P)
[  228.366002]  __arm_lpae_map+0x120/0x4e4 (L)
[  228.370198]  __arm_lpae_map+0x120/0x4e4
[  228.374042]  __arm_lpae_map+0x120/0x4e4
[  228.377886]  __arm_lpae_map+0x120/0x4e4
[  228.381730]  arm_lpae_map_pages+0x108/0x250
[  228.385922]  arm_smmu_map_pages+0x40/0x120
[  228.390029]  __iommu_map+0xfc/0x1bc
[  228.393525]  iommu_map+0x38/0xc0
[  228.396759]  __iommu_dma_map+0xb4/0x1a4
[  228.400604]  iommu_dma_map_page+0x14c/0x27c
[  228.404795]  dma_map_page_attrs+0x1fc/0x280
[  228.408987]  stmmac_tso_xmit+0x2d0/0xbac
[  228.412920]  stmmac_xmit+0x230/0xd14
[  228.416505]  dev_hard_start_xmit+0x94/0x11c
[  228.420697]  sch_direct_xmit+0x8c/0x380
[  228.424540]  __qdisc_run+0x11c/0x66c
[  228.428121]  net_tx_action+0x168/0x228
[  228.431875]  handle_softirqs+0x100/0x244
[  228.435809]  __do_softirq+0x14/0x20
[  228.439303]  ____do_softirq+0x10/0x20
[  228.442972]  call_on_irq_stack+0x24/0x64
[  228.446903]  do_softirq_own_stack+0x1c/0x40
[  228.451091]  __irq_exit_rcu+0xd4/0x10c
[  228.454847]  irq_exit_rcu+0x10/0x1c
[  228.458343]  el1_interrupt+0x38/0x68
[  228.461927]  el1h_64_irq_handler+0x18/0x24
[  228.466032]  el1h_64_irq+0x6c/0x70
[  228.469438]  default_idle_call+0x28/0x58 (P)
[  228.473718]  default_idle_call+0x24/0x58 (L)
[  228.477998]  do_idle+0x1fc/0x260
[  228.481234]  cpu_startup_entry+0x34/0x3c
[  228.485163]  rest_init+0xdc/0xe0
[  228.488401]  console_on_rootfs+0x0/0x6c
[  228.492250]  __primary_switched+0x88/0x90
[  228.496270] ---[ end trace 0000000000000000 ]---
[  228.500950] dwc-eth-dwmac 2490000.ethernet: Tx dma map failed
--- >8 ---

This looks to be slightly different from what Jon was seeing. Looking at
the WARN_ON() that triggers this, it seems like for some reason the page
is getting mapped twice.

Not exactly sure why that would be happening, so adding Robin and Will,
maybe they can shed some light on this from the ARM SMMU side.

Robin, Will, any idea who could be the culprit here? Is this a map/unmap
imbalance or something else entirely?

A lot of the context isn't present in this thread anymore, but here's a
link to the top of the thread:

	https://lore.kernel.org/netdev/d8112193-0386-4e14-b516-37c2d838171a@nvidia.com/T/

Thanks,
Thierry
Robin Murphy Dec. 4, 2024, 2:06 p.m. UTC | #11
On 2024-12-04 1:57 pm, Thierry Reding wrote:
> On Tue, Dec 03, 2024 at 11:16:37AM +0800, Furong Xu wrote:
>> On Mon, 2 Dec 2024 18:34:25 -0800, Jakub Kicinski <kuba@kernel.org> wrote:
>>
>>> On Tue, 3 Dec 2024 10:03:31 +0800 Furong Xu wrote:
>>>> I requested Jon to provide more info about "Tx DMA map failed" in previous
>>>> reply, and he does not respond yet.
>>>
>>> What does it mean to provide "more info" about a print statement from
>>> the driver? Is there a Kconfig which he needs to set to get more info?
>>> Perhaps you should provide a debug patch he can apply on his tree, that
>>> will print info about (1) which buffer mapping failed (head or frags);
>>> (2) what the physical address was of the buffer that couldn't be mapped.
>>
>> A debug patch to print info about buffer makes no sense here.
>> Both Tegra186 Jetson TX2(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
>> (tegra194-p2972-0000) enable IOMMU/SMMU for stmmac in its device-tree node,
>> buffer info should be investigated with detailed IOMMU/SMMU debug info from
>> drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c together.
>>
>> I am not an expert in IOMMU, so I cannot help more.
>>
>> Without the help from Jon, our only choice is revert as you said.
> 
> I was able to reproduce this locally and I get this splat:
> 
> --- >8 ---
> [  228.179234] WARNING: CPU: 0 PID: 0 at drivers/iommu/io-pgtable-arm.c:346 __arm_lpae_map+0x388/0x4e4
> [  228.188300] Modules linked in: snd_soc_tegra210_mixer snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra186_asrc snd_soc_tegra210_ope snd_soc_tegra210_adx snd_soc_tegra210_mvc snd_soc_tegra210_dmic snd_soc_tegra186_dspk snd_soc_tegra210_sfc snd_soc_tegra210_amx snd_soc_tegra210_i2s tegra_drm drm_dp_aux_bus cec drm_display_helper drm_client_lib tegra210_adma snd_soc_tegra210_ahub drm_kms_helper snd_hda_codec_hdmi snd_hda_tegra snd_soc_tegra_audio_graph_card at24 snd_hda_codec ina3221 snd_soc_audio_graph_card snd_soc_simple_card_utils tegra_bpmp_thermal tegra_xudc snd_hda_core tegra_aconnect host1x fuse drm backlight ipv6
> [  228.243750] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S                 6.13.0-rc1-next-20241203 #30
> [  228.253412] Tainted: [S]=CPU_OUT_OF_SPEC
> [  228.257336] Hardware name: nvidia NVIDIA P2771-0000-500/NVIDIA P2771-0000-500, BIOS 2025.01-rc3-00040-g36352ae2e68e-dirty 01/01/2025
> [  228.269239] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  228.276205] pc : __arm_lpae_map+0x388/0x4e4
> [  228.280398] lr : __arm_lpae_map+0x120/0x4e4
> [  228.284587] sp : ffff8000800037f0
> [  228.287901] x29: ffff800080003800 x28: 0000000000000002 x27: 0000000000000001
> [  228.295050] x26: 0000000000000001 x25: 0000000111580000 x24: 0000000000001000
> [  228.302197] x23: 000000ffffc72000 x22: 0000000000000ec0 x21: 0000000000000003
> [  228.309342] x20: 0000000000000001 x19: ffff00008574b000 x18: 0000000000000001
> [  228.316486] x17: 0000000000000000 x16: 0000000000000001 x15: ffff800080003ad0
> [  228.323631] x14: ffff00008574d000 x13: 0000000000000000 x12: 0000000000000001
> [  228.330775] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000001000
> [  228.337921] x8 : ffff00008674c390 x7 : ffff00008674c000 x6 : 0000000000000003
> [  228.345066] x5 : 0000000000000003 x4 : 0000000000000001 x3 : 0000000000000002
> [  228.352209] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff00008574b000
> [  228.359356] Call trace:
> [  228.361807]  __arm_lpae_map+0x388/0x4e4 (P)
> [  228.366002]  __arm_lpae_map+0x120/0x4e4 (L)
> [  228.370198]  __arm_lpae_map+0x120/0x4e4
> [  228.374042]  __arm_lpae_map+0x120/0x4e4
> [  228.377886]  __arm_lpae_map+0x120/0x4e4
> [  228.381730]  arm_lpae_map_pages+0x108/0x250
> [  228.385922]  arm_smmu_map_pages+0x40/0x120
> [  228.390029]  __iommu_map+0xfc/0x1bc
> [  228.393525]  iommu_map+0x38/0xc0
> [  228.396759]  __iommu_dma_map+0xb4/0x1a4
> [  228.400604]  iommu_dma_map_page+0x14c/0x27c
> [  228.404795]  dma_map_page_attrs+0x1fc/0x280
> [  228.408987]  stmmac_tso_xmit+0x2d0/0xbac
> [  228.412920]  stmmac_xmit+0x230/0xd14
> [  228.416505]  dev_hard_start_xmit+0x94/0x11c
> [  228.420697]  sch_direct_xmit+0x8c/0x380
> [  228.424540]  __qdisc_run+0x11c/0x66c
> [  228.428121]  net_tx_action+0x168/0x228
> [  228.431875]  handle_softirqs+0x100/0x244
> [  228.435809]  __do_softirq+0x14/0x20
> [  228.439303]  ____do_softirq+0x10/0x20
> [  228.442972]  call_on_irq_stack+0x24/0x64
> [  228.446903]  do_softirq_own_stack+0x1c/0x40
> [  228.451091]  __irq_exit_rcu+0xd4/0x10c
> [  228.454847]  irq_exit_rcu+0x10/0x1c
> [  228.458343]  el1_interrupt+0x38/0x68
> [  228.461927]  el1h_64_irq_handler+0x18/0x24
> [  228.466032]  el1h_64_irq+0x6c/0x70
> [  228.469438]  default_idle_call+0x28/0x58 (P)
> [  228.473718]  default_idle_call+0x24/0x58 (L)
> [  228.477998]  do_idle+0x1fc/0x260
> [  228.481234]  cpu_startup_entry+0x34/0x3c
> [  228.485163]  rest_init+0xdc/0xe0
> [  228.488401]  console_on_rootfs+0x0/0x6c
> [  228.492250]  __primary_switched+0x88/0x90
> [  228.496270] ---[ end trace 0000000000000000 ]---
> [  228.500950] dwc-eth-dwmac 2490000.ethernet: Tx dma map failed
> --- >8 ---
> 
> This looks to be slightly different from what Jon was seeing. Looking at
> the WARN_ON() that triggers this, it seems like for some reason the page
> is getting mapped twice.
> 
> Not exactly sure why that would be happening, so adding Robin and Will,
> maybe they can shed some light on this from the ARM SMMU side.
> 
> Robin, Will, any idea who could be the culprit here? Is this a map/unmap
> imbalance or something else entirely?

If valid PTEs are getting left behind in the pagetable, that would 
indicate that a previous dma_unmap_page() was called with a size smaller 
than its original dma_map_page(). Throwing CONFIG_DMA_API_DEBUG at it 
should hopefully shed more light.

Cheers,
Robin.

> A lot of the context isn't present in this thread anymore, but here's a
> link to the top of the thread:
> 
> 	https://lore.kernel.org/netdev/d8112193-0386-4e14-b516-37c2d838171a@nvidia.com/T/
> 
> Thanks,
> Thierry
Thierry Reding Dec. 4, 2024, 3:58 p.m. UTC | #12
On Wed, Dec 04, 2024 at 02:06:00PM +0000, Robin Murphy wrote:
> On 2024-12-04 1:57 pm, Thierry Reding wrote:
> > On Tue, Dec 03, 2024 at 11:16:37AM +0800, Furong Xu wrote:
> > > On Mon, 2 Dec 2024 18:34:25 -0800, Jakub Kicinski <kuba@kernel.org> wrote:
> > > 
> > > > On Tue, 3 Dec 2024 10:03:31 +0800 Furong Xu wrote:
> > > > > I requested Jon to provide more info about "Tx DMA map failed" in previous
> > > > > reply, and he does not respond yet.
> > > > 
> > > > What does it mean to provide "more info" about a print statement from
> > > > the driver? Is there a Kconfig which he needs to set to get more info?
> > > > Perhaps you should provide a debug patch he can apply on his tree, that
> > > > will print info about (1) which buffer mapping failed (head or frags);
> > > > (2) what the physical address was of the buffer that couldn't be mapped.
> > > 
> > > A debug patch to print info about buffer makes no sense here.
> > > Both Tegra186 Jetson TX2(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
> > > (tegra194-p2972-0000) enable IOMMU/SMMU for stmmac in its device-tree node,
> > > buffer info should be investigated with detailed IOMMU/SMMU debug info from
> > > drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c together.
> > > 
> > > I am not an expert in IOMMU, so I cannot help more.
> > > 
> > > Without the help from Jon, our only choice is revert as you said.
> > 
> > I was able to reproduce this locally and I get this splat:
> > 
> > --- >8 ---
> > [  228.179234] WARNING: CPU: 0 PID: 0 at drivers/iommu/io-pgtable-arm.c:346 __arm_lpae_map+0x388/0x4e4
> > [  228.188300] Modules linked in: snd_soc_tegra210_mixer snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra186_asrc snd_soc_tegra210_ope snd_soc_tegra210_adx snd_soc_tegra210_mvc snd_soc_tegra210_dmic snd_soc_tegra186_dspk snd_soc_tegra210_sfc snd_soc_tegra210_amx snd_soc_tegra210_i2s tegra_drm drm_dp_aux_bus cec drm_display_helper drm_client_lib tegra210_adma snd_soc_tegra210_ahub drm_kms_helper snd_hda_codec_hdmi snd_hda_tegra snd_soc_tegra_audio_graph_card at24 snd_hda_codec ina3221 snd_soc_audio_graph_card snd_soc_simple_card_utils tegra_bpmp_thermal tegra_xudc snd_hda_core tegra_aconnect host1x fuse drm backlight ipv6
> > [  228.243750] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S                 6.13.0-rc1-next-20241203 #30
> > [  228.253412] Tainted: [S]=CPU_OUT_OF_SPEC
> > [  228.257336] Hardware name: nvidia NVIDIA P2771-0000-500/NVIDIA P2771-0000-500, BIOS 2025.01-rc3-00040-g36352ae2e68e-dirty 01/01/2025
> > [  228.269239] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [  228.276205] pc : __arm_lpae_map+0x388/0x4e4
> > [  228.280398] lr : __arm_lpae_map+0x120/0x4e4
> > [  228.284587] sp : ffff8000800037f0
> > [  228.287901] x29: ffff800080003800 x28: 0000000000000002 x27: 0000000000000001
> > [  228.295050] x26: 0000000000000001 x25: 0000000111580000 x24: 0000000000001000
> > [  228.302197] x23: 000000ffffc72000 x22: 0000000000000ec0 x21: 0000000000000003
> > [  228.309342] x20: 0000000000000001 x19: ffff00008574b000 x18: 0000000000000001
> > [  228.316486] x17: 0000000000000000 x16: 0000000000000001 x15: ffff800080003ad0
> > [  228.323631] x14: ffff00008574d000 x13: 0000000000000000 x12: 0000000000000001
> > [  228.330775] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000001000
> > [  228.337921] x8 : ffff00008674c390 x7 : ffff00008674c000 x6 : 0000000000000003
> > [  228.345066] x5 : 0000000000000003 x4 : 0000000000000001 x3 : 0000000000000002
> > [  228.352209] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff00008574b000
> > [  228.359356] Call trace:
> > [  228.361807]  __arm_lpae_map+0x388/0x4e4 (P)
> > [  228.366002]  __arm_lpae_map+0x120/0x4e4 (L)
> > [  228.370198]  __arm_lpae_map+0x120/0x4e4
> > [  228.374042]  __arm_lpae_map+0x120/0x4e4
> > [  228.377886]  __arm_lpae_map+0x120/0x4e4
> > [  228.381730]  arm_lpae_map_pages+0x108/0x250
> > [  228.385922]  arm_smmu_map_pages+0x40/0x120
> > [  228.390029]  __iommu_map+0xfc/0x1bc
> > [  228.393525]  iommu_map+0x38/0xc0
> > [  228.396759]  __iommu_dma_map+0xb4/0x1a4
> > [  228.400604]  iommu_dma_map_page+0x14c/0x27c
> > [  228.404795]  dma_map_page_attrs+0x1fc/0x280
> > [  228.408987]  stmmac_tso_xmit+0x2d0/0xbac
> > [  228.412920]  stmmac_xmit+0x230/0xd14
> > [  228.416505]  dev_hard_start_xmit+0x94/0x11c
> > [  228.420697]  sch_direct_xmit+0x8c/0x380
> > [  228.424540]  __qdisc_run+0x11c/0x66c
> > [  228.428121]  net_tx_action+0x168/0x228
> > [  228.431875]  handle_softirqs+0x100/0x244
> > [  228.435809]  __do_softirq+0x14/0x20
> > [  228.439303]  ____do_softirq+0x10/0x20
> > [  228.442972]  call_on_irq_stack+0x24/0x64
> > [  228.446903]  do_softirq_own_stack+0x1c/0x40
> > [  228.451091]  __irq_exit_rcu+0xd4/0x10c
> > [  228.454847]  irq_exit_rcu+0x10/0x1c
> > [  228.458343]  el1_interrupt+0x38/0x68
> > [  228.461927]  el1h_64_irq_handler+0x18/0x24
> > [  228.466032]  el1h_64_irq+0x6c/0x70
> > [  228.469438]  default_idle_call+0x28/0x58 (P)
> > [  228.473718]  default_idle_call+0x24/0x58 (L)
> > [  228.477998]  do_idle+0x1fc/0x260
> > [  228.481234]  cpu_startup_entry+0x34/0x3c
> > [  228.485163]  rest_init+0xdc/0xe0
> > [  228.488401]  console_on_rootfs+0x0/0x6c
> > [  228.492250]  __primary_switched+0x88/0x90
> > [  228.496270] ---[ end trace 0000000000000000 ]---
> > [  228.500950] dwc-eth-dwmac 2490000.ethernet: Tx dma map failed
> > --- >8 ---
> > 
> > This looks to be slightly different from what Jon was seeing. Looking at
> > the WARN_ON() that triggers this, it seems like for some reason the page
> > is getting mapped twice.
> > 
> > Not exactly sure why that would be happening, so adding Robin and Will,
> > maybe they can shed some light on this from the ARM SMMU side.
> > 
> > Robin, Will, any idea who could be the culprit here? Is this a map/unmap
> > imbalance or something else entirely?
> 
> If valid PTEs are getting left behind in the pagetable, that would indicate
> that a previous dma_unmap_page() was called with a size smaller than its
> original dma_map_page(). Throwing CONFIG_DMA_API_DEBUG at it should
> hopefully shed more light.

Bull's-eye! DMA_API_DEBUG does flag this:

--- >8 ---
[   60.469121] DMA-API: dwc-eth-dwmac 2490000.ethernet: device driver tries to free DMA memory it has not allocated [device add ress=0x000000ffffcf65c0] [size=66 bytes]
1
[   60.486534] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:972 check_unmap+0x564/0x8f0
[   60.494493] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra210_ope snd_soc_tegra186_asrc snd_soc_tegra210_amx snd_soc_tegra210_mv
c snd_soc_tegra210_mixer snd_soc_tegra210_dmic snd_soc_tegra210_sfc snd_soc_tegra186_dspk snd_soc_tegra210_i2s snd_soc_tegra210_adx tegra_drm drm_dp_aux_bus ce
c drm_display_helper drm_client_lib drm_kms_helper tegra210_adma snd_soc_tegra210_ahub snd_hda_codec_hdmi snd_soc_tegra_audio_graph_card snd_hda_tegra snd_soc_
audio_graph_card snd_hda_codec snd_soc_simple_card_utils at24 snd_hda_core ina3221 tegra_aconnect tegra_xudc tegra_bpmp_thermal host1x fuse drm backlight ipv6
[   60.549857] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S                 6.13.0-rc1-next-20241203 #31
[   60.559504] Tainted: [S]=CPU_OUT_OF_SPEC
[   60.563423] Hardware name: nvidia NVIDIA P2771-0000-500/NVIDIA P2771-0000-500, BIOS 2025.01-rc3-00040-g36352ae2e68e-dirty 01/01/2025
[   60.575317] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   60.582273] pc : check_unmap+0x564/0x8f0
[   60.586197] lr : check_unmap+0x564/0x8f0
[   60.590117] sp : ffff800080003b50
[   60.593428] x29: ffff800080003b50 x28: ffff0000825309c0 x27: ffff0000825313c0
[   60.600562] x26: 00000000000001b9 x25: 0000000000000001 x24: ffff00008011b410
[   60.607696] x23: ffff800082059ec0 x22: 0000000000000000 x21: ffff8000825b25c8
[   60.614829] x20: ffff800080003bc0 x19: 000000ffffcf65c0 x18: 0000000000000006
[   60.621962] x17: 645b206465746163 x16: 0000000000000000 x15: 0720072007200720
[   60.629095] x14: ffff800082074960 x13: 0720072007200720 x12: 0720072007200720
[   60.636229] x11: ffff800082074960 x10: 0000000000000299 x9 : ffff8000820cc960
[   60.643362] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000820cc960
[   60.650496] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   60.657629] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800082064900
[   60.664763] Call trace:
[   60.667206]  check_unmap+0x564/0x8f0 (P)
[   60.671131]  check_unmap+0x564/0x8f0 (L)
[   60.675055]  debug_dma_unmap_page+0xac/0xc0
[   60.679239]  dma_unmap_page_attrs+0xf4/0x200
[   60.683508]  stmmac_tx_clean.constprop.0+0x1ac/0x7bc
[   60.688474]  stmmac_napi_poll_tx+0xdc/0x168
[   60.692657]  __napi_poll+0x38/0x180
[   60.696148]  net_rx_action+0x158/0x2c0
[   60.699897]  handle_softirqs+0x100/0x244
[   60.703821]  __do_softirq+0x14/0x20
[   60.707309]  ____do_softirq+0x10/0x20
[   60.710970]  call_on_irq_stack+0x24/0x64
[   60.714891]  do_softirq_own_stack+0x1c/0x40
[   60.719072]  __irq_exit_rcu+0xd4/0x10c
[   60.722821]  irq_exit_rcu+0x10/0x1c
[   60.726308]  el1_interrupt+0x38/0x68
[   60.729886]  el1h_64_irq_handler+0x18/0x24
[   60.733980]  el1h_64_irq+0x6c/0x70
[   60.737381]  default_idle_call+0x28/0x58 (P)
[   60.741652]  default_idle_call+0x24/0x58 (L)
[   60.745920]  do_idle+0x1fc/0x260
[   60.749149]  cpu_startup_entry+0x34/0x3c
[   60.753068]  rest_init+0xdc/0xe0
[   60.756296]  console_on_rootfs+0x0/0x6c
[   60.760135]  __primary_switched+0x88/0x90
[   60.764146] ---[ end trace 0000000000000000 ]---
--- >8 ---

This doesn't match the location from earlier, but at least there's
something afoot here that needs fixing. I suppose this could simply be
hiding any subsequent errors, so once this is fixed we might see other
similar issues.

Furong, I can look into this some more, but I'm not at all familiar with
this part of the driver, so I don't really know where this could be
originating from. Any pointers would be appreciated. Also, if you think
there's anything I should try, I do have this setup that I can test on
locally.

Thierry
Russell King (Oracle) Dec. 4, 2024, 4:39 p.m. UTC | #13
On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
> This doesn't match the location from earlier, but at least there's
> something afoot here that needs fixing. I suppose this could simply be
> hiding any subsequent errors, so once this is fixed we might see other
> similar issues.

Well, having a quick look at this, the first thing which stands out is:

In stmmac_tx_clean(), we have:

                if (likely(tx_q->tx_skbuff_dma[entry].buf &&
                           tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T
_XDP_TX)) {
                        if (tx_q->tx_skbuff_dma[entry].map_as_page)
                                dma_unmap_page(priv->device,
                                               tx_q->tx_skbuff_dma[entry].buf,
                                               tx_q->tx_skbuff_dma[entry].len,
                                               DMA_TO_DEVICE);
                        else
                                dma_unmap_single(priv->device,
                                                 tx_q->tx_skbuff_dma[entry].buf,
                                                 tx_q->tx_skbuff_dma[entry].len,
                                                 DMA_TO_DEVICE);
                        tx_q->tx_skbuff_dma[entry].buf = 0;
                        tx_q->tx_skbuff_dma[entry].len = 0;
                        tx_q->tx_skbuff_dma[entry].map_as_page = false;
                }

So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
DMA region.

Now if we look at stmmac_tso_xmit():

        des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
                             DMA_TO_DEVICE);
        if (dma_mapping_error(priv->device, des))
                goto dma_map_err;

        if (priv->dma_cap.addr64 <= 32) {
...
        } else {
...
                des += proto_hdr_len;
...
	}

        tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
        tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
        tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
        tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;

This will result in stmmac_tx_clean() calling dma_unmap_single() using
"des" and "skb_headlen(skb)" as the buffer start and length.

One of the requirements of the DMA mapping API is that the DMA handle
returned by the map operation will be passed into the unmap function.
Not something that was offset. The length will also be the same.

We can clearly see above that there is a case where the DMA handle has
been offset by proto_hdr_len, and when this is so, the value that is
passed into the unmap operation no longer matches this requirement.

So, a question to the reporter - what is the value of
priv->dma_cap.addr64 in your failing case? You should see the value
in the "Using %d/%d bits DMA host/device width" kernel message.

Thanks.
Jon Hunter Dec. 4, 2024, 5:02 p.m. UTC | #14
Hi Russell,

On 04/12/2024 16:39, Russell King (Oracle) wrote:
> On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
>> This doesn't match the location from earlier, but at least there's
>> something afoot here that needs fixing. I suppose this could simply be
>> hiding any subsequent errors, so once this is fixed we might see other
>> similar issues.
> 
> Well, having a quick look at this, the first thing which stands out is:
> 
> In stmmac_tx_clean(), we have:
> 
>                  if (likely(tx_q->tx_skbuff_dma[entry].buf &&
>                             tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T
> _XDP_TX)) {
>                          if (tx_q->tx_skbuff_dma[entry].map_as_page)
>                                  dma_unmap_page(priv->device,
>                                                 tx_q->tx_skbuff_dma[entry].buf,
>                                                 tx_q->tx_skbuff_dma[entry].len,
>                                                 DMA_TO_DEVICE);
>                          else
>                                  dma_unmap_single(priv->device,
>                                                   tx_q->tx_skbuff_dma[entry].buf,
>                                                   tx_q->tx_skbuff_dma[entry].len,
>                                                   DMA_TO_DEVICE);
>                          tx_q->tx_skbuff_dma[entry].buf = 0;
>                          tx_q->tx_skbuff_dma[entry].len = 0;
>                          tx_q->tx_skbuff_dma[entry].map_as_page = false;
>                  }
> 
> So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
> DMA region.
> 
> Now if we look at stmmac_tso_xmit():
> 
>          des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
>                               DMA_TO_DEVICE);
>          if (dma_mapping_error(priv->device, des))
>                  goto dma_map_err;
> 
>          if (priv->dma_cap.addr64 <= 32) {
> ...
>          } else {
> ...
>                  des += proto_hdr_len;
> ...
> 	}
> 
>          tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
>          tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
>          tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
>          tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> 
> This will result in stmmac_tx_clean() calling dma_unmap_single() using
> "des" and "skb_headlen(skb)" as the buffer start and length.
> 
> One of the requirements of the DMA mapping API is that the DMA handle
> returned by the map operation will be passed into the unmap function.
> Not something that was offset. The length will also be the same.
> 
> We can clearly see above that there is a case where the DMA handle has
> been offset by proto_hdr_len, and when this is so, the value that is
> passed into the unmap operation no longer matches this requirement.
> 
> So, a question to the reporter - what is the value of
> priv->dma_cap.addr64 in your failing case? You should see the value
> in the "Using %d/%d bits DMA host/device width" kernel message.

It is ...

  dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width

Thanks
Jon
Jon Hunter Dec. 4, 2024, 5:03 p.m. UTC | #15
On 03/12/2024 03:16, Furong Xu wrote:
> On Mon, 2 Dec 2024 18:34:25 -0800, Jakub Kicinski <kuba@kernel.org> wrote:
> 
>> On Tue, 3 Dec 2024 10:03:31 +0800 Furong Xu wrote:
>>> I requested Jon to provide more info about "Tx DMA map failed" in previous
>>> reply, and he does not respond yet.
>>
>> What does it mean to provide "more info" about a print statement from
>> the driver? Is there a Kconfig which he needs to set to get more info?
>> Perhaps you should provide a debug patch he can apply on his tree, that
>> will print info about (1) which buffer mapping failed (head or frags);
>> (2) what the physical address was of the buffer that couldn't be mapped.
> 
> A debug patch to print info about buffer makes no sense here.
> Both Tegra186 Jetson TX2(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier
> (tegra194-p2972-0000) enable IOMMU/SMMU for stmmac in its device-tree node,
> buffer info should be investigated with detailed IOMMU/SMMU debug info from
> drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c together.
> 
> I am not an expert in IOMMU, so I cannot help more.
> 
> Without the help from Jon, our only choice is revert as you said.


Sorry Thierry and I were digging into this offline but yes I can provide 
any more info that is needed.

Jon
Russell King (Oracle) Dec. 4, 2024, 5:45 p.m. UTC | #16
On Wed, Dec 04, 2024 at 05:02:19PM +0000, Jon Hunter wrote:
> Hi Russell,
> 
> On 04/12/2024 16:39, Russell King (Oracle) wrote:
> > On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
> > > This doesn't match the location from earlier, but at least there's
> > > something afoot here that needs fixing. I suppose this could simply be
> > > hiding any subsequent errors, so once this is fixed we might see other
> > > similar issues.
> > 
> > Well, having a quick look at this, the first thing which stands out is:
> > 
> > In stmmac_tx_clean(), we have:
> > 
> >                  if (likely(tx_q->tx_skbuff_dma[entry].buf &&
> >                             tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T
> > _XDP_TX)) {
> >                          if (tx_q->tx_skbuff_dma[entry].map_as_page)
> >                                  dma_unmap_page(priv->device,
> >                                                 tx_q->tx_skbuff_dma[entry].buf,
> >                                                 tx_q->tx_skbuff_dma[entry].len,
> >                                                 DMA_TO_DEVICE);
> >                          else
> >                                  dma_unmap_single(priv->device,
> >                                                   tx_q->tx_skbuff_dma[entry].buf,
> >                                                   tx_q->tx_skbuff_dma[entry].len,
> >                                                   DMA_TO_DEVICE);
> >                          tx_q->tx_skbuff_dma[entry].buf = 0;
> >                          tx_q->tx_skbuff_dma[entry].len = 0;
> >                          tx_q->tx_skbuff_dma[entry].map_as_page = false;
> >                  }
> > 
> > So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
> > DMA region.
> > 
> > Now if we look at stmmac_tso_xmit():
> > 
> >          des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
> >                               DMA_TO_DEVICE);
> >          if (dma_mapping_error(priv->device, des))
> >                  goto dma_map_err;
> > 
> >          if (priv->dma_cap.addr64 <= 32) {
> > ...
> >          } else {
> > ...
> >                  des += proto_hdr_len;
> > ...
> > 	}
> > 
> >          tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
> >          tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
> >          tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
> >          tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> > 
> > This will result in stmmac_tx_clean() calling dma_unmap_single() using
> > "des" and "skb_headlen(skb)" as the buffer start and length.
> > 
> > One of the requirements of the DMA mapping API is that the DMA handle
> > returned by the map operation will be passed into the unmap function.
> > Not something that was offset. The length will also be the same.
> > 
> > We can clearly see above that there is a case where the DMA handle has
> > been offset by proto_hdr_len, and when this is so, the value that is
> > passed into the unmap operation no longer matches this requirement.
> > 
> > So, a question to the reporter - what is the value of
> > priv->dma_cap.addr64 in your failing case? You should see the value
> > in the "Using %d/%d bits DMA host/device width" kernel message.
> 
> It is ...
> 
>  dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width

So yes, "des" is being offset, which will upset the unmap operation.
Please try the following patch, thanks:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 9b262cdad60b..c81ea8cdfe6e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4192,8 +4192,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct stmmac_txq_stats *txq_stats;
 	struct stmmac_tx_queue *tx_q;
 	u32 pay_len, mss, queue;
+	dma_addr_t tso_des, des;
 	u8 proto_hdr_len, hdr;
-	dma_addr_t des;
 	bool set_ic;
 	int i;
 
@@ -4289,14 +4289,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		/* If needed take extra descriptors to fill the remaining payload */
 		tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE;
+		tso_des = des;
 	} else {
 		stmmac_set_desc_addr(priv, first, des);
 		tmp_pay_len = pay_len;
-		des += proto_hdr_len;
+		tso_des = des + proto_hdr_len;
 		pay_len = 0;
 	}
 
-	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
+	stmmac_tso_allocator(priv, tso_des, tmp_pay_len, (nfrags == 0), queue);
 
 	/* In case two or more DMA transmit descriptors are allocated for this
 	 * non-paged SKB data, the DMA buffer address should be saved to
Thierry Reding Dec. 4, 2024, 6:18 p.m. UTC | #17
On Wed, Dec 04, 2024 at 05:45:43PM +0000, Russell King (Oracle) wrote:
> On Wed, Dec 04, 2024 at 05:02:19PM +0000, Jon Hunter wrote:
> > Hi Russell,
> > 
> > On 04/12/2024 16:39, Russell King (Oracle) wrote:
> > > On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
> > > > This doesn't match the location from earlier, but at least there's
> > > > something afoot here that needs fixing. I suppose this could simply be
> > > > hiding any subsequent errors, so once this is fixed we might see other
> > > > similar issues.
> > > 
> > > Well, having a quick look at this, the first thing which stands out is:
> > > 
> > > In stmmac_tx_clean(), we have:
> > > 
> > >                  if (likely(tx_q->tx_skbuff_dma[entry].buf &&
> > >                             tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T
> > > _XDP_TX)) {
> > >                          if (tx_q->tx_skbuff_dma[entry].map_as_page)
> > >                                  dma_unmap_page(priv->device,
> > >                                                 tx_q->tx_skbuff_dma[entry].buf,
> > >                                                 tx_q->tx_skbuff_dma[entry].len,
> > >                                                 DMA_TO_DEVICE);
> > >                          else
> > >                                  dma_unmap_single(priv->device,
> > >                                                   tx_q->tx_skbuff_dma[entry].buf,
> > >                                                   tx_q->tx_skbuff_dma[entry].len,
> > >                                                   DMA_TO_DEVICE);
> > >                          tx_q->tx_skbuff_dma[entry].buf = 0;
> > >                          tx_q->tx_skbuff_dma[entry].len = 0;
> > >                          tx_q->tx_skbuff_dma[entry].map_as_page = false;
> > >                  }
> > > 
> > > So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
> > > DMA region.
> > > 
> > > Now if we look at stmmac_tso_xmit():
> > > 
> > >          des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
> > >                               DMA_TO_DEVICE);
> > >          if (dma_mapping_error(priv->device, des))
> > >                  goto dma_map_err;
> > > 
> > >          if (priv->dma_cap.addr64 <= 32) {
> > > ...
> > >          } else {
> > > ...
> > >                  des += proto_hdr_len;
> > > ...
> > > 	}
> > > 
> > >          tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
> > >          tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
> > >          tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
> > >          tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> > > 
> > > This will result in stmmac_tx_clean() calling dma_unmap_single() using
> > > "des" and "skb_headlen(skb)" as the buffer start and length.
> > > 
> > > One of the requirements of the DMA mapping API is that the DMA handle
> > > returned by the map operation will be passed into the unmap function.
> > > Not something that was offset. The length will also be the same.
> > > 
> > > We can clearly see above that there is a case where the DMA handle has
> > > been offset by proto_hdr_len, and when this is so, the value that is
> > > passed into the unmap operation no longer matches this requirement.
> > > 
> > > So, a question to the reporter - what is the value of
> > > priv->dma_cap.addr64 in your failing case? You should see the value
> > > in the "Using %d/%d bits DMA host/device width" kernel message.
> > 
> > It is ...
> > 
> >  dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width
> 
> So yes, "des" is being offset, which will upset the unmap operation.
> Please try the following patch, thanks:
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 9b262cdad60b..c81ea8cdfe6e 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4192,8 +4192,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct stmmac_txq_stats *txq_stats;
>  	struct stmmac_tx_queue *tx_q;
>  	u32 pay_len, mss, queue;
> +	dma_addr_t tso_des, des;
>  	u8 proto_hdr_len, hdr;
> -	dma_addr_t des;
>  	bool set_ic;
>  	int i;
>  
> @@ -4289,14 +4289,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  		/* If needed take extra descriptors to fill the remaining payload */
>  		tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE;
> +		tso_des = des;
>  	} else {
>  		stmmac_set_desc_addr(priv, first, des);
>  		tmp_pay_len = pay_len;
> -		des += proto_hdr_len;
> +		tso_des = des + proto_hdr_len;
>  		pay_len = 0;
>  	}
>  
> -	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
> +	stmmac_tso_allocator(priv, tso_des, tmp_pay_len, (nfrags == 0), queue);
>  
>  	/* In case two or more DMA transmit descriptors are allocated for this
>  	 * non-paged SKB data, the DMA buffer address should be saved to

I see, that makes sense. Looks like this has been broken for a few years
(since commit 34c15202896d ("net: stmmac: Fix the problem of tso_xmit"))
and Furong's patch ended up exposing it.

Anyway, this seems to fix it for me. I can usually trigger the issue
within one or two iperf runs, with your patch I haven't seen it break
after a dozen or so runs.

It may be good to have Jon's test results as well, but looks good so
far.

Thanks!
Thierry
Furong Xu Dec. 5, 2024, 9:34 a.m. UTC | #18
Hi Russell,

On Wed, 4 Dec 2024 17:45:43 +0000, "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> So yes, "des" is being offset, which will upset the unmap operation.
> Please try the following patch, thanks:
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 9b262cdad60b..c81ea8cdfe6e 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4192,8 +4192,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct stmmac_txq_stats *txq_stats;
>  	struct stmmac_tx_queue *tx_q;
>  	u32 pay_len, mss, queue;
> +	dma_addr_t tso_des, des;
>  	u8 proto_hdr_len, hdr;
> -	dma_addr_t des;
>  	bool set_ic;
>  	int i;
>  
> @@ -4289,14 +4289,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  		/* If needed take extra descriptors to fill the remaining payload */
>  		tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE;
> +		tso_des = des;
>  	} else {
>  		stmmac_set_desc_addr(priv, first, des);
>  		tmp_pay_len = pay_len;
> -		des += proto_hdr_len;
> +		tso_des = des + proto_hdr_len;
>  		pay_len = 0;
>  	}
>  
> -	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
> +	stmmac_tso_allocator(priv, tso_des, tmp_pay_len, (nfrags == 0), queue);
>  
>  	/* In case two or more DMA transmit descriptors are allocated for this
>  	 * non-paged SKB data, the DMA buffer address should be saved to
> 

Much appreciated for your comments and suggestions, I sent a new patch to fix
this issue. Please let me know if you have any new advice.
https://lore.kernel.org/netdev/20241205091830.3719609-1-0x1207@gmail.com/

Thanks,
Furong
Jon Hunter Dec. 5, 2024, 10:57 a.m. UTC | #19
On 04/12/2024 18:18, Thierry Reding wrote:
> On Wed, Dec 04, 2024 at 05:45:43PM +0000, Russell King (Oracle) wrote:
>> On Wed, Dec 04, 2024 at 05:02:19PM +0000, Jon Hunter wrote:
>>> Hi Russell,
>>>
>>> On 04/12/2024 16:39, Russell King (Oracle) wrote:
>>>> On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
>>>>> This doesn't match the location from earlier, but at least there's
>>>>> something afoot here that needs fixing. I suppose this could simply be
>>>>> hiding any subsequent errors, so once this is fixed we might see other
>>>>> similar issues.
>>>>
>>>> Well, having a quick look at this, the first thing which stands out is:
>>>>
>>>> In stmmac_tx_clean(), we have:
>>>>
>>>>                   if (likely(tx_q->tx_skbuff_dma[entry].buf &&
>>>>                              tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T
>>>> _XDP_TX)) {
>>>>                           if (tx_q->tx_skbuff_dma[entry].map_as_page)
>>>>                                   dma_unmap_page(priv->device,
>>>>                                                  tx_q->tx_skbuff_dma[entry].buf,
>>>>                                                  tx_q->tx_skbuff_dma[entry].len,
>>>>                                                  DMA_TO_DEVICE);
>>>>                           else
>>>>                                   dma_unmap_single(priv->device,
>>>>                                                    tx_q->tx_skbuff_dma[entry].buf,
>>>>                                                    tx_q->tx_skbuff_dma[entry].len,
>>>>                                                    DMA_TO_DEVICE);
>>>>                           tx_q->tx_skbuff_dma[entry].buf = 0;
>>>>                           tx_q->tx_skbuff_dma[entry].len = 0;
>>>>                           tx_q->tx_skbuff_dma[entry].map_as_page = false;
>>>>                   }
>>>>
>>>> So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
>>>> DMA region.
>>>>
>>>> Now if we look at stmmac_tso_xmit():
>>>>
>>>>           des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
>>>>                                DMA_TO_DEVICE);
>>>>           if (dma_mapping_error(priv->device, des))
>>>>                   goto dma_map_err;
>>>>
>>>>           if (priv->dma_cap.addr64 <= 32) {
>>>> ...
>>>>           } else {
>>>> ...
>>>>                   des += proto_hdr_len;
>>>> ...
>>>> 	}
>>>>
>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
>>>>
>>>> This will result in stmmac_tx_clean() calling dma_unmap_single() using
>>>> "des" and "skb_headlen(skb)" as the buffer start and length.
>>>>
>>>> One of the requirements of the DMA mapping API is that the DMA handle
>>>> returned by the map operation will be passed into the unmap function.
>>>> Not something that was offset. The length will also be the same.
>>>>
>>>> We can clearly see above that there is a case where the DMA handle has
>>>> been offset by proto_hdr_len, and when this is so, the value that is
>>>> passed into the unmap operation no longer matches this requirement.
>>>>
>>>> So, a question to the reporter - what is the value of
>>>> priv->dma_cap.addr64 in your failing case? You should see the value
>>>> in the "Using %d/%d bits DMA host/device width" kernel message.
>>>
>>> It is ...
>>>
>>>   dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width
>>
>> So yes, "des" is being offset, which will upset the unmap operation.
>> Please try the following patch, thanks:
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 9b262cdad60b..c81ea8cdfe6e 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -4192,8 +4192,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>>   	struct stmmac_txq_stats *txq_stats;
>>   	struct stmmac_tx_queue *tx_q;
>>   	u32 pay_len, mss, queue;
>> +	dma_addr_t tso_des, des;
>>   	u8 proto_hdr_len, hdr;
>> -	dma_addr_t des;
>>   	bool set_ic;
>>   	int i;
>>   
>> @@ -4289,14 +4289,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>>   
>>   		/* If needed take extra descriptors to fill the remaining payload */
>>   		tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE;
>> +		tso_des = des;
>>   	} else {
>>   		stmmac_set_desc_addr(priv, first, des);
>>   		tmp_pay_len = pay_len;
>> -		des += proto_hdr_len;
>> +		tso_des = des + proto_hdr_len;
>>   		pay_len = 0;
>>   	}
>>   
>> -	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
>> +	stmmac_tso_allocator(priv, tso_des, tmp_pay_len, (nfrags == 0), queue);
>>   
>>   	/* In case two or more DMA transmit descriptors are allocated for this
>>   	 * non-paged SKB data, the DMA buffer address should be saved to
> 
> I see, that makes sense. Looks like this has been broken for a few years
> (since commit 34c15202896d ("net: stmmac: Fix the problem of tso_xmit"))
> and Furong's patch ended up exposing it.
> 
> Anyway, this seems to fix it for me. I can usually trigger the issue
> within one or two iperf runs, with your patch I haven't seen it break
> after a dozen or so runs.
> 
> It may be good to have Jon's test results as well, but looks good so
> far.


I have been running tests on my side and so far so good too. I have not 
seen any more mapping failure cases.

Russell, if you are planning to send a fix for this, please add my ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!
Jon
Jon Hunter Dec. 5, 2024, 11:02 a.m. UTC | #20
On 05/12/2024 10:57, Jon Hunter wrote:
> 
> On 04/12/2024 18:18, Thierry Reding wrote:
>> On Wed, Dec 04, 2024 at 05:45:43PM +0000, Russell King (Oracle) wrote:
>>> On Wed, Dec 04, 2024 at 05:02:19PM +0000, Jon Hunter wrote:
>>>> Hi Russell,
>>>>
>>>> On 04/12/2024 16:39, Russell King (Oracle) wrote:
>>>>> On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
>>>>>> This doesn't match the location from earlier, but at least there's
>>>>>> something afoot here that needs fixing. I suppose this could 
>>>>>> simply be
>>>>>> hiding any subsequent errors, so once this is fixed we might see 
>>>>>> other
>>>>>> similar issues.
>>>>>
>>>>> Well, having a quick look at this, the first thing which stands out 
>>>>> is:
>>>>>
>>>>> In stmmac_tx_clean(), we have:
>>>>>
>>>>>                   if (likely(tx_q->tx_skbuff_dma[entry].buf &&
>>>>>                              tx_q->tx_skbuff_dma[entry].buf_type != 
>>>>> STMMAC_TXBUF_T
>>>>> _XDP_TX)) {
>>>>>                           if (tx_q->tx_skbuff_dma[entry].map_as_page)
>>>>>                                   dma_unmap_page(priv->device,
>>>>>                                                  tx_q- 
>>>>> >tx_skbuff_dma[entry].buf,
>>>>>                                                  tx_q- 
>>>>> >tx_skbuff_dma[entry].len,
>>>>>                                                  DMA_TO_DEVICE);
>>>>>                           else
>>>>>                                   dma_unmap_single(priv->device,
>>>>>                                                    tx_q- 
>>>>> >tx_skbuff_dma[entry].buf,
>>>>>                                                    tx_q- 
>>>>> >tx_skbuff_dma[entry].len,
>>>>>                                                    DMA_TO_DEVICE);
>>>>>                           tx_q->tx_skbuff_dma[entry].buf = 0;
>>>>>                           tx_q->tx_skbuff_dma[entry].len = 0;
>>>>>                           tx_q->tx_skbuff_dma[entry].map_as_page = 
>>>>> false;
>>>>>                   }
>>>>>
>>>>> So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
>>>>> DMA region.
>>>>>
>>>>> Now if we look at stmmac_tso_xmit():
>>>>>
>>>>>           des = dma_map_single(priv->device, skb->data, 
>>>>> skb_headlen(skb),
>>>>>                                DMA_TO_DEVICE);
>>>>>           if (dma_mapping_error(priv->device, des))
>>>>>                   goto dma_map_err;
>>>>>
>>>>>           if (priv->dma_cap.addr64 <= 32) {
>>>>> ...
>>>>>           } else {
>>>>> ...
>>>>>                   des += proto_hdr_len;
>>>>> ...
>>>>>     }
>>>>>
>>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
>>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
>>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
>>>>>           tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = 
>>>>> STMMAC_TXBUF_T_SKB;
>>>>>
>>>>> This will result in stmmac_tx_clean() calling dma_unmap_single() using
>>>>> "des" and "skb_headlen(skb)" as the buffer start and length.
>>>>>
>>>>> One of the requirements of the DMA mapping API is that the DMA handle
>>>>> returned by the map operation will be passed into the unmap function.
>>>>> Not something that was offset. The length will also be the same.
>>>>>
>>>>> We can clearly see above that there is a case where the DMA handle has
>>>>> been offset by proto_hdr_len, and when this is so, the value that is
>>>>> passed into the unmap operation no longer matches this requirement.
>>>>>
>>>>> So, a question to the reporter - what is the value of
>>>>> priv->dma_cap.addr64 in your failing case? You should see the value
>>>>> in the "Using %d/%d bits DMA host/device width" kernel message.
>>>>
>>>> It is ...
>>>>
>>>>   dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device 
>>>> width
>>>
>>> So yes, "des" is being offset, which will upset the unmap operation.
>>> Please try the following patch, thanks:
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/ 
>>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>>> index 9b262cdad60b..c81ea8cdfe6e 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>>> @@ -4192,8 +4192,8 @@ static netdev_tx_t stmmac_tso_xmit(struct 
>>> sk_buff *skb, struct net_device *dev)
>>>       struct stmmac_txq_stats *txq_stats;
>>>       struct stmmac_tx_queue *tx_q;
>>>       u32 pay_len, mss, queue;
>>> +    dma_addr_t tso_des, des;
>>>       u8 proto_hdr_len, hdr;
>>> -    dma_addr_t des;
>>>       bool set_ic;
>>>       int i;
>>> @@ -4289,14 +4289,15 @@ static netdev_tx_t stmmac_tso_xmit(struct 
>>> sk_buff *skb, struct net_device *dev)
>>>           /* If needed take extra descriptors to fill the remaining 
>>> payload */
>>>           tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE;
>>> +        tso_des = des;
>>>       } else {
>>>           stmmac_set_desc_addr(priv, first, des);
>>>           tmp_pay_len = pay_len;
>>> -        des += proto_hdr_len;
>>> +        tso_des = des + proto_hdr_len;
>>>           pay_len = 0;
>>>       }
>>> -    stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
>>> +    stmmac_tso_allocator(priv, tso_des, tmp_pay_len, (nfrags == 0), 
>>> queue);
>>>       /* In case two or more DMA transmit descriptors are allocated 
>>> for this
>>>        * non-paged SKB data, the DMA buffer address should be saved to
>>
>> I see, that makes sense. Looks like this has been broken for a few years
>> (since commit 34c15202896d ("net: stmmac: Fix the problem of tso_xmit"))
>> and Furong's patch ended up exposing it.
>>
>> Anyway, this seems to fix it for me. I can usually trigger the issue
>> within one or two iperf runs, with your patch I haven't seen it break
>> after a dozen or so runs.
>>
>> It may be good to have Jon's test results as well, but looks good so
>> far.
> 
> 
> I have been running tests on my side and so far so good too. I have not 
> seen any more mapping failure cases.
> 
> Russell, if you are planning to send a fix for this, please add my ...
> 
> Tested-by: Jon Hunter <jonathanh@nvidia.com>

Nevermind I see Furong already sent a fix.

Jon
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d3895d7eecfc..208dbc68aaf9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4304,11 +4304,6 @@  static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (dma_mapping_error(priv->device, des))
 		goto dma_map_err;
 
-	tx_q->tx_skbuff_dma[first_entry].buf = des;
-	tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
-	tx_q->tx_skbuff_dma[first_entry].map_as_page = false;
-	tx_q->tx_skbuff_dma[first_entry].buf_type = STMMAC_TXBUF_T_SKB;
-
 	if (priv->dma_cap.addr64 <= 32) {
 		first->des0 = cpu_to_le32(des);
 
@@ -4327,6 +4322,23 @@  static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
 
+	/* In case two or more DMA transmit descriptors are allocated for this
+	 * non-paged SKB data, the DMA buffer address should be saved to
+	 * tx_q->tx_skbuff_dma[].buf corresponding to the last descriptor,
+	 * and leave the other tx_q->tx_skbuff_dma[].buf as NULL to guarantee
+	 * that stmmac_tx_clean() does not unmap the entire DMA buffer too early
+	 * since the tail areas of the DMA buffer can be accessed by DMA engine
+	 * sooner or later.
+	 * By saving the DMA buffer address to tx_q->tx_skbuff_dma[].buf
+	 * corresponding to the last descriptor, stmmac_tx_clean() will unmap
+	 * this DMA buffer right after the DMA engine completely finishes the
+	 * full buffer transmission.
+	 */
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
+
 	/* Prepare fragments */
 	for (i = 0; i < nfrags; i++) {
 		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];