Message ID | 20241021061023.2162701-1-0x1207@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [net,v1] net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data
On 2024-10-21 at 11:40:23, Furong Xu (0x1207@gmail.com) wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
>
> [...]
>
> Tested and verified on DWXGMAC CORE 3.20a
>
> Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
> Fixes: f748be531d70 ("stmmac: support new GMAC4")
> Signed-off-by: Furong Xu <0x1207@gmail.com>

Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
On Mon, Oct 21, 2024 at 02:10:23PM +0800, Furong Xu wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
>
> [...]
>
> Tested and verified on DWXGMAC CORE 3.20a
>
> Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
> Fixes: f748be531d70 ("stmmac: support new GMAC4")
> Signed-off-by: Furong Xu <0x1207@gmail.com>

Thanks for the very thorough explanation, much appreciated.

Reviewed-by: Simon Horman <horms@kernel.org>
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 21 Oct 2024 14:10:23 +0800 you wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
>
> [...]

Here is the summary with links:
  - [net,v1] net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data
    https://git.kernel.org/netdev/net/c/66600fac7a98

You are awesome, thank you!
Hi Furong,

On 21/10/2024 07:10, Furong Xu wrote:
> In case the non-paged data of a SKB carries protocol header and protocol
> payload to be transmitted on a certain platform that the DMA AXI address
> width is configured to 40-bit/48-bit, or the size of the non-paged data
> is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
> address width is configured to 32-bit, then this SKB requires at least
> two DMA transmit descriptors to serve it.
>
> [...]
I have noticed a lot of intermittent failures on a couple of our boards
starting with Linux v6.12. I have finally bisected the issue to this
change, and reverting it fixes the problem.

The boards where I am seeing this issue are our Tegra186 Jetson TX2
(tegra186-p2771-0000) and Tegra194 Jetson AGX Xavier (tegra194-p2972-0000).

Tegra186 has:

 dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x41

Tegra194 has:

 dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x50

Otherwise, all the other properties printed on boot are the same for both ...
 dwc-eth-dwmac 2490000.ethernet: DWMAC4/5
 dwc-eth-dwmac 2490000.ethernet: DMA HW capability register supported
 dwc-eth-dwmac 2490000.ethernet: RX Checksum Offload Engine supported
 dwc-eth-dwmac 2490000.ethernet: TX Checksum insertion supported
 dwc-eth-dwmac 2490000.ethernet: Wake-Up On Lan supported
 dwc-eth-dwmac 2490000.ethernet: TSO supported
 dwc-eth-dwmac 2490000.ethernet: Enable RX Mitigation via HW Watchdog Timer
 dwc-eth-dwmac 2490000.ethernet: Enabled L3L4 Flow TC (entries=8)
 dwc-eth-dwmac 2490000.ethernet: Enabled RFS Flow TC (entries=10)
 dwc-eth-dwmac 2490000.ethernet: TSO feature enabled
 dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width

Looking at the console logs, when the problem occurs I see the following
prints ...

[ 245.571688] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
[ 245.575349] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed

I also caught this crash ...

[ 245.576690] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 245.576715] Mem abort info:
[ 245.577009] ESR = 0x0000000096000004
[ 245.577040] EC = 0x25: DABT (current EL), IL = 32 bits
[ 245.577142] SET = 0, FnV = 0
[ 245.577355] EA = 0, S1PTW = 0
[ 245.577439] FSC = 0x04: level 0 translation fault
[ 245.577557] Data abort info:
[ 245.577628] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 245.577753] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 245.577878] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 245.578018] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106300000
[ 245.578168] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[ 245.578390] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 245.578528] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra_pcm tegra_drm snd_soc_tegra186_asrc snd_soc_tegra210_mixer snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic drm_dp_aux_bus snd_soc_tegra210_adx snd_soc_tegra210_amx cec snd_soc_tegra210_sfc drm_display_helper snd_soc_tegra210_i2s drm_kms_helper snd_soc_tegra_audio_graph_card ucsi_ccg typec_ucsi snd_soc_rt5659 snd_soc_audio_graph_card drm backlight tegra210_adma snd_soc_tegra210_ahub crct10dif_ce snd_soc_simple_card_utils pwm_fan snd_soc_rl6231 typec ina3221 snd_hda_codec_hdmi tegra_aconnect pwm_tegra snd_hda_tegra snd_hda_codec snd_hda_core phy_tegra194_p2u tegra_xudc at24 lm90 pcie_tegra194 host1x tegra_bpmp_thermal ip_tables x_tables ipv6
[ 245.626942] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.12.0 #5
[ 245.635072] Tainted: [W]=WARN
[ 245.638220] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[ 245.645039] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 245.651870] pc : skb_release_data+0x100/0x1e4
[ 245.656334] lr : sk_skb_reason_drop+0x44/0xb0
[ 245.660797] sp : ffff800080003c80
[ 245.664206] x29: ffff800080003c80 x28: ffff000083d58980 x27: 0000000000000900
[ 245.671813] x26: ffff000083d5c980 x25: ffff0000937c03c0 x24: 0000000000000002
[ 245.678906] x23: 00000000ffffffff x22: 0000000000000001 x21: ffff000096ae8200
[ 245.686258] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000004860
[ 245.693605] x17: ffff80037be97000 x16: ffff800080000000 x15: ffff8000827d4968
[ 245.700870] x14: fffffffffffe485f x13: 2e656572662d7265 x12: 7466612d65737520
[ 245.707957] x11: ffff8000827d49e8 x10: ffff8000827d49e8 x9 : 00000000ffffefff
[ 245.715306] x8 : ffff80008282c9e8 x7 : 0000000000017fe8 x6 : 00000000fffff000
[ 245.722825] x5 : ffff0003fde07348 x4 : ffff0000937c03c0 x3 : ffff0000937c0280
[ 245.729762] x2 : 0000000000000140 x1 : ffff0000937c03c0 x0 : 0000000000000000
[ 245.737009] Call trace:
[ 245.739459]  skb_release_data+0x100/0x1e4
[ 245.743657]  sk_skb_reason_drop+0x44/0xb0
[ 245.747684]  dev_kfree_skb_any_reason+0x44/0x50
[ 245.752490]  stmmac_tx_clean+0x1ec/0x798
[ 245.756177]  stmmac_napi_poll_tx+0x6c/0x144
[ 245.760199]  __napi_poll+0x38/0x190
[ 245.763868]  net_rx_action+0x140/0x294
[ 245.767888]  handle_softirqs+0x120/0x24c
[ 245.771574]  __do_softirq+0x14/0x20
[ 245.775326]  ____do_softirq+0x10/0x1c
[ 245.778748]  call_on_irq_stack+0x24/0x4c
[ 245.782510]  do_softirq_own_stack+0x1c/0x2c
[ 245.786964]  irq_exit_rcu+0x8c/0xc4
[ 245.790463]  el1_interrupt+0x38/0x68
[ 245.794139]  el1h_64_irq_handler+0x18/0x24
[ 245.798166]  el1h_64_irq+0x64/0x68
[ 245.801318]  default_idle_call+0x28/0x3c
[ 245.805166]  do_idle+0x208/0x264
[ 245.808576]  cpu_startup_entry+0x34/0x3c
[ 245.812088]  kernel_init+0x0/0x1d8
[ 245.815594]  start_kernel+0x5c0/0x708
[ 245.819076]  __primary_switched+0x80/0x88
[ 245.823295] Code: 97fff632 72001c1f 54000161 370005b3 (f9400661)
[ 245.829151] ---[ end trace 0000000000000000 ]---

And here is another crash ...

[ 149.986210] dwc-eth-dwmac 2490000.ethernet eth0: Tx DMA map failed
[ 149.992845] kernel BUG at lib/dynamic_queue_limits.c:99!
[ 149.998152] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 150.004928] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic snd_soc_tegra186_dspk snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_sfc snd_soc_tegra210_i2s tegra_drm drm_dp_aux_bus cec drm_display_helper drm_kms_helper tegra210_adma snd_soc_tegra210_ahub drm backlight snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_simple_card_utils crct10dif_ce tegra_bpmp_thermal at24 tegra_aconnect snd_hda_codec_hdmi tegra_xudc snd_hda_tegra snd_hda_codec snd_hda_core ina3221 host1x ip_tables x_tables ipv6
[ 150.061268] CPU: 5 UID: 102 PID: 240 Comm: systemd-resolve Tainted: G S W 6.12.0-dirty #7
[ 150.070654] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[ 150.075438] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
[ 150.081348] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 150.088303] pc : dql_completed+0x1fc/0x20c
[ 150.092401] lr : stmmac_tx_clean+0x3b4/0x798
[ 150.096669] sp : ffff800082d73d00
[ 150.099979] x29: ffff800082d73d00 x28: ffff000080898980 x27: 0000000000002ce0
[ 150.107115] x26: ffff00008089c980 x25: 0000000000000000 x24: ffff000083c88000
[ 150.114248] x23: 0000000000000000 x22: 0000000000000001 x21: ffff000080898980
[ 150.121380] x20: 0000000000000000 x19: 0000000000000168 x18: 0000000000006540
[ 150.128513] x17: ffff800172d32000 x16: ffff800082d70000 x15: ffff8000827d4968
[ 150.135646] x14: fffffffffffe653f x13: 2e656572662d7265 x12: 7466612d65737520
[ 150.142781] x11: ffff8000827d49e8 x10: ffff8000827d49e8 x9 : 0000000000000000
[ 150.149913] x8 : 000000000003ca11 x7 : 0000000000017fe8 x6 : 000000000003ca11
[ 150.157046] x5 : ffff000080d09140 x4 : ffff0001f4cb0840 x3 : 000000000010fe05
[ 150.164181] x2 : 0000000000000000 x1 : 000000000000004a x0 : ffff000083c88080
[ 150.171314] Call trace:
[ 150.173757]  dql_completed+0x1fc/0x20c
[ 150.177507]  stmmac_napi_poll_tx+0x6c/0x144
[ 150.181688]  __napi_poll+0x38/0x190
[ 150.185174]  net_rx_action+0x140/0x294
[ 150.188921]  handle_softirqs+0x120/0x24c
[ 150.192843]  __do_softirq+0x14/0x20
[ 150.196328]  ____do_softirq+0x10/0x1c
[ 150.199987]  call_on_irq_stack+0x24/0x4c
[ 150.203908]  do_softirq_own_stack+0x1c/0x2c
[ 150.208088]  do_softirq+0x54/0x6c
[ 150.211401]  __local_bh_enable_ip+0x8c/0x98
[ 150.215583]  __dev_queue_xmit+0x4e4/0xd6c
[ 150.219588]  ip_finish_output2+0x4cc/0x5e8
[ 150.223682]  __ip_finish_output+0xac/0x17c
[ 150.227776]  ip_finish_output+0x34/0x10c
[ 150.231696]  ip_output+0x68/0xfc
[ 150.234921]  __ip_queue_xmit+0x16c/0x464
[ 150.238840]  ip_queue_xmit+0x14/0x20
[ 150.242413]  __tcp_transmit_skb+0x490/0xc4c
[ 150.246593]  tcp_connect+0xa08/0xdbc
[ 150.250167]  tcp_v4_connect+0x35c/0x494
[ 150.253999]  __inet_stream_connect+0xf8/0x3c8
[ 150.258354]  inet_stream_connect+0x48/0x70
[ 150.262447]  __sys_connect+0xe0/0xfc
[ 150.266021]  __arm64_sys_connect+0x20/0x30
[ 150.270113]  invoke_syscall+0x48/0x110
[ 150.273860]  el0_svc_common.constprop.0+0xc8/0xe8
[ 150.278561]  do_el0_svc+0x20/0x2c
[ 150.281875]  el0_svc+0x30/0xd0
[ 150.284929]  el0t_64_sync_handler+0x13c/0x158
[ 150.289282]  el0t_64_sync+0x190/0x194
[ 150.292945] Code: 7a401860 5400008b 2a0403e3 17ffff9c (d4210000)
[ 150.299033] ---[ end trace 0000000000000000 ]---
[ 150.303647] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt

Let me know if you need any more information.

Thanks
Jon
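For context on the second crash: the BUG at lib/dynamic_queue_limits.c:99 is
dql_completed()'s sanity check that a queue never reports more completed bytes
than it queued (the exact line number varies by tree). Below is a minimal
userspace model of that invariant, illustrative only; the names mirror the
kernel's for readability, and the real dql_completed() also retunes the queue
limit adaptively:

#include <assert.h>

/* Userspace model of the BQL invariant that trips as a BUG() in
 * dql_completed(): a queue must never complete more bytes than are
 * currently accounted as queued.
 */
struct dql {
	unsigned int num_queued;    /* bytes handed to the hardware */
	unsigned int num_completed; /* bytes reported back as sent */
};

static void dql_queued(struct dql *dql, unsigned int count)
{
	dql->num_queued += count;
}

static void dql_completed(struct dql *dql, unsigned int count)
{
	/* Can't complete more than what is on the queue; this is the
	 * check that fires as BUG_ON() in the kernel. */
	assert(count <= dql->num_queued - dql->num_completed);
	dql->num_completed += count;
}

int main(void)
{
	struct dql q = { 0, 0 };

	dql_queued(&q, 1500);
	dql_completed(&q, 1500);  /* balanced accounting: fine */
	dql_completed(&q, 100);   /* completes bytes never queued:
				   * asserts here, panics in the kernel */
	return 0;
}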
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d3895d7eecfc..208dbc68aaf9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4304,11 +4304,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (dma_mapping_error(priv->device, des))
 		goto dma_map_err;
 
-	tx_q->tx_skbuff_dma[first_entry].buf = des;
-	tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
-	tx_q->tx_skbuff_dma[first_entry].map_as_page = false;
-	tx_q->tx_skbuff_dma[first_entry].buf_type = STMMAC_TXBUF_T_SKB;
-
 	if (priv->dma_cap.addr64 <= 32) {
 		first->des0 = cpu_to_le32(des);
 
@@ -4327,6 +4322,23 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
 
+	/* In case two or more DMA transmit descriptors are allocated for this
+	 * non-paged SKB data, the DMA buffer address should be saved to
+	 * tx_q->tx_skbuff_dma[].buf corresponding to the last descriptor,
+	 * and leave the other tx_q->tx_skbuff_dma[].buf as NULL to guarantee
+	 * that stmmac_tx_clean() does not unmap the entire DMA buffer too early
+	 * since the tail areas of the DMA buffer can be accessed by DMA engine
+	 * sooner or later.
+	 * By saving the DMA buffer address to tx_q->tx_skbuff_dma[].buf
+	 * corresponding to the last descriptor, stmmac_tx_clean() will unmap
+	 * this DMA buffer right after the DMA engine completely finishes the
+	 * full buffer transmission.
+	 */
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
+	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
+
 	/* Prepare fragments */
 	for (i = 0; i < nfrags; i++) {
 		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
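The new comment block in the diff relies on stmmac_tso_allocator() leaving
tx_q->cur_tx pointing at the last descriptor it filled, which is why indexing
with tx_q->cur_tx after the call reaches the tail slot of the split buffer.
A small userspace model of that descriptor walk makes the indexing visible;
this is a sketch, not the driver function: the ring size is arbitrary and
TSO_MAX_BUFF_SIZE mirrors the driver's per-descriptor limit.

#include <stdio.h>

#define RING_SIZE		8	/* arbitrary for the model */
#define TSO_MAX_BUFF_SIZE	16383	/* per-descriptor payload limit */

static unsigned int cur_tx;		/* mirrors tx_q->cur_tx */

static void tso_allocator(int total_len)
{
	int tmp_len = total_len;

	while (tmp_len > 0) {
		/* Advance FIRST, then fill: after the loop, cur_tx still
		 * points at the last descriptor used for this buffer,
		 * which is exactly the slot the fix stores the DMA
		 * address into. */
		cur_tx = (cur_tx + 1) % RING_SIZE;

		int buff_size = tmp_len < TSO_MAX_BUFF_SIZE ?
				tmp_len : TSO_MAX_BUFF_SIZE;
		printf("desc[%u] <- %d bytes\n", cur_tx, buff_size);

		tmp_len -= TSO_MAX_BUFF_SIZE;
	}
}

int main(void)
{
	tso_allocator(40000);	/* splits into three descriptors */
	printf("cur_tx after call: %u (last descriptor)\n", cur_tx);
	return 0;
}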
In case the non-paged data of a SKB carries protocol header and protocol
payload to be transmitted on a certain platform that the DMA AXI address
width is configured to 40-bit/48-bit, or the size of the non-paged data
is bigger than TSO_MAX_BUFF_SIZE on a certain platform that the DMA AXI
address width is configured to 32-bit, then this SKB requires at least
two DMA transmit descriptors to serve it.

For example, three descriptors are allocated to split one DMA buffer
mapped from one piece of non-paged data:
dma_desc[N + 0],
dma_desc[N + 1],
dma_desc[N + 2].
Then three elements of tx_q->tx_skbuff_dma[] will be allocated to hold
extra information to be reused in stmmac_tx_clean():
tx_q->tx_skbuff_dma[N + 0],
tx_q->tx_skbuff_dma[N + 1],
tx_q->tx_skbuff_dma[N + 2].
Now we focus on tx_q->tx_skbuff_dma[entry].buf, which is the DMA buffer
address returned by the DMA mapping call. stmmac_tx_clean() will try to
unmap the DMA buffer _ONLY_IF_ tx_q->tx_skbuff_dma[entry].buf is a valid
buffer address.

The expected behavior that saves the DMA buffer address of this non-paged
data to tx_q->tx_skbuff_dma[entry].buf is:
tx_q->tx_skbuff_dma[N + 0].buf = NULL;
tx_q->tx_skbuff_dma[N + 1].buf = NULL;
tx_q->tx_skbuff_dma[N + 2].buf = dma_map_single();
Unfortunately, the current code misbehaves like this:
tx_q->tx_skbuff_dma[N + 0].buf = dma_map_single();
tx_q->tx_skbuff_dma[N + 1].buf = NULL;
tx_q->tx_skbuff_dma[N + 2].buf = NULL;

On the stmmac_tx_clean() side, when dma_desc[N + 0] is closed by the
DMA engine, tx_q->tx_skbuff_dma[N + 0].buf is obviously a valid buffer
address, so the DMA buffer will be unmapped immediately. In a rare case,
the DMA engine may not have finished the pending dma_desc[N + 1] and
dma_desc[N + 2] yet. Then things go horribly wrong: the DMA engine
accesses an unmapped/unreferenced memory region, so corrupted data is
transmitted or an IOMMU fault is triggered :(

In contrast, the for-loop that maps SKB fragments behaves exactly as
expected, and that is how the driver should handle both non-paged data
and paged frags.

This patch corrects the DMA map/unmap sequence by fixing the array index
used for tx_q->tx_skbuff_dma[entry].buf when assigning the DMA buffer
address.

Tested and verified on DWXGMAC CORE 3.20a

Reported-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
Fixes: f748be531d70 ("stmmac: support new GMAC4")
Signed-off-by: Furong Xu <0x1207@gmail.com>
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
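For reference, the "unmap _ONLY_IF_ .buf is valid" behavior described above
corresponds to a per-entry test along the following lines in stmmac_tx_clean().
This is a condensed paraphrase of the driver's cleanup path rather than a
verbatim excerpt; field names follow the driver's tx_skbuff_dma[] bookkeeping:

/* Condensed paraphrase of stmmac_tx_clean()'s per-entry unmap logic.
 * Only a non-zero tx_skbuff_dma[entry].buf triggers an unmap, so after
 * the fix the head mapping is released only when the LAST descriptor
 * of the split buffer is reclaimed by the DMA engine.
 */
if (likely(tx_q->tx_skbuff_dma[entry].buf &&
	   tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T_XDP_TX)) {
	if (tx_q->tx_skbuff_dma[entry].map_as_page)
		dma_unmap_page(priv->device,
			       tx_q->tx_skbuff_dma[entry].buf,
			       tx_q->tx_skbuff_dma[entry].len,
			       DMA_TO_DEVICE);
	else
		dma_unmap_single(priv->device,
				 tx_q->tx_skbuff_dma[entry].buf,
				 tx_q->tx_skbuff_dma[entry].len,
				 DMA_TO_DEVICE);
	tx_q->tx_skbuff_dma[entry].buf = 0;
	tx_q->tx_skbuff_dma[entry].len = 0;
	tx_q->tx_skbuff_dma[entry].map_as_page = false;
}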