[net,1/8] net/mlx5: Fix error path in multi-packet WQE transmit

Message ID 20240925202013.45374-2-saeed@kernel.org (mailing list archive)
State Accepted
Commit 2bcae12c795f32ddfbf8c80d1b5f1d3286341c32
Delegated to: Netdev Maintainers
Series [net,1/8] net/mlx5: Fix error path in multi-packet WQE transmit

Checks

Context                        Check    Description
netdev/series_format           success  Pull request is its own cover letter
netdev/tree_selection          success  Clearly marked for net
netdev/ynl                     success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present           success  Fixes tag present in non-next series
netdev/header_inline           success  No static functions without inline keyword in header files
netdev/build_32bit             success  Errors and warnings before: 16 this patch: 16
netdev/build_tools             success  No tools touched, skip
netdev/cc_maintainers          success  CCed 5 of 5 maintainers
netdev/build_clang             success  Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff        success  Signed-off-by tag matches author and committer
netdev/deprecated_api          success  None detected
netdev/check_selftest          success  No net selftest shell script
netdev/verify_fixes            success  Fixes tag looks correct
netdev/build_allmodconfig_warn success  Errors and warnings before: 16 this patch: 16
netdev/checkpatch              success  total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_clang_rust        success  No Rust files in patch. Skipping build
netdev/kdoc                    success  Errors and warnings before: 0 this patch: 0
netdev/source_inline           success  Was 0 now: 0
netdev/contest                 success  net-next-2024-10-02--00-00 (tests: 750)

Commit Message

Saeed Mahameed Sept. 25, 2024, 8:20 p.m. UTC
From: Gerd Bayer <gbayer@linux.ibm.com>

Remove the erroneous unmap in case no DMA mapping was established.

The multi-packet WQE transmit code attempts to obtain a DMA mapping for
the skb. This can fail, e.g. under memory pressure, when the IOMMU
driver cannot allocate more memory for page tables. The code handles
this in the path below the err_unmap label, but there it erroneously
unmaps one entry from the sq's FIFO list of active mappings. Since the
current map attempt failed, this unmap removes an unrelated DMA mapping
that might still be required. If the PCI function then presents that
IOVA, the IOMMU may assume a rogue DMA access and, e.g. on s390, put
the PCI function in error state.

The erroneous behavior was seen in a stress-test environment that created
memory pressure.

Fixes: 5af75c747e2a ("net/mlx5e: Enhanced TX MPWQE for SKBs")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 -
 1 file changed, 1 deletion(-)
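
For illustration only, the failure mode described above can be sketched
as a small user-space model. This is not the driver code: the model_*
names, the FIFO size and the IOVA values are hypothetical, and the only
assumption carried over from the description is that a failed map
attempt pushes nothing onto the FIFO, while the error-path unmap removes
the most recently pushed entry.

```c
#include <stdbool.h>

#define MODEL_FIFO_SIZE 8u	/* hypothetical size, power of two */

/* One tracked DMA mapping (simplified stand-in, not a driver struct). */
struct model_dma {
	unsigned long iova;
	bool mapped;
};

/* Simplified send queue: a ring of mappings plus a producer counter. */
struct model_sq {
	struct model_dma fifo[MODEL_FIFO_SIZE];
	unsigned int dma_fifo_pc;
};

/* Try to map and record a buffer. On failure nothing is pushed. */
static bool model_map_and_push(struct model_sq *sq, unsigned long iova,
			       bool map_fails)
{
	struct model_dma *e;

	if (map_fails)
		return false;

	e = &sq->fifo[sq->dma_fifo_pc++ & (MODEL_FIFO_SIZE - 1)];
	e->iova = iova;
	e->mapped = true;
	return true;
}

/* Error-path helper: unmap the last 'num' entries that were pushed. */
static void model_unmap_last(struct model_sq *sq, unsigned int num)
{
	while (num--) {
		struct model_dma *e =
			&sq->fifo[--sq->dma_fifo_pc & (MODEL_FIFO_SIZE - 1)];
		e->mapped = false;
	}
}

/*
 * Two earlier packets are mapped successfully, then the map attempt for
 * a third one fails. Returns how many of the earlier mappings are still
 * live after the error path has run.
 */
static unsigned int model_live_after_failed_map(bool unmap_on_error)
{
	struct model_sq sq = {0};
	unsigned int i, live = 0;

	model_map_and_push(&sq, 0x1000, false);
	model_map_and_push(&sq, 0x2000, false);

	if (!model_map_and_push(&sq, 0x3000, true)) {
		/* Pre-patch behavior: unmap although nothing was pushed. */
		if (unmap_on_error)
			model_unmap_last(&sq, 1);
	}

	for (i = 0; i < MODEL_FIFO_SIZE; i++)
		if (sq.fifo[i].mapped)
			live++;
	return live;
}
```

In this model, model_live_after_failed_map(true) leaves only one of the
two earlier mappings live, because the entry for 0x2000 is torn down
while still in use. model_live_after_failed_map(false), which mirrors
the error path after this patch, leaves both intact.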

Comments

patchwork-bot+netdevbpf@kernel.org Oct. 3, 2024, 12:30 a.m. UTC | #1
Hello:

This series was applied to netdev/net.git (main)
by Saeed Mahameed <saeedm@nvidia.com>:

On Wed, 25 Sep 2024 13:20:06 -0700 you wrote:
> From: Gerd Bayer <gbayer@linux.ibm.com>
> 
> Remove the erroneous unmap in case no DMA mapping was established
> 
> The multi-packet WQE transmit code attempts to obtain a DMA mapping for
> the skb. This could fail, e.g. under memory pressure, when the IOMMU
> driver just can't allocate more memory for page tables. While the code
> tries to handle this in the path below the err_unmap label it erroneously
> unmaps one entry from the sq's FIFO list of active mappings. Since the
> current map attempt failed this unmap is removing some random DMA mapping
> that might still be required. If the PCI function now presents that IOVA,
> the IOMMU may assume a rogue DMA access and e.g. on s390 puts the PCI
> function in error state.
> 
> [...]

Here is the summary with links:
  - [net,1/8] net/mlx5: Fix error path in multi-packet WQE transmit
    https://git.kernel.org/netdev/net/c/2bcae12c795f
  - [net,2/8] net/mlx5: Added cond_resched() to crdump collection
    https://git.kernel.org/netdev/net/c/ec7931558941
  - [net,3/8] net/mlx5e: Fix NULL deref in mlx5e_tir_builder_alloc()
    https://git.kernel.org/netdev/net/c/f25389e77950
  - [net,4/8] net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc
    https://git.kernel.org/netdev/net/c/19da17010a55
  - [net,5/8] net/mlx5: HWS, fixed double-free in error flow of creating SQ
    https://git.kernel.org/netdev/net/c/d8c561741ef8
  - [net,6/8] net/mlx5: HWS, changed E2BIG error to a negative return code
    https://git.kernel.org/netdev/net/c/d15525f30010
  - [net,7/8] net/mlx5e: SHAMPO, Fix overflow of hd_per_wq
    https://git.kernel.org/netdev/net/c/023d2a43ed0d
  - [net,8/8] net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice
    https://git.kernel.org/netdev/net/c/7b124695db40

You are awesome, thank you!
Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index b09e9abd39f3..f8c7912abe0e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -642,7 +642,6 @@  mlx5e_sq_xmit_mpwqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	return;
 
 err_unmap:
-	mlx5e_dma_unmap_wqe_err(sq, 1);
 	sq->stats->dropped++;
 	dev_kfree_skb_any(skb);
 	mlx5e_tx_flush(sq);