
[net] net/mlx5e: SHAMPO, Make reserved size independent of page size

Message ID 1742732906-166564-1-git-send-email-tariqt@nvidia.com (mailing list archive)
State Not Applicable
Series [net] net/mlx5e: SHAMPO, Make reserved size independent of page size

Commit Message

Tariq Toukan March 23, 2025, 12:28 p.m. UTC
From: Lama Kayal <lkayal@nvidia.com>

When hw-gro is enabled, the maximum number of header entries that are
needed per wqe (hd_per_wqe) is calculated based on the size of the
reservations among other parameters.

Miscalculation of the reservation size leads to hd_per_wqe being
incorrectly computed as 0, particularly with a large page size such as
on aarch64. This prevents the SHAMPO header from being correctly
initialized in the device, ultimately causing the following CQE error,
which indicates a PD violation.

 mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1180
 mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x510, ci 0x0, qn 0x1180, opcode 0xe, syndrome  0x4, vendor syndrome 0x32
 00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
 00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
 00000030: 00 00 00 9a 93 00 32 04 00 00 00 00 00 00 da e1

Use the correct formula for calculating the reservation size: it
should not depend on the page size; instead, use the correct multiple
of MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE.

Fixes: e5ca8fb08ab2 ("net/mlx5e: Add control path for SHAMPO feature")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/params.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)


base-commit: ed3ba9b6e280e14cc3148c1b226ba453f02fa76c

Comments

Simon Horman March 25, 2025, 2:04 p.m. UTC | #1
On Sun, Mar 23, 2025 at 02:28:26PM +0200, Tariq Toukan wrote:
> From: Lama Kayal <lkayal@nvidia.com>
> 
> When hw-gro is enabled, the maximum number of header entries that are
> needed per wqe (hd_per_wqe) is calculated based on the size of the
> reservations among other parameters.
> 
> Miscalculation of the size of reservations leads to incorrect
> calculation of hd_per_wqe as 0, particularly in the case of large page
> size like in aarch64, this prevents the SHAMPO header from being
> correctly initialized in the device, ultimately causing the following
> cqe err that indicates a violation of PD.

Hi Lama, Tariq, all,

If I understand things correctly, hd_per_wqe is calculated
in mlx5e_shampo_hd_per_wqe() like this:

u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev,
                            struct mlx5e_params *params,
                            struct mlx5e_rq_param *rq_param)
{
        int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
        u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, NULL));
        int pkt_per_resv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
        u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, NULL);
        int wqe_size = BIT(log_stride_sz) * num_strides;
        u32 hd_per_wqe;

        /* Assumption: hd_per_wqe % 8 == 0. */
        hd_per_wqe = (wqe_size / resv_size) * pkt_per_resv;
        mlx5_core_dbg(mdev, "%s hd_per_wqe = %d rsrv_size = %d wqe_size = %d pkt_per_resv = %d\n",
                      __func__, hd_per_wqe, resv_size, wqe_size, pkt_per_resv);
        return hd_per_wqe;
}

I can see that if PAGE_SIZE was some multiple of 4k, and thus
larger than wqe_size, then this could lead to hd_per_wqe being zero.

But I note that mlx5e_mpwqe_get_log_stride_size() may return PAGE_SHIFT.
And I wonder if that leads to wqe_size being larger than expected by this
patch in cases where the PAGE_SIZE is greater than 4k.

Likewise in mlx5e_shampo_get_log_cq_size(), which seems to have a large overlap
codewise with mlx5e_shampo_hd_per_wqe().

> 
>  mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1180
>  mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x510, ci 0x0, qn 0x1180, opcode 0xe, syndrome  0x4, vendor syndrome 0x32
>  00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
>  00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
>  00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
>  00000030: 00 00 00 9a 93 00 32 04 00 00 00 00 00 00 da e1
> 
> Use the correct formula for calculating the size of reservations,
> precisely it shouldn't be dependent on page size, instead use the
> correct multiply of MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE.
> 
> Fixes: e5ca8fb08ab2 ("net/mlx5e: Add control path for SHAMPO feature")
> Signed-off-by: Lama Kayal <lkayal@nvidia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en/params.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> index 64b62ed17b07..31eb99f09c63 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> @@ -423,7 +423,7 @@ u8 mlx5e_shampo_get_log_pkt_per_rsrv(struct mlx5_core_dev *mdev,
>  				     struct mlx5e_params *params)
>  {
>  	u32 resrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) *
> -			 PAGE_SIZE;
> +			 MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE;
>  
>  	return order_base_2(DIV_ROUND_UP(resrv_size, params->sw_mtu));
>  }
> @@ -827,7 +827,8 @@ static u32 mlx5e_shampo_get_log_cq_size(struct mlx5_core_dev *mdev,
>  					struct mlx5e_params *params,
>  					struct mlx5e_xsk_param *xsk)
>  {
> -	int rsrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
> +	int rsrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) *
> +		MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE;
>  	u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
>  	int pkt_per_rsrv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
>  	u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, xsk);
> @@ -1036,7 +1037,8 @@ u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev,
>  			    struct mlx5e_params *params,
>  			    struct mlx5e_rq_param *rq_param)
>  {
> -	int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
> +	int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) *
> +		MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE;
>  	u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, NULL));
>  	int pkt_per_resv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
>  	u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, NULL);
> 
> base-commit: ed3ba9b6e280e14cc3148c1b226ba453f02fa76c
> -- 
> 2.31.1
> 
>
Tariq Toukan March 27, 2025, 5:58 p.m. UTC | #2
On 25/03/2025 16:04, Simon Horman wrote:
> On Sun, Mar 23, 2025 at 02:28:26PM +0200, Tariq Toukan wrote:
>> From: Lama Kayal <lkayal@nvidia.com>
>>
>> When hw-gro is enabled, the maximum number of header entries that are
>> needed per wqe (hd_per_wqe) is calculated based on the size of the
>> reservations among other parameters.
>>
>> Miscalculation of the size of reservations leads to incorrect
>> calculation of hd_per_wqe as 0, particularly in the case of large page
>> size like in aarch64, this prevents the SHAMPO header from being
>> correctly initialized in the device, ultimately causing the following
>> cqe err that indicates a violation of PD.
> 
> Hi Lama, Tariq, all,
> 
> If I understand things correctly, hd_per_wqe is calculated
> in mlx5e_shampo_hd_per_wqe() like this:
> 
> u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev,
>                             struct mlx5e_params *params,
>                             struct mlx5e_rq_param *rq_param)
> {
>         int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
>         u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, NULL));
>         int pkt_per_resv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
>         u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, NULL);
>         int wqe_size = BIT(log_stride_sz) * num_strides;
>         u32 hd_per_wqe;
> 
>         /* Assumption: hd_per_wqe % 8 == 0. */
>         hd_per_wqe = (wqe_size / resv_size) * pkt_per_resv;
>         mlx5_core_dbg(mdev, "%s hd_per_wqe = %d rsrv_size = %d wqe_size = %d pkt_per_resv = %d\n",
>                       __func__, hd_per_wqe, resv_size, wqe_size, pkt_per_resv);
>         return hd_per_wqe;
> }
> 
> I can see that if PAGE_SIZE was some multiple of 4k, and thus
> larger than wqe_size, then this could lead to hd_per_wqe being zero.
> 
> But I note that mlx5e_mpwqe_get_log_stride_size() may return PAGE_SHIFT.
> And I wonder if that leads to wqe_size being larger than expected by this
> patch in cases where the PAGE_SIZE is greater than 4k.
> 
> Likewise in mlx5e_shampo_get_log_cq_size(), which seems to have a large overlap
> codewise with mlx5e_shampo_hd_per_wqe().
> 
>>

Hi Simon,

Different settings lead to different combinations of num_strides and 
stride_size. However, they affect each other in a way that the resulting 
wqe_size has the expected (~preset) value.

In mlx5e_mpwqe_get_log_num_strides() you can see that if stride_size 
grows, then num_strides decreases accordingly.

In addition, to reduce mistakes/bugs, we have a few WARN()s along the 
calculations, as well as a verifier function, 
mlx5e_verify_rx_mpwqe_strides().

Thanks,
Tariq
patchwork-bot+netdevbpf@kernel.org March 28, 2025, 1:50 p.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 23 Mar 2025 14:28:26 +0200 you wrote:
> From: Lama Kayal <lkayal@nvidia.com>
> 
> When hw-gro is enabled, the maximum number of header entries that are
> needed per wqe (hd_per_wqe) is calculated based on the size of the
> reservations among other parameters.
> 
> Miscalculation of the size of reservations leads to incorrect
> calculation of hd_per_wqe as 0, particularly in the case of large page
> size like in aarch64, this prevents the SHAMPO header from being
> correctly initialized in the device, ultimately causing the following
> cqe err that indicates a violation of PD.
> 
> [...]

Here is the summary with links:
  - [net] net/mlx5e: SHAMPO, Make reserved size independent of page size
    https://git.kernel.org/netdev/net/c/fab058356885

You are awesome, thank you!
Simon Horman March 31, 2025, 9:46 a.m. UTC | #4
On Thu, Mar 27, 2025 at 07:58:30PM +0200, Tariq Toukan wrote:
> 
> 
> On 25/03/2025 16:04, Simon Horman wrote:
> > On Sun, Mar 23, 2025 at 02:28:26PM +0200, Tariq Toukan wrote:
> > > From: Lama Kayal <lkayal@nvidia.com>
> > > 
> > > When hw-gro is enabled, the maximum number of header entries that are
> > > needed per wqe (hd_per_wqe) is calculated based on the size of the
> > > reservations among other parameters.
> > > 
> > > Miscalculation of the size of reservations leads to incorrect
> > > calculation of hd_per_wqe as 0, particularly in the case of large page
> > > size like in aarch64, this prevents the SHAMPO header from being
> > > correctly initialized in the device, ultimately causing the following
> > > cqe err that indicates a violation of PD.
> > 
> > Hi Lama, Tariq, all,
> > 
> > If I understand things correctly, hd_per_wqe is calculated
> > in mlx5e_shampo_hd_per_wqe() like this:
> > 
> > u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev,
> >                             struct mlx5e_params *params,
> >                             struct mlx5e_rq_param *rq_param)
> > {
> >         int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
> >         u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, NULL));
> >         int pkt_per_resv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
> >         u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, NULL);
> >         int wqe_size = BIT(log_stride_sz) * num_strides;
> >         u32 hd_per_wqe;
> > 
> >         /* Assumption: hd_per_wqe % 8 == 0. */
> >         hd_per_wqe = (wqe_size / resv_size) * pkt_per_resv;
> >         mlx5_core_dbg(mdev, "%s hd_per_wqe = %d rsrv_size = %d wqe_size = %d pkt_per_resv = %d\n",
> >                       __func__, hd_per_wqe, resv_size, wqe_size, pkt_per_resv);
> >         return hd_per_wqe;
> > }
> > 
> > I can see that if PAGE_SIZE was some multiple of 4k, and thus
> > larger than wqe_size, then this could lead to hd_per_wqe being zero.
> > 
> > But I note that mlx5e_mpwqe_get_log_stride_size() may return PAGE_SHIFT.
> > And I wonder if that leads to wqe_size being larger than expected by this
> > patch in cases where the PAGE_SIZE is greater than 4k.
> > 
> > Likewise in mlx5e_shampo_get_log_cq_size(), which seems to have a large overlap
> > codewise with mlx5e_shampo_hd_per_wqe().
> > 
> > > 
> 
> Hi Simon,
> 
> Different settings lead to different combinations of num_strides and
> stride_size. However, they affect each other in a way that the resulting
> wqe_size has the expected (~preset) value.
> 
> In mlx5e_mpwqe_get_log_num_strides() you can see that if stride_size grows,
> then num_strides decreases accordingly.
> 
> In addition, to reduce mistakes/bugs, we have a few WARNs() along the
> calculations, in addition to a verifier function
> mlx5e_verify_rx_mpwqe_strides().

Hi Tariq,

Sorry for the slow response, I was AFK for a few days.

Thanks for confirming that this is as expected.

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 64b62ed17b07..31eb99f09c63 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -423,7 +423,7 @@  u8 mlx5e_shampo_get_log_pkt_per_rsrv(struct mlx5_core_dev *mdev,
 				     struct mlx5e_params *params)
 {
 	u32 resrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) *
-			 PAGE_SIZE;
+			 MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE;
 
 	return order_base_2(DIV_ROUND_UP(resrv_size, params->sw_mtu));
 }
@@ -827,7 +827,8 @@  static u32 mlx5e_shampo_get_log_cq_size(struct mlx5_core_dev *mdev,
 					struct mlx5e_params *params,
 					struct mlx5e_xsk_param *xsk)
 {
-	int rsrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
+	int rsrv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) *
+		MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE;
 	u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
 	int pkt_per_rsrv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
 	u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, xsk);
@@ -1036,7 +1037,8 @@  u32 mlx5e_shampo_hd_per_wqe(struct mlx5_core_dev *mdev,
 			    struct mlx5e_params *params,
 			    struct mlx5e_rq_param *rq_param)
 {
-	int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) * PAGE_SIZE;
+	int resv_size = BIT(mlx5e_shampo_get_log_rsrv_size(mdev, params)) *
+		MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE;
 	u16 num_strides = BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, NULL));
 	int pkt_per_resv = BIT(mlx5e_shampo_get_log_pkt_per_rsrv(mdev, params));
 	u8 log_stride_sz = mlx5e_mpwqe_get_log_stride_size(mdev, params, NULL);