diff mbox series

[net-next,v2,1/2] mlx5/core: relax memory barrier in eq_update_ci()

Message ID 20241107183054.2443218-1-csander@purestorage.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2,1/2] mlx5/core: relax memory barrier in eq_update_ci() | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: memory barrier without comment
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-11--21-00 (tests: 787)

Commit Message

Caleb Sander Nov. 7, 2024, 6:30 p.m. UTC
The memory barrier in eq_update_ci() after the doorbell write is a
significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see
3% of CPU time spent on the mfence instruction.

98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci
update") already relaxed the full memory barrier to just a write barrier
in mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb()
with wmb() in eq_update_ci() too.

On strongly ordered architectures, no barrier is actually needed because
the MMIO writes to the doorbell register are guaranteed to appear to the
device in the order they were made. However, the kernel's ordered MMIO
primitive writel() lacks a convenient big-endian interface.
Therefore, we opt to stick with __raw_writel() + a barrier.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
v2: keep memory barrier instead of using ordered writel()

 drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Parav Pandit Nov. 8, 2024, 10:46 a.m. UTC | #1
> From: Caleb Sander Mateos <csander@purestorage.com>
> Sent: Friday, November 8, 2024 12:01 AM
> 
> The memory barrier in eq_update_ci() after the doorbell write is a significant
> hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see 3% of CPU
> time spent on the mfence instruction.
> 
> 98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci
> update") already relaxed the full memory barrier to just a write barrier in
> mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb() with
> wmb() in eq_update_ci() too.
> 
> On strongly ordered architectures, no barrier is actually needed because the
> MMIO writes to the doorbell register are guaranteed to appear to the device in
> the order they were made. However, the kernel's ordered MMIO primitive
> writel() lacks a convenient big-endian interface.
> Therefore, we opt to stick with __raw_writel() + a barrier.
> 
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
> v2: keep memory barrier instead of using ordered writel()
> 
>  drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> index 4b7f7131c560..b1edc71ffc6d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> @@ -70,11 +70,11 @@ static inline void eq_update_ci(struct mlx5_eq *eq, int arm)
>  	__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>  	u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> 
>  	__raw_writel((__force u32)cpu_to_be32(val), addr);
>  	/* We still want ordering, just not swabbing, so add a barrier */
> -	mb();
> +	wmb();
>  }
> 
>  int mlx5_eq_table_init(struct mlx5_core_dev *dev);
>  void mlx5_eq_table_cleanup(struct mlx5_core_dev *dev);
>  int mlx5_eq_table_create(struct mlx5_core_dev *dev);
> --
> 2.45.2

Reviewed-by: Parav Pandit <parav@nvidia.com>
Tariq Toukan Nov. 11, 2024, 11:59 a.m. UTC | #2
On 08/11/2024 12:46, Parav Pandit wrote:
> 
>> From: Caleb Sander Mateos <csander@purestorage.com>
>> Sent: Friday, November 8, 2024 12:01 AM
>>
>> The memory barrier in eq_update_ci() after the doorbell write is a significant
>> hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see 3% of CPU
>> time spent on the mfence instruction.
>>
>> 98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci
>> update") already relaxed the full memory barrier to just a write barrier in
>> mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb() with
>> wmb() in eq_update_ci() too.
>>
>> On strongly ordered architectures, no barrier is actually needed because the
>> MMIO writes to the doorbell register are guaranteed to appear to the device in
>> the order they were made. However, the kernel's ordered MMIO primitive
>> writel() lacks a convenient big-endian interface.
>> Therefore, we opt to stick with __raw_writel() + a barrier.
>>
>> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>> ---
>> v2: keep memory barrier instead of using ordered writel()
>>
>>   drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> index 4b7f7131c560..b1edc71ffc6d 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> @@ -70,11 +70,11 @@ static inline void eq_update_ci(struct mlx5_eq *eq, int arm)
>>   	__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>>   	u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>>
>>   	__raw_writel((__force u32)cpu_to_be32(val), addr);
>>   	/* We still want ordering, just not swabbing, so add a barrier */
>> -	mb();
>> +	wmb();
>>   }
>>
>>   int mlx5_eq_table_init(struct mlx5_core_dev *dev);
>>   void mlx5_eq_table_cleanup(struct mlx5_core_dev *dev);
>>   int mlx5_eq_table_create(struct mlx5_core_dev *dev);
>> --
>> 2.45.2
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> 

Acked-by: Tariq Toukan <tariqt@nvidia.com>
Jakub Kicinski Nov. 12, 2024, 2:38 a.m. UTC | #3
On Thu,  7 Nov 2024 11:30:51 -0700 Caleb Sander Mateos wrote:
> The memory barrier in eq_update_ci() after the doorbell write is a
> significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see
> 3% of CPU time spent on the mfence instruction.

Applied, thanks. 

In the future, please avoid sending patches in reply to an older version.

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
index 4b7f7131c560..b1edc71ffc6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
@@ -70,11 +70,11 @@ static inline void eq_update_ci(struct mlx5_eq *eq, int arm)
 	__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
 	u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
 
 	__raw_writel((__force u32)cpu_to_be32(val), addr);
 	/* We still want ordering, just not swabbing, so add a barrier */
-	mb();
+	wmb();
 }
 
 int mlx5_eq_table_init(struct mlx5_core_dev *dev);
 void mlx5_eq_table_cleanup(struct mlx5_core_dev *dev);
 int mlx5_eq_table_create(struct mlx5_core_dev *dev);