
[net-next,4/4] net/mlx5: Add sensor name to temperature event message

Message ID: 20250213094641.226501-5-tariqt@nvidia.com (mailing list archive)
State: Not Applicable
Series: mlx5: Add sensor name in temperature message

Commit Message

Tariq Toukan Feb. 13, 2025, 9:46 a.m. UTC
From: Shahar Shitrit <shshitrit@nvidia.com>

Previously, a temperature event message included a bitmap indicating
which sensors detect high temperatures.

To enhance clarity, we modify the message format to explicitly list
the names of the overheating sensors, alongside the sensors bitmap.
If HWMON is not configured, the event message remains unchanged.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/events.c  | 31 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/hwmon.c   |  5 +++
 .../net/ethernet/mellanox/mlx5/core/hwmon.h   |  1 +
 3 files changed, 34 insertions(+), 3 deletions(-)
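
Illustration only, not part of the patch: a minimal userspace sketch of the
message format described above, assuming the same layout the patch uses
(bits of the low word map to sensor indices 0..63, bits of the high word to
64..127). The lookup function demo_sensor_name() and the names it returns
are hypothetical stand-ins for hwmon_get_sensor_name(); the NULL check is an
addition of this sketch and is not present in the patch. Bits are tested
with a plain shift on a uint64_t because the kernel's for_each_set_bit() is
not available in userspace.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_WORD 64

/* Hypothetical name lookup; the driver reads names from its hwmon data. */
static const char *demo_sensor_name(int channel)
{
	switch (channel) {
	case 0:
		return "asic";
	case 64:
		return "module0";
	case 65:
		return "module1";
	default:
		return NULL;
	}
}

/*
 * Print one line per set bit whose sensor has a known name.
 * bit_set_offset is added to the bit position to form the sensor index.
 * Returns the number of lines printed.
 */
static int print_sensor_names(uint64_t bit_set, int bit_set_offset)
{
	int printed = 0;
	int i;

	for (i = 0; i < BITS_PER_WORD; i++) {
		const char *name;

		if (!(bit_set & (UINT64_C(1) << i)))
			continue;

		name = demo_sensor_name(i + bit_set_offset);
		if (!name) /* unknown sensor: skipped in this sketch only */
			continue;

		printf("Sensor name[%d]: %s\n", i + bit_set_offset, name);
		printed++;
	}

	return printed;
}

/* Same order as temp_warn(): bitmap line first, then low word, then high. */
static int report_high_temp(uint64_t value_msb, uint64_t value_lsb)
{
	printf("High temperature on sensors with bit set %#llx %#llx.\n",
	       (unsigned long long)value_msb, (unsigned long long)value_lsb);

	return print_sensor_names(value_lsb, 0) +
	       print_sensor_names(value_msb, BITS_PER_WORD);
}
```

Calling report_high_temp(0x3, 0x1) prints the bitmap line followed by one
"Sensor name[N]: ..." line each for sensor indices 0, 64 and 65.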

Comments

Simon Horman Feb. 15, 2025, 7:29 p.m. UTC | #1
On Thu, Feb 13, 2025 at 11:46:41AM +0200, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> Previously, a temperature event message included a bitmap indicating
> which sensors detect high temperatures.
> 
> To enhance clarity, we modify the message format to explicitly list
> the names of the overheating sensors, alongside the sensors bitmap.
> If HWMON is not configured, the event message remains unchanged.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>

...

> +#if IS_ENABLED(CONFIG_HWMON)
> +static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
> +					  u64 bit_set, int bit_set_offset)
> +{
> +	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
> +	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
> +	int i;
> +
> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> +
> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> +	}
> +}

nit:

If you have to respin for some other reason, please consider limiting lines
to 80 columns wide or less here and elsewhere in this patch where it
doesn't reduce readability (subjective I know).

e.g.:

static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev,
                                          struct mlx5_hwmon *hwmon,
                                          u64 bit_set, int bit_set_offset)
{
        unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
        int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
        int i;

        for_each_set_bit(i, bit_set_ptr, num_bits) {
                const char *sensor_name;

                sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);

                mlx5_core_warn(dev, "Sensor name[%d]: %s\n",
                               i + bit_set_offset, sensor_name);
        }
}

...

Jakub Kicinski Feb. 18, 2025, 12:27 a.m. UTC | #2
On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
> > +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> > +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> > +
> > +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> > +	}
> > +}  
> 
> nit:
> 
> If you have to respin for some other reason, please consider limiting lines
> to 80 columns wide or less here and elsewhere in this patch where it
> doesn't reduce readability (subjective I know).

+1, please try to catch such situations going forward

Tariq Toukan Feb. 19, 2025, 1 p.m. UTC | #3
On 18/02/2025 2:27, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
>>> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
>>> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
>>> +
>>> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
>>> +	}
>>> +}
>>
>> nit:
>>
>> If you have to respin for some other reason, please consider limiting lines
>> to 80 columns wide or less here and elsewhere in this patch where it
>> doesn't reduce readability (subjective I know).
> 
> +1, please try to catch such situations going forward
> 

Hi Jakub,

This was not missed.
This is not a new thing...
We've been enforcing a max line length of 100 chars in mlx5 driver for 
the past few years.
I don't have the full picture now, but I'm convinced that this dates back 
to an agreement between the mlx5 and netdev maintainers at that time.

80 chars could be too restrictive, especially with today's large 
monitors, while 100-chars is still highly readable.
This is subjective of course...

If you don't have a strong preference, we'll keep the current 100 chars 
limit. Otherwise, just let me know and we'll start enforcing the 
80-chars limit for future patches.

Regards,
Tariq

Jakub Kicinski Feb. 19, 2025, 3:28 p.m. UTC | #4
On Wed, 19 Feb 2025 15:00:57 +0200 Tariq Toukan wrote:
> >> If you have to respin for some other reason, please consider limiting lines
> >> to 80 columns wide or less here and elsewhere in this patch where it
> >> doesn't reduce readability (subjective I know).  
> > 
> > +1, please try to catch such situations going forward
> 
> This was not missed.
> This is not a new thing...
> We've been enforcing a max line length of 100 chars in mlx5 driver for 
> the past few years.
> I don't have the full picture now, but I'm convinced that this dates back 
> to an agreement between the mlx5 and netdev maintainers at that time.
> 
> 80 chars could be too restrictive, especially with today's large 
> monitors, while 100-chars is still highly readable.
> This is subjective of course...
> 
> If you don't have a strong preference, we'll keep the current 100 chars 
> limit. Otherwise, just let me know and we'll start enforcing the 
> 80-chars limit for future patches.

Right, I think mlx5 is the only exception to the 80 column guidance.
I don't think it's resulting in more readable code, so yes, my
preference is to end this experiment.

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index e85a9042e3c2..01c5f5990f9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -6,6 +6,7 @@ 
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/events.h"
+#include "hwmon.h"
 
 struct mlx5_event_nb {
 	struct mlx5_nb  nb;
@@ -153,11 +154,28 @@  static int any_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+#if IS_ENABLED(CONFIG_HWMON)
+static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
+					  u64 bit_set, int bit_set_offset)
+{
+	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
+	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
+	int i;
+
+	for_each_set_bit(i, bit_set_ptr, num_bits) {
+		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
+
+		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
+	}
+}
+#endif /* CONFIG_HWMON */
+
 /* type == MLX5_EVENT_TYPE_TEMP_WARN_EVENT */
 static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 {
 	struct mlx5_event_nb *event_nb = mlx5_nb_cof(nb, struct mlx5_event_nb, nb);
 	struct mlx5_events   *events   = event_nb->ctx;
+	struct mlx5_core_dev *dev      = events->dev;
 	struct mlx5_eqe      *eqe      = data;
 	u64 value_lsb;
 	u64 value_msb;
@@ -169,10 +187,17 @@  static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 	value_lsb &= 0x1;
 	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
 
-	if (net_ratelimit())
-		mlx5_core_warn(events->dev,
-			       "High temperature on sensors with bit set %#llx %#llx",
+	if (net_ratelimit()) {
+		mlx5_core_warn(dev, "High temperature on sensors with bit set %#llx %#llx.\n",
 			       value_msb, value_lsb);
+#if IS_ENABLED(CONFIG_HWMON)
+		if (dev->hwmon) {
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_lsb, 0);
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_msb,
+						      sizeof(value_lsb) * BITS_PER_BYTE);
+		}
+#endif
+	}
 
 	return NOTIFY_OK;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
index 353f81dccd1c..4ba2636d7fb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
@@ -416,3 +416,8 @@  void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev)
 	mlx5_hwmon_free(hwmon);
 	mdev->hwmon = NULL;
 }
+
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel)
+{
+	return hwmon->temp_channel_desc[channel].sensor_name;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
index 999654a9b9da..f38271c22c10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
@@ -10,6 +10,7 @@ 
 
 int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev);
 void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev);
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel);
 
 #else
 static inline int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev)