Message ID | 20250213094641.226501-5-tariqt@nvidia.com |
---|---|
State | Accepted |
Commit | 46fd50cfcc12368bed9ae5257cc3beaea5b3c193 |
Delegated to: | Netdev Maintainers |
Series | mlx5: Add sensor name in temperature message |
On Thu, Feb 13, 2025 at 11:46:41AM +0200, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
>
> Previously, a temperature event message included a bitmap indicating
> which sensors detect high temperatures.
>
> To enhance clarity, we modify the message format to explicitly list
> the names of the overheating sensors, alongside the sensors bitmap.
> If HWMON is not configured, the event message remains unchanged.
>
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>

...

> +#if IS_ENABLED(CONFIG_HWMON)
> +static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
> +					  u64 bit_set, int bit_set_offset)
> +{
> +	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
> +	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
> +	int i;
> +
> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> +
> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> +	}
> +}

nit:

If you have to respin for some other reason, please consider limiting lines
to 80 columns wide or less here and elsewhere in this patch where it
doesn't reduce readability (subjective I know).

e.g.:

static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev,
					  struct mlx5_hwmon *hwmon,
					  u64 bit_set, int bit_set_offset)
{
	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
	int i;

	for_each_set_bit(i, bit_set_ptr, num_bits) {
		const char *sensor_name;

		sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
		mlx5_core_warn(dev, "Sensor name[%d]: %s\n",
			       i + bit_set_offset, sensor_name);
	}
}

...
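The commit message describes turning the sensor bitmap into a list of sensor names. As a rough userspace illustration of what print_sensor_names_in_bit_set() does (not kernel code), the sketch below walks the set bits of one 64-bit word and formats one "Sensor name[%d]: %s" line per set bit, shifting each bit position by an offset. The names table, format_sensor_names(), and the "unknown" fallback are hypothetical stand-ins for hwmon_get_sensor_name(); bits are tested arithmetically, so the bit-to-index mapping is the same on any host.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of print_sensor_names_in_bit_set():
 * for every set bit i in bit_set, append
 * "Sensor name[i + bit_set_offset]: <name>\n" to buf.
 * The names table stands in for hwmon_get_sensor_name().
 * Returns the number of sensors written to buf.
 */
static int format_sensor_names(char *buf, size_t size,
			       const char *const *names, int num_names,
			       uint64_t bit_set, int bit_set_offset)
{
	size_t used = 0;
	int count = 0;

	if (size > 0)
		buf[0] = '\0';

	for (int i = 0; i < 64; i++) {
		int idx = i + bit_set_offset;
		const char *name;
		int n;

		/* Test bit i arithmetically; no pointer casts involved. */
		if (!((bit_set >> i) & 1))
			continue;

		/* The sketch guards the table lookup with a fallback name. */
		name = (idx < num_names) ? names[idx] : "unknown";
		n = snprintf(buf + used, size - used,
			     "Sensor name[%d]: %s\n", idx, name);
		if (n < 0 || (size_t)n >= size - used)
			break; /* buffer full: stop instead of truncating mid-line */
		used += (size_t)n;
		count++;
	}
	return count;
}
```

For example, with a hypothetical three-entry table {"asic", "sensor1", "sensor2"}, a bit_set of 0x5 and an offset of 0 produce two lines, one for sensor 0 and one for sensor 2.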
On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
> > +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> > +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> > +
> > +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> > +	}
> > +}
>
> nit:
>
> If you have to respin for some other reason, please consider limiting lines
> to 80 columns wide or less here and elsewhere in this patch where it
> doesn't reduce readability (subjective I know).

+1, please try to catch such situations going forward
On 18/02/2025 2:27, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
>>> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
>>> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
>>> +
>>> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
>>> +	}
>>> +}
>>
>> nit:
>>
>> If you have to respin for some other reason, please consider limiting lines
>> to 80 columns wide or less here and elsewhere in this patch where it
>> doesn't reduce readability (subjective I know).
>
> +1, please try to catch such situations going forward
>

Hi Jakub,

This was not missed.
This is not a new thing...
We've been enforcing a max line length of 100 chars in mlx5 driver for
the past few years.
I don't have the full image now, but I'm convinced that this dates back
to an agreement between the mlx5 and netdev maintainers at that time.

80 chars could be too restrictive, especially with today's large
monitors, while 100-chars is still highly readable.
This is subjective of course...

If you don't have a strong preference, we'll keep the current 100 chars
limit. Otherwise, just let me know and we'll start enforcing the
80-chars limit for future patches.

Regards,
Tariq
On Wed, 19 Feb 2025 15:00:57 +0200 Tariq Toukan wrote:
> >> If you have to respin for some other reason, please consider limiting lines
> >> to 80 columns wide or less here and elsewhere in this patch where it
> >> doesn't reduce readability (subjective I know).
> >
> > +1, please try to catch such situations going forward
>
> This was not missed.
> This is not a new thing...
> We've been enforcing a max line length of 100 chars in mlx5 driver for
> the past few years.
> I don't have the full image now, but I'm convinced that this dates back
> to an agreement between the mlx5 and netdev maintainers at that time.
>
> 80 chars could be too restrictive, especially with today's large
> monitors, while 100-chars is still highly readable.
> This is subjective of course...
>
> If you don't have a strong preference, we'll keep the current 100 chars
> limit. Otherwise, just let me know and we'll start enforcing the
> 80-chars limit for future patches.

Right, I think mlx5 is the only exception to the 80 column guidance. I don't think it's resulting in more readable code, so yes, my preference is to end this experiment.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index e85a9042e3c2..01c5f5990f9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -6,6 +6,7 @@
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/events.h"
+#include "hwmon.h"

 struct mlx5_event_nb {
 	struct mlx5_nb nb;
@@ -153,11 +154,28 @@ static int any_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }

+#if IS_ENABLED(CONFIG_HWMON)
+static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
+					  u64 bit_set, int bit_set_offset)
+{
+	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
+	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
+	int i;
+
+	for_each_set_bit(i, bit_set_ptr, num_bits) {
+		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
+
+		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
+	}
+}
+#endif /* CONFIG_HWMON */
+
 /* type == MLX5_EVENT_TYPE_TEMP_WARN_EVENT */
 static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 {
 	struct mlx5_event_nb *event_nb = mlx5_nb_cof(nb, struct mlx5_event_nb, nb);
 	struct mlx5_events *events = event_nb->ctx;
+	struct mlx5_core_dev *dev = events->dev;
 	struct mlx5_eqe *eqe = data;
 	u64 value_lsb;
 	u64 value_msb;
@@ -169,10 +187,17 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 	value_lsb &= 0x1;
 	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);

-	if (net_ratelimit())
-		mlx5_core_warn(events->dev,
-			       "High temperature on sensors with bit set %#llx %#llx",
+	if (net_ratelimit()) {
+		mlx5_core_warn(dev, "High temperature on sensors with bit set %#llx %#llx.\n",
 			       value_msb, value_lsb);
+#if IS_ENABLED(CONFIG_HWMON)
+		if (dev->hwmon) {
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_lsb, 0);
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_msb,
+						      sizeof(value_lsb) * BITS_PER_BYTE);
+		}
+#endif
+	}

 	return NOTIFY_OK;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
index 353f81dccd1c..4ba2636d7fb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
@@ -416,3 +416,8 @@ void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev)
 	mlx5_hwmon_free(hwmon);
 	mdev->hwmon = NULL;
 }
+
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel)
+{
+	return hwmon->temp_channel_desc[channel].sensor_name;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
index 999654a9b9da..f38271c22c10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
@@ -10,6 +10,7 @@

 int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev);
 void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev);
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel);

 #else
 static inline int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev)
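In temp_warn(), the two warning words are reported with different offsets: the LSB word with offset 0 and the MSB word with offset sizeof(value_lsb) * BITS_PER_BYTE, i.e. 64, so the combined index space spans sensors 0 through 127. A minimal userspace sketch of that index arithmetic follows. collect_hot_sensors() and WARN_WORD_BITS are hypothetical names, and the LSB mask mirrors the `value_lsb &= 0x1;` context line in the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits per warning word: 64, like sizeof(u64) * BITS_PER_BYTE. */
#define WARN_WORD_BITS (sizeof(uint64_t) * 8)

/*
 * Hypothetical sketch of the index mapping in temp_warn():
 * LSB bits map to sensor indices 0..63, MSB bits to 64..127.
 * As in the patch context, only bit 0 of the LSB word is kept.
 * Writes at most max indices to out and returns how many were written.
 */
static size_t collect_hot_sensors(uint64_t value_lsb, uint64_t value_msb,
				  int *out, size_t max)
{
	const uint64_t words[2] = { value_lsb & 0x1, value_msb };
	size_t n = 0;

	for (size_t w = 0; w < 2; w++) {
		for (size_t bit = 0; bit < WARN_WORD_BITS; bit++) {
			if (!((words[w] >> bit) & 1))
				continue;
			if (n == max)
				return n; /* caller's array is full */
			/* The word index selects the offset: 0 or 64. */
			out[n++] = (int)(w * WARN_WORD_BITS + bit);
		}
	}
	return n;
}
```

Called with value_lsb = 0x1 and value_msb = 0x5, the sketch reports indices 0, 64 and 66, matching the offsets the patch passes to print_sensor_names_in_bit_set().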