[05/25] lnet: print device status in net show command

Message ID 1627933851-7603-6-git-send-email-jsimmons@infradead.org
State New, archived
Series Sync to OpenSFS tree as of Aug 2, 2021

Commit Message

James Simmons Aug. 2, 2021, 7:50 p.m. UTC
From: Cyril Bordage <cbordage@whamcloud.com>

A device can enter a fatal state if its cable is disconnected or its
port is brought down on the switch side. In these cases the LND
(o2iblnd only, for now) flags the device as fatal and stops using it.
However, the device's health value is not decremented, which causes
confusion when examining the state of the node. To make the condition
visible, print the device status in the output of the lnetctl net show
command.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14114
Lustre-commit: f75ff33d9fbefd69 ("LU-14114 lnet: print device status in net show command")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44169
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/lnet-dlc.h | 1 +
 net/lnet/lnet/api-ni.c             | 2 ++
 2 files changed, 3 insertions(+)
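
The mismatch described in the commit message (a device flagged fatal
while its health value stays untouched) can be sketched as a small
consumer-side check. This is only an illustration under stated
assumptions: struct ni_hstats_view is a hypothetical stand-in for the
relevant fields of struct lnet_ioctl_local_ni_hstats, the max_health
parameter is hypothetical, and treating any nonzero hlni_fatal_error as
"fatal" is an assumption rather than something this patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the fields of
 * struct lnet_ioctl_local_ni_hstats that matter here. The real
 * definition lives in include/uapi/linux/lnet/lnet-dlc.h.
 */
struct ni_hstats_view {
	uint32_t local_error;	/* mirrors hlni_local_error */
	int32_t fatal_error;	/* mirrors hlni_fatal_error (added by this patch) */
	int32_t health_value;	/* mirrors hlni_health_value */
};

/* Assumption: a nonzero value means the LND flagged the device as fatal. */
static bool ni_device_fatal(const struct ni_hstats_view *s)
{
	return s->fatal_error != 0;
}

/*
 * True when the health value alone would suggest a fully healthy
 * interface although the device is flagged fatal. max_health is a
 * hypothetical caller-supplied ceiling for the health value.
 */
static bool ni_healthy_but_unusable(const struct ni_hstats_view *s,
				    int32_t max_health)
{
	return ni_device_fatal(s) && s->health_value >= max_health;
}
```

A tool reading the new field could use a check like
ni_healthy_but_unusable() to explain why an interface with a full
health value is nevertheless not being used.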

Patch

diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index c1c063f..ef60224 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -190,6 +190,7 @@  struct lnet_ioctl_local_ni_hstats {
 	__u32 hlni_local_no_route;
 	__u32 hlni_local_timeout;
 	__u32 hlni_local_error;
+	__s32 hlni_fatal_error;
 	__s32 hlni_health_value;
 	__u32 hlni_ping_count;
 	__u64 hlni_next_ping;
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index ec28139..4513d8d 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3692,6 +3692,8 @@  u32 lnet_get_dlc_seq_locked(void)
 		atomic_read(&ni->ni_hstats.hlt_local_timeout);
 	stats->hlni_local_error =
 		atomic_read(&ni->ni_hstats.hlt_local_error);
+	stats->hlni_fatal_error =
+		atomic_read(&ni->ni_fatal_error_on);
 	stats->hlni_health_value =
 		atomic_read(&ni->ni_healthv);
 	stats->hlni_ping_count = ni->ni_ping_count;