Message ID | DM4PR84MB1373DCD07EABD28D88129C50FDBF9@DM4PR84MB1373.NAMPRD84.PROD.OUTLOOK.COM (mailing list archive)
---|---
State | Changes Requested
Series | [for-next] scsi: Implement host state statistics
On Wed, Mar 15, 2023 at 06:08:19AM +0000, Seymour, Shane M wrote:
> The following patch implements host state statistics via sysfs. The intent
> is to allow user space to see the state changes and be able to report when
> a host changes state. The files do not separate out the time spent into
> each state but only into three:

Why does userspace care about these things at all? What tool needs them
and what can userspace do with the information?

>
> $ ll /sys/class/scsi_host/host0/stats
> total 0
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_first_change_time
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_last_change_time
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_other_count
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_other_ns
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_recovery_count
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_recovery_ns
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_running_count
> -r--r--r--. 1 root root 4096 Mar 13 22:43 state_running_ns
>
> They are running, recovery and other. The running state is SHOST_CREATED
> and SHOST_RUNNING. The recovery state is SHOST_RECOVERY,
> SHOST_CANCEL_RECOVERY, and SHOST_DEL_RECOVERY. Any other state gets
> accounted for in other.
>
> The current state is not accounted for in the information read. Because
> of that you must read:
>
> 1. The last_change_time for that host
> 2. the current state of a host and the uptime
> 3. each of the above *count and *ns files
> 4. Re-read the last_change_time
> 5. Compare the two last_change_time values read and if different try again.
> 6. The total time read from the *ns files is subtracted from the uptime and
>    that time is then allocated to the current state time.
>
> The first change time is to determine when the host was created so programs
> can determine if it was newly created or not.
>
> A (GPLv2) program called hostmond will be released in a few months that
> will monitor these interfaces and report (local host only via syslog(3C))
> when hosts change state.

We kind of need to see this before the kernel changes can be accepted
for obvious reasons, what is preventing that from happening now?

> +static ssize_t state_first_change_time_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(dev);
> +
> +	return scnprintf(buf, PAGE_SIZE, "%lld",
> +			ktime_to_ns(shost->stats->state_first_change_time));

Please always use sysfs_emit() instead of the crazy scnprintf() for
sysfs entries.

> +struct scsi_host_stats {
> +	ktime_t state_running_ns;
> +	ktime_t state_recovery_ns;
> +	ktime_t state_other_ns;
> +	ktime_t state_first_change_time;
> +	ktime_t state_last_change_time;
> +	uint32_t state_running_count;
> +	uint32_t state_recovery_count;
> +	uint32_t state_other_count;

u32 is a kernel type, not uint32_t please, but I don't know what the
scsi layer is used to.

thanks,

greg k-h
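[Editorial note: for reference, the change being asked for would look something like the sketch below. sysfs_emit() bounds its output to PAGE_SIZE internally, so the explicit size argument goes away. This is a kernel-side sketch against the attribute quoted above, not the final revised patch.]

```c
/* Sketch only: the show routine rewritten per the review comment.
 * sysfs_emit() replaces scnprintf() and implicitly limits output
 * to PAGE_SIZE, so no length needs to be passed. */
static ssize_t state_first_change_time_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct Scsi_Host *shost = class_to_shost(dev);

	return sysfs_emit(buf, "%lld",
			  ktime_to_ns(shost->stats->state_first_change_time));
}
```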
On Wed, 2023-03-15 at 07:36 +0100, Greg KH wrote:
> On Wed, Mar 15, 2023 at 06:08:19AM +0000, Seymour, Shane M wrote:
> > The following patch implements host state statistics via sysfs. The
> > intent is to allow user space to see the state changes and be able
> > to report when a host changes state. The files do not separate out
> > the time spent into each state but only into three:
>
> Why does userspace care about these things at all?

This is the most important question: Why are times spent in various
states and transition counts important? Is this some kind of
predictive failure system, or is it simply logging? If it's logging,
wouldn't you get better information if we output state changes as they
occur then they'd appear as timestamped entries in the syslog from
which all these statistics could be deduced?

> What tool needs them and what can userspace do with the
> information?
>
> [...]
> > A (GPLv2) program called hostmond will be released in a few months
> > that will monitor these interfaces and report (local host only via
> > syslog(3C)) when hosts change state.
>
> We kind of need to see this before the kernel changes can be accepted
> for obvious reasons, what is preventing that from happening now?

I don't think that's a requirement. The whole point of sysfs is it's
user readable, so we don't need a tool to make use of its entries. On
the other hand if this tool can help elucidate the use case for these
statistics, then publishing it now would be useful to help everyone
else understand why this is useful.

James
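[Editorial note: the deduction James describes is mechanically simple on the consumer side. Given timestamped state-change entries, the per-state times fall out of a single pass. A minimal sketch; the event layout and function names here are illustrative, not from any posted code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state-change event, as it might be parsed from a
 * timestamped log line: when the host entered a state, and which one. */
struct state_event {
	int64_t ns;		/* timestamp, ns since boot */
	const char *state;	/* state entered at that timestamp */
};

/* Sum the time spent in one state across a series of events: each
 * interval between consecutive events is charged to the state that
 * was in effect at the start of the interval. */
static int64_t time_in_state(const struct state_event *ev, int n,
			     const char *state)
{
	int64_t total = 0;

	for (int i = 1; i < n; i++)
		if (strcmp(ev[i - 1].state, state) == 0)
			total += ev[i].ns - ev[i - 1].ns;
	return total;
}
```

The open interval after the final event (time spent in the current state) would still need the current timestamp, which is the same gap the sysfs proposal handles with its step 6.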
> On Wed, Mar 15, 2023 at 06:08:19AM +0000, Seymour, Shane M wrote:
> > The following patch implements host state statistics via sysfs. The intent
> > is to allow user space to see the state changes and be able to report when
> > a host changes state. The files do not separate out the time spent into
> > each state but only into three:
>
> Why does userspace care about these things at all? What tool needs them
> and what can userspace do with the information?
>

In enterprise setups you may have a significant number of LUNs presented to
a system (100s to 1000s) via a single HBA (usually via FC). Having an HBA
going into error handling causes issues. Every time a host goes into SCSI EH
all I/O to that host is blocked until SCSI EH completes. That means waiting
for every I/O to either complete or timeout before starting any recovery
processing.

At this time there is no way for anything outside of the kernel to know if
an HBA is having any issues. The cause of those issues can vary
significantly, just two examples:

1) Storage end point issues
2) SAN issues (e.g. laser transmit power at any point in the SAN)

My experience with downstream distros is that nobody seems to notice the
noise that SCSI EH produces (LUN, device, bus, host resets) and we see it
when we get a vmcore and have to try and work out what caused an I/O hang.

I wanted to be more proactive in warning users that you've got a potential
storage issue you need to look at. It won't help when you have a sudden
massive issue but if you have an issue that is slowly getting worse over
a period of time you will at least get some warning.

> >
> > A (GPLv2) program called hostmond will be released in a few months that
> > will monitor these interfaces and report (local host only via syslog(3C))
> > when hosts change state.
>
> We kind of need to see this before the kernel changes can be accepted
> for obvious reasons, what is preventing that from happening now?
If you don't mind I'll answer this in my reply to James' email soon since
he commented about this.

>
> Please always use sysfs_emit() instead of the crazy scnprintf() for
> sysfs entries.

No problem I can make that change.

>
> u32 is a kernel type, not uint32_t please, but I don't know what the
> scsi layer is used to.

No problem I can make that change.

>
> thanks,
>
> greg k-h

Thank you for your willingness to provide feedback.

Shane
> On Wed, 2023-03-15 at 07:36 +0100, Greg KH wrote:
> > On Wed, Mar 15, 2023 at 06:08:19AM +0000, Seymour, Shane M wrote:
> > > The following patch implements host state statistics via sysfs. The
> > > intent is to allow user space to see the state changes and be able
> > > to report when a host changes state. The files do not separate out
> > > the time spent into each state but only into three:
> >
> > Why does userspace care about these things at all?
>
> This is the most important question: Why are times spent in various
> states and transition counts important? Is this some kind of
> predictive failure system, or is it simply logging? If it's logging,
> wouldn't you get better information if we output state changes as they
> occur then they'd appear as timestamped entries in the syslog from
> which all these statistics could be deduced?

Hi James,

I had to write something to read the statistics to ensure that what was
being provided was sane and usable. Currently the program does:

1) Logging of state changes (with a count and what the current state is).
2) Logging a percentage of time spent in recovery over the last interval
   (default 10 minutes) if that percentage is increasing.

I do plan on implementing the following in the near future:

1) Keeping statistical information in memory (for at least):
   a) Hourly for the last 96 hours
   b) Daily for the last 90 days
2) Analysing that data hourly and daily to determine if there is a trend
   that is increasing or decreasing in terms of the count and the time
   spent (if any) in recovery. That is, are things getting better, worse,
   or staying the same.

My end goal is to provide at least some warning that there may be a
storage issue and if it appears to be getting worse. I do want the user
space program to be something more than just something that logs messages
about state changes.

In regard to your idea about outputting state changes it's interesting
but I can see several drawbacks.
The first is that if you use syslog you don't really have any idea where
the messages will end up. Different distros have different destinations
(e.g. messages vs syslog vs systemd journal) and you can configure the
syslog daemon so that the messages always end up on a different system.

There will be issues handling those files as well. You need to cope with
log file rotation, how many copies of old messages/syslog files are kept
when rotated, if they are compressed or not (and reading them when they
are), whether any are missing, and how far to go back if there are a lot
of old messages/syslog files. I think you would need to look at them all
to determine what files were relevant and needed to be processed.

Having said that, none of those issues are insurmountable but it makes it
hard to do the analysis I want to implement on the data. The variability
of the quantity of available data (how many messages/syslog files you
have) over a period of time provides challenges.

>
> > What tool needs them and what can userspace do with the
> > information?
> >
> > [...]
> > > A (GPLv2) program called hostmond will be released in a few months
> > > that will monitor these interfaces and report (local host only via
> > > syslog(3C)) when hosts change state.
> >
> > We kind of need to see this before the kernel changes can be accepted
> > for obvious reasons, what is preventing that from happening now?
>
> I don't think that's a requirement. The whole point of sysfs is it's
> user readable, so we don't need a tool to make use of its entries. On
> the other hand if this tool can help elucidate the use case for these
> statistics, then publishing it now would be useful to help everyone
> else understand why this is useful.

The main use of the existing code would be making it easier to work out
how to read the statistics from the sysfs files at the moment.
If the feedback is to wait until I've fully implemented the user space
program with the analysis component and made it available, I'm more than
happy to do that.

Thanks
Shane

>
> James
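[Editorial note: item 2 of the current behaviour above — the percentage of an interval spent in recovery — reduces to comparing two reads of the cumulative state_recovery_ns counter. A rough sketch; the struct and function names are illustrative, not hostmond's.]

```c
#include <assert.h>
#include <stdint.h>

/* One sample of the cumulative recovery-time counter, i.e. the value
 * of state_recovery_ns at a given moment on a monotonic clock. */
struct sample {
	int64_t when_ns;	/* sample timestamp, ns since boot */
	int64_t recovery_ns;	/* cumulative ns spent in recovery */
};

/* Percentage of the interval between two samples that the host spent
 * in recovery: delta of the counter over delta of wall time. */
static double recovery_pct(struct sample prev, struct sample cur)
{
	int64_t interval = cur.when_ns - prev.when_ns;

	if (interval <= 0)
		return 0.0;
	return 100.0 * (double)(cur.recovery_ns - prev.recovery_ns)
		     / (double)interval;
}
```

A monitor would sample every interval (e.g. 10 minutes) and warn when the percentage trends upward across successive intervals.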
On 3/15/23 23:41, Seymour, Shane M wrote:
>> On Wed, Mar 15, 2023 at 06:08:19AM +0000, Seymour, Shane M wrote:
>>> The following patch implements host state statistics via sysfs. The intent
>>> is to allow user space to see the state changes and be able to report when
>>> a host changes state. The files do not separate out the time spent into
>>> each state but only into three:
>>
>> Why does userspace care about these things at all? What tool needs them
>> and what can userspace do with the information?
>>
>
> In enterprise setups you may have a significant number of LUNs presented
> to a system (100s to 1000s) via a single HBA (usually via FC). Having an
> HBA going into error handling causes issues. Every time a host goes into
> SCSI EH all I/O to that host is blocked until SCSI EH completes. That
> means waiting for every I/O to either complete or timeout before starting
> any recovery processing.
>
> At this time there is no way for anything outside of the kernel to know
> if an HBA is having any issues. The cause of those issues can vary
> significantly, just two examples:
>
> 1) Storage end point issues
> 2) SAN issues (e.g. laser transmit power at any point in the SAN)
>
> My experience with downstream distros is that nobody seems to notice the
> noise that SCSI EH produces (LUN, device, bus, host resets) and we see it
> when we get a vmcore and have to try and work out what caused an I/O hang.

I hear you. Especially the fact that the very desirable asynchronous
aborts and even eh with escalations seem pretty much silent as long as a
SCSI command succeeds within one of the allowed retries. I suspect this
was done in order not to unsettle users by showing intermediate recovery,
which can still lead to successful I/O eventually.
FWIW, at some point we figured out a nice scsi_logging_level of 4605, in
order to see any problems with lun probing ("why don't I get my volume in
Linux?") or timeouts/aborts/eh ("why are things so slow?") without
producing kernel messages for regular good I/O. Of course, it's not set
by default, but can be dynamically set if one suspects such problems.

> I wanted to be more proactive in warning users that you've got a potential
> storage issue you need to look at. It won't help when you have a sudden
> massive issue but if you have an issue that is slowly getting worse over
> a period of time you will at least get some warning.
>
>>>
>>> A (GPLv2) program called hostmond will be released in a few months that
>>> will monitor these interfaces and report (local host only via syslog(3C))
>>> when hosts change state.
>>
>> We kind of need to see this before the kernel changes can be accepted
>> for obvious reasons, what is preventing that from happening now?
>
> If you don't mind I'll answer this in my reply to James' email soon since
> he commented about this.
>
>>
>> Please always use sysfs_emit() instead of the crazy scnprintf() for
>> sysfs entries.
>
> No problem I can make that change.
>
>>
>> u32 is a kernel type, not uint32_t please, but I don't know what the
>> scsi layer is used to.
>
> No problem I can make that change.
>
>>
>> thanks,
>>
>> greg k-h
>
> Thank you for your willingness to provide feedback.
>
> Shane
diff --git a/Documentation/ABI/testing/sysfs-class-scsi_host b/Documentation/ABI/testing/sysfs-class-scsi_host
index 7c98d8f43c45..2ce266df812f 100644
--- a/Documentation/ABI/testing/sysfs-class-scsi_host
+++ b/Documentation/ABI/testing/sysfs-class-scsi_host
@@ -117,3 +117,75 @@
 KernelVersion:	v2.6.39
 Contact:	linux-ide@vger.kernel.org
 Description:	(RO) Displays supported enclosure management message types.
+
+What:		/sys/class/scsi_host/hostX/stats/state_first_change_time
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is the time since boot where this SCSI host saw it's
+		initial state change. For most SCSI hosts this should be
+		relatively close to the time the system booted up (if the
+		Low Level Driver (LLD) was loaded at boot).
+
+		Looking at this file will help you determine when the host
+		was created. The time value read from this file is the
+		number of nanoseconds since boot.
+
+What:		/sys/class/scsi_host/hostX/stats/state_last_change_time
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is the last time that the state of the SCSI host
+		changed. This can be used in two ways. You can read it
+		twice - once before reading any statistics and once after.
+
+		If the value has changed between reads you might consider
+		re-reading the statistics files again as the data will
+		have changed.
+
+What:		/sys/class/scsi_host/hostX/stats/state_other_count
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is a simple count of the number of times this SCSI
+		host has transitioned out of this state.
+
+What:		/sys/class/scsi_host/hostX/stats/state_other_ns
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is the amount of time spent in a state of other.
+
+What:		/sys/class/scsi_host/hostX/stats/state_recovery_count
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is a simple count of the number of times this SCSI
+		host has transitioned out of this state.
+
+What:		/sys/class/scsi_host/hostX/stats/state_recovery_ns
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is the amount of time spent in a state of recovery.
+
+What:		/sys/class/scsi_host/hostX/stats/state_running_count
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is a simple count of the number of times this SCSI
+		host has transitioned out of this state.
+
+What:		/sys/class/scsi_host/hostX/stats/state_running_ns
+Date:		March 2023
+Kernel Version:	6.4
+Contact:	shane.seymour@hpe.com
+Description:
+		(RO) This is the amount of time spent in a state of running.
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f7f62e56afca..1a010488b52b 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -64,6 +64,49 @@ static struct class shost_class = {
 	.dev_groups = scsi_shost_groups,
 };
 
+/**
+ *
+ * scsi_host_do_state_stats - Maintains statistics for presentation
+ * via sysfs of host state changes per host.
+ * @shost - scsi host to track state change stats for.
+ * @oldstate - the previous state that the host was in.
+ *
+ **/
+static void scsi_host_do_state_stats(struct Scsi_Host *shost,
+		enum scsi_host_state oldstate)
+{
+	ktime_t now, last;
+
+	now = ktime_get();
+	last = shost->stats->state_last_change_time;
+
+	if (last == 0) {
+		shost->stats->state_first_change_time = now;
+		shost->stats->state_last_change_time = now;
+		last = now;
+	}
+
+	switch (oldstate) {
+
+	case SHOST_CREATED:
+	case SHOST_RUNNING:
+		shost->stats->state_running_ns += ktime_sub(now, last);
+		shost->stats->state_running_count++;
+		break;
+	case SHOST_RECOVERY:
+	case SHOST_CANCEL_RECOVERY:
+	case SHOST_DEL_RECOVERY:
+		shost->stats->state_recovery_ns += ktime_sub(now, last);
+		shost->stats->state_recovery_count++;
+		break;
+	default:
+		shost->stats->state_other_ns += ktime_sub(now, last);
+		shost->stats->state_other_count++;
+		break;
+	}
+	shost->stats->state_last_change_time = now;
+}
+
 /**
  * scsi_host_set_state - Take the given host through the host state model.
  * @shost:	scsi host to change the state of.
@@ -146,6 +189,7 @@ int scsi_host_set_state(struct Scsi_Host *shost, enum scsi_host_state state)
 		break;
 	}
 	shost->shost_state = state;
+	scsi_host_do_state_stats(shost, oldstate);
 	return 0;
 
  illegal:
@@ -369,6 +413,8 @@ static void scsi_host_dev_release(struct device *dev)
 
 	ida_free(&host_index_ida, shost->host_no);
 
+	kfree(shost->stats);
+
 	if (shost->shost_state != SHOST_CREATED)
 		put_device(parent);
 	kfree(shost);
@@ -401,6 +447,12 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	if (!shost)
 		return NULL;
 
+	shost->stats = kzalloc(sizeof(struct scsi_host_stats), GFP_KERNEL);
+	if (!shost->stats) {
+		kfree(shost);
+		return NULL;
+	}
+
 	shost->host_lock = &shost->default_lock;
 	spin_lock_init(shost->host_lock);
 	shost->shost_state = SHOST_CREATED;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ee28f73af4d4..00d41c8820aa 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -428,8 +428,110 @@ static const struct attribute_group scsi_shost_attr_group = {
 	.attrs =	scsi_sysfs_shost_attrs,
 };
 
+/*
+ * sysfs functions for shost state statistics.
+ */
+
+static ssize_t state_first_change_time_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%lld",
+			ktime_to_ns(shost->stats->state_first_change_time));
+}
+static DEVICE_ATTR_RO(state_first_change_time);
+
+static ssize_t state_last_change_time_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%lld",
+			ktime_to_ns(shost->stats->state_last_change_time));
+}
+static DEVICE_ATTR_RO(state_last_change_time);
+
+static ssize_t state_running_ns_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%lld",
+			ktime_to_ns(shost->stats->state_running_ns));
+}
+static DEVICE_ATTR_RO(state_running_ns);
+
+static ssize_t state_recovery_ns_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%lld",
+			ktime_to_ns(shost->stats->state_recovery_ns));
+}
+static DEVICE_ATTR_RO(state_recovery_ns);
+
+static ssize_t state_other_ns_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%lld",
+			ktime_to_ns(shost->stats->state_other_ns));
+}
+static DEVICE_ATTR_RO(state_other_ns);
+
+static ssize_t state_running_count_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%d",
+			shost->stats->state_running_count);
+}
+static DEVICE_ATTR_RO(state_running_count);
+
+static ssize_t state_recovery_count_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%d",
+			shost->stats->state_recovery_count);
+}
+static DEVICE_ATTR_RO(state_recovery_count);
+
+static ssize_t state_other_count_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%d",
+			shost->stats->state_other_count);
+}
+static DEVICE_ATTR_RO(state_other_count);
+
+static struct attribute *scsi_shost_stats[] = {
+	&dev_attr_state_last_change_time.attr,
+	&dev_attr_state_first_change_time.attr,
+	&dev_attr_state_running_ns.attr,
+	&dev_attr_state_recovery_ns.attr,
+	&dev_attr_state_other_ns.attr,
+	&dev_attr_state_running_count.attr,
+	&dev_attr_state_recovery_count.attr,
+	&dev_attr_state_other_count.attr,
+	NULL
+};
+
+static struct attribute_group scsi_shost_stats_group = {
+	.name = "stats",
+	.attrs = scsi_shost_stats,
+};
+
 const struct attribute_group *scsi_shost_groups[] = {
 	&scsi_shost_attr_group,
+	&scsi_shost_stats_group,
 	NULL
 };
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 587cc767bb67..e6229567a295 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -513,11 +513,26 @@ struct scsi_host_template {
 		return rc; \
 	}
 
+/*
+ * host statistics
+ */
+
+struct scsi_host_stats {
+	ktime_t state_running_ns;
+	ktime_t state_recovery_ns;
+	ktime_t state_other_ns;
+	ktime_t state_first_change_time;
+	ktime_t state_last_change_time;
+	uint32_t state_running_count;
+	uint32_t state_recovery_count;
+	uint32_t state_other_count;
+};
+
 /*
  * shost state: If you alter this, you also need to alter scsi_sysfs.c
  * (for the ascii descriptions) and the state model enforcer:
- * scsi_host_set_state()
+ * scsi_host_set_state() and if host state statistics are accounted
+ * for correctly in scsi_host_do_state_stats()
  */
 enum scsi_host_state {
 	SHOST_CREATED = 1,
@@ -704,6 +719,11 @@ struct Scsi_Host {
 	 */
 	struct device *dma_dev;
 
+	/*
+	 * Host statistics
+	 */
+	struct scsi_host_stats *stats;
+
 	/*
 	 * We should ensure that this is aligned, both for better performance
 	 * and also because some compilers (m68k) don't automatically force
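[Editorial note: a userspace consumer of these files would need to mirror the three-way grouping that scsi_host_do_state_stats() applies when deciding which bucket the host's current state belongs to. A sketch, assuming the state names exposed by a host's sysfs `state` attribute follow the ASCII descriptions in drivers/scsi/scsi_sysfs.c; the bucket names are illustrative.]

```c
#include <assert.h>
#include <string.h>

/* Buckets matching the patch's grouping of SCSI host states. */
enum bucket { BUCKET_RUNNING, BUCKET_RECOVERY, BUCKET_OTHER };

/* Map a host state name (as assumed to be read from
 * /sys/class/scsi_host/hostX/state) to the bucket the patch
 * accounts its time to. */
static enum bucket state_bucket(const char *state)
{
	/* SHOST_CREATED and SHOST_RUNNING count as running */
	if (!strcmp(state, "created") || !strcmp(state, "running"))
		return BUCKET_RUNNING;
	/* SHOST_RECOVERY, SHOST_CANCEL_RECOVERY, SHOST_DEL_RECOVERY */
	if (!strcmp(state, "recovery") ||
	    !strcmp(state, "cancel/recovery") ||
	    !strcmp(state, "deleted/recovery"))
		return BUCKET_RECOVERY;
	/* everything else (e.g. "cancel", "deleted") is other */
	return BUCKET_OTHER;
}
```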
The following patch implements host state statistics via sysfs. The intent
is to allow user space to see the state changes and be able to report when
a host changes state. The files do not separate out the time spent into
each state but only into three:

$ ll /sys/class/scsi_host/host0/stats
total 0
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_first_change_time
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_last_change_time
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_other_count
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_other_ns
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_recovery_count
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_recovery_ns
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_running_count
-r--r--r--. 1 root root 4096 Mar 13 22:43 state_running_ns

They are running, recovery and other. The running state is SHOST_CREATED
and SHOST_RUNNING. The recovery state is SHOST_RECOVERY,
SHOST_CANCEL_RECOVERY, and SHOST_DEL_RECOVERY. Any other state gets
accounted for in other.

The current state is not accounted for in the information read. Because
of that you must read:

1. The last_change_time for that host
2. the current state of a host and the uptime
3. each of the above *count and *ns files
4. Re-read the last_change_time
5. Compare the two last_change_time values read and if different try again.
6. The total time read from the *ns files is subtracted from the uptime and
   that time is then allocated to the current state time.

The first change time is to determine when the host was created so programs
can determine if it was newly created or not.

A (GPLv2) program called hostmond will be released in a few months that
will monitor these interfaces and report (local host only via syslog(3C))
when hosts change state.

Signed-off-by: Shane Seymour <shane.seymour@hpe.com>
---
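[Editorial note: the six-step reading sequence above can be sketched as follows. File I/O is elided, and the struct and helper names are illustrative; the interesting parts are the consistency check and the allocation of otherwise-unaccounted uptime to the current state.]

```c
#include <assert.h>
#include <stdint.h>

/* A snapshot of one host's stats files, values in ns (steps 1-3). */
struct host_stats {
	int64_t running_ns;
	int64_t recovery_ns;
	int64_t other_ns;
	int64_t last_change_time;	/* read before and re-read after */
};

/* Step 6: any part of the uptime not covered by the *_ns files belongs
 * to the state the host is currently in. Returns that remainder; the
 * caller adds it to the current state's bucket. */
static int64_t unaccounted_ns(const struct host_stats *s, int64_t uptime_ns)
{
	return uptime_ns - (s->running_ns + s->recovery_ns + s->other_ns);
}

/* Steps 4-5: the snapshot is consistent only if last_change_time did
 * not move between the first and second read; otherwise the whole
 * sequence must be retried. */
static int consistent(int64_t last_before, int64_t last_after)
{
	return last_before == last_after;
}
```

A real reader would loop: read last_change_time, read the state/uptime and the six stats files, re-read last_change_time, and retry until consistent() holds before applying unaccounted_ns().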