Message ID | 20240814122900.13525-3-mariusz.tkaczyk@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Series | PCIe Enclosure LED Management |
On Wed, Aug 14, 2024 at 02:28:59PM +0200, Mariusz Tkaczyk wrote:
> Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows
> managing LED in storage enclosures. NPEM is indication oriented
> and it does not give direct access to LED. Although each of
> the indications *could* represent an individual LED, multiple
> indications could also be represented as a single,
> multi-color LED or a single LED blinking in a specific interval.
> The specification leaves that open.
> ...
> Driver is projected to be exclusive NPEM extended capability manager.
> It waits up to 1 second after imposing new request, it doesn't verify if
> controller is busy before write, assuming that mutex lock gives protection
> from concurrent updates.
> Driver is not registered if _DSM LED management
> is available.

IMO we should drop this sentence (more details below).

> NPEM is a PCIe extended capability so it should be registered in
> pcie_init_capabilities() but it is not possible due to LED dependency.
> Parent pci_device must be added earlier for led_classdev_register()
> to be successful. NPEM does not require configuration on kernel side, it
> is safe to register LED devices later.
>
> Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1]

I can update this myself, no need to repost just for this, but I think
these links are pointless because they're useless except for PCI-SIG
members, and I don't want to rely on them being permalinks anyway. A
reference like "PCIe r6.1" is universally and permanently meaningful.

> +struct npem {
> +	struct pci_dev *dev;
> +	const struct npem_ops *ops;
> +	struct mutex lock;
> +	u16 pos;
> +	u32 supported_indications;
> +	u32 active_indications;
> +
> +	/*
> +	 * Use lazy loading for active_indications to not play with initcalls.
> +	 * It is needed to allow _DSM initialization on DELL platforms, where
> +	 * ACPI_IPMI must be loaded first.
> +	 */
> +	unsigned int active_inds_initialized:1;

What's going on here? I hope we can at least move this to the _DSM
patch since it seems related to that, not to the NPEM capability. I
don't understand the initcall reference or what "lazy loading" means.

Is there some existing ACPI ordering that guarantees ACPI_IPMI happens
first? Why do we need some Dell-specific thing here?

What is ACPI_IPMI? I guess it refers to the "acpi_ipmi" module,
acpi_ipmi.c?

> +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b, 0x8c, 0xb7, 0x74, 0x7e,\
> +			   0xd5, 0x1e, 0x19, 0x4d)
> +#define GET_SUPPORTED_STATES_DSM 1
> +#define GET_STATE_DSM 2
> +#define SET_STATE_DSM 3
> +
> +static const guid_t dsm_guid = DSM_GUID;
> +
> +static bool npem_has_dsm(struct pci_dev *pdev)
> +{
> +	acpi_handle handle;
> +
> +	handle = ACPI_HANDLE(&pdev->dev);
> +	if (!handle)
> +		return false;
> +
> +	return acpi_check_dsm(handle, &dsm_guid, 0x1,
> +			      BIT(GET_SUPPORTED_STATES_DSM) |
> +			      BIT(GET_STATE_DSM) | BIT(SET_STATE_DSM));
> +}

> +void pci_npem_create(struct pci_dev *dev)
> +{
> +	const struct npem_ops *ops = &npem_ops;
> +	int pos = 0, ret;
> +	u32 cap;
> +
> +	if (!npem_has_dsm(dev)) {
> +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> +		if (pos == 0)
> +			return;
> +
> +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> +			return;
> +	} else {
> +		/*
> +		 * OS should use the DSM for LED control if it is available
> +		 * PCI Firmware Spec r3.3 sec 4.7.
> +		 */
> +		return;
> +	}

I know this is sort of a transient state since the next patch adds
full _DSM support, but I do think (a) the fact that NPEM will stop
working simply because firmware adds _DSM support is unexpected
behavior, and (b) npem_has_dsm() and the other ACPI-related stuff
would fit better in the next patch. It's a little strange to have
them mixed here.

> +++ b/include/uapi/linux/pci_regs.h
> ...
> +#define PCI_NPEM_CAP 0x04 /* NPEM capability register */
> +#define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */
> +
> +#define PCI_NPEM_CTRL 0x08 /* NPEM control register */
> +#define PCI_NPEM_CTRL_ENABLE 0x00000001 /* NPEM Enable */

Spaces instead of tabs after #define, as you did below (mostly), would
make the diff prettier.

> +#define PCI_NPEM_CMD_RESET 0x00000002 /* NPEM Reset Command */
> +#define PCI_NPEM_IND_OK 0x00000004 /* NPEM indication OK */
> +#define PCI_NPEM_IND_LOCATE 0x00000008 /* NPEM indication Locate */
> ...
> +#define PCI_NPEM_STATUS 0x0c /* NPEM status register */
> +#define PCI_NPEM_STATUS_CC 0x00000001 /* NPEM Command completed */

Ditto.

Bjorn

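For readers following the register discussion, here is a minimal userspace sketch of how the control register bits quoted in this review combine. The bit values are the ones shown in the pci_regs.h hunk; the helper names `active_indications()` and `ctrl_with_indication()` are hypothetical and only mirror, in simplified form, what the patch's `npem_get_active_indications()` and `brightness_set()` appear to do. It is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as quoted from the pci_regs.h hunk in this review. */
#define PCI_NPEM_CTRL_ENABLE 0x00000001u /* NPEM Enable */
#define PCI_NPEM_CMD_RESET   0x00000002u /* NPEM Reset Command */
#define PCI_NPEM_IND_OK      0x00000004u /* NPEM indication OK */
#define PCI_NPEM_IND_LOCATE  0x00000008u /* NPEM indication Locate */

/*
 * Hypothetical helper: derive the active indications from a raw control
 * register value. Non-indication bits (ENABLE, RESET) are masked off via
 * the supported mask, and nothing is reported while NPEM is disabled.
 */
static uint32_t active_indications(uint32_t ctrl, uint32_t supported)
{
	if (!(ctrl & PCI_NPEM_CTRL_ENABLE))
		return 0;

	return ctrl & supported;
}

/*
 * Hypothetical helper: compute the control value to write when one
 * indication is switched on or off. ENABLE is always kept set.
 */
static uint32_t ctrl_with_indication(uint32_t active, uint32_t bit, int on)
{
	uint32_t inds = on ? (active | bit) : (active & ~bit);

	return inds | PCI_NPEM_CTRL_ENABLE;
}
```

The point of the mask is that command bits such as RESET never leak into the cached indication state, and a disabled NPEM block reads back as "all indications off".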
On Wed, Aug 14, 2024 at 04:49:30PM -0500, Bjorn Helgaas wrote:
> On Wed, Aug 14, 2024 at 02:28:59PM +0200, Mariusz Tkaczyk wrote:
> > +	/*
> > +	 * Use lazy loading for active_indications to not play with initcalls.
> > +	 * It is needed to allow _DSM initialization on DELL platforms, where
> > +	 * ACPI_IPMI must be loaded first.
> > +	 */
> > +	unsigned int active_inds_initialized:1;
>
> What's going on here? I hope we can at least move this to the _DSM
> patch since it seems related to that, not to the NPEM capability. I
> don't understand the initcall reference or what "lazy loading" means.

In previous iterations of this series, the status of all LEDs was
read on PCI device enumeration. That was done so that when user space
reads the brightness in sysfs, it gets the correct value. The value
is cached, it's not re-read from the register on every brightness read.

(It's not guaranteed that all LEDs are off on enumeration. E.g. boot
firmware may have fiddled with them, or the enclosure itself may have
turned some of them on by itself, typically the "ok" LED.)

However Stuart reported issues when the _DSM interface is used on
Dell servers, because the _DSM requires IPMI drivers to access the
NPEM registers. He got a ton of errors when LED status was read on
enumeration because that was simply too early.

Start of thread:
https://lore.kernel.org/all/05455f36-7027-4fd6-8af7-4fe8e483f25c@gmail.com/

The solution is to read LED status lazily, when brightness is read
or written for the first time through sysfs. At that point, IPMI
drivers are typically loaded. Stuart reported success with this
approach.

There is still a possibility that users may see issues if they access
brightness before IPMI drivers are loaded. Those drivers may be
modules and user space might overzealously try to access brightness
before they're loaded. Or user space may prevent them from loading
by blacklisting or not installing them. In which case users get to
keep the pieces.

We discussed various alternative approaches in the above-linked thread
but concluded that this pragmatic solution is the simplest that does
the job for all but the most pathological use cases. We wanted to
make this work on Dell servers, but at the same time minimize the
contortions that we need to go through to accommodate their quirky
implementation.

The code uses lazy initialization of LED status even in the native
NPEM case because it would make the code more complex to use early
initialization for direct NPEM register access and lazy initialization
for _DSM-mediated register access.

> Is there some existing ACPI ordering that guarantees ACPI_IPMI happens
> first? Why do we need some Dell-specific thing here?
>
> What is ACPI_IPMI? I guess it refers to the "acpi_ipmi" module,
> acpi_ipmi.c?

As it turned out in the above-linked thread, just forcing ACPI_IPMI=y
for NPEM is not sufficient because additional (Dell-specific) IPMI
drivers need to be loaded as well for NPEM register access to work
through _DSM.

> > +void pci_npem_create(struct pci_dev *dev)
> > +{
> > +	const struct npem_ops *ops = &npem_ops;
> > +	int pos = 0, ret;
> > +	u32 cap;
> > +
> > +	if (!npem_has_dsm(dev)) {
> > +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> > +		if (pos == 0)
> > +			return;
> > +
> > +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> > +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> > +			return;
> > +	} else {
> > +		/*
> > +		 * OS should use the DSM for LED control if it is available
> > +		 * PCI Firmware Spec r3.3 sec 4.7.
> > +		 */
> > +		return;
> > +	}
>
> I know this is sort of a transient state since the next patch adds
> full _DSM support, but I do think (a) the fact that NPEM will stop
> working simply because firmware adds _DSM support is unexpected
> behavior, and (b) npem_has_dsm() and the other ACPI-related stuff
> would fit better in the next patch. It's a little strange to have
> them mixed here.

PCI Firmware Spec r3.3 sec 4.7 says:

  "OSPM should use this _DSM when available. If this _DSM is not
   available, OSPM should use Native PCIe Enclosure Management (NPEM)
   or SCSI Enclosure Services (SES) instead, if available."

I realize that a "should" is not a "must", so Linux would in principle
be allowed to use direct register access despite presence of the _DSM.

However that doesn't feel safe. If the _DSM is present, I think it's
fair to assume that the platform firmware wants to control at least
a portion of the LEDs itself. Accessing those LEDs directly, behind the
platform firmware's back, may cause issues. Not exposing the LEDs
to the user in the _DSM case therefore seems safer.

Which is why the ACPI stuff to query for _DSM presence is already in
this patch instead of the succeeding one.

Thanks,

Lukas

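The precedence argued for here (use the _DSM when firmware offers it, otherwise fall back to native NPEM if the device is capable) can be sketched in plain userspace C. This is only an illustration of the policy for the series as a whole: the enum, `select_backend()` and `backend_name()` are hypothetical names, and in the patch under review the _DSM case simply returns without registering LEDs because _DSM support arrives in the following patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical backend identifiers, not taken from the patch. */
enum led_backend {
	BACKEND_NONE,
	BACKEND_NPEM,
	BACKEND_DSM,
};

/*
 * Hypothetical policy helper. Inputs stand in for what enumeration finds:
 * has_dsm:      firmware advertises the LED management _DSM functions
 * has_npem_cap: the NPEM extended capability is present
 * npem_capable: the NPEM Capable bit is set in the capability register
 */
static enum led_backend select_backend(bool has_dsm, bool has_npem_cap,
				       bool npem_capable)
{
	/* Firmware may drive some LEDs itself, so never bypass the _DSM. */
	if (has_dsm)
		return BACKEND_DSM;

	if (has_npem_cap && npem_capable)
		return BACKEND_NPEM;

	return BACKEND_NONE;
}

/* Name suitable for a log line stating which backend was picked. */
static const char *backend_name(enum led_backend backend)
{
	switch (backend) {
	case BACKEND_DSM:
		return "_DSM";
	case BACKEND_NPEM:
		return "NPEM";
	default:
		return "none";
	}
}
```

Keeping the decision in one function makes the "firmware adds a _DSM and native NPEM stops being used" behaviour explicit and easy to report to the user.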
On Thu, Aug 15, 2024 at 07:45:09AM +0200, Lukas Wunner wrote:
> On Wed, Aug 14, 2024 at 04:49:30PM -0500, Bjorn Helgaas wrote:
> > On Wed, Aug 14, 2024 at 02:28:59PM +0200, Mariusz Tkaczyk wrote:
> > > +	/*
> > > +	 * Use lazy loading for active_indications to not play with initcalls.
> > > +	 * It is needed to allow _DSM initialization on DELL platforms, where
> > > +	 * ACPI_IPMI must be loaded first.
> > > +	 */
> > > +	unsigned int active_inds_initialized:1;
> >
> > What's going on here? I hope we can at least move this to the _DSM
> > patch since it seems related to that, not to the NPEM capability. I
> > don't understand the initcall reference or what "lazy loading" means.
>
> In previous iterations of this series, the status of all LEDs was
> read on PCI device enumeration. That was done so that when user space
> reads the brightness in sysfs, it gets the correct value. The value
> is cached, it's not re-read from the register on every brightness read.
>
> (It's not guaranteed that all LEDs are off on enumeration. E.g. boot
> firmware may have fiddled with them, or the enclosure itself may have
> turned some of them on by itself, typically the "ok" LED.)
>
> However Stuart reported issues when the _DSM interface is used on
> Dell servers, because the _DSM requires IPMI drivers to access the
> NPEM registers. He got a ton of errors when LED status was read on
> enumeration because that was simply too early.

The dependency of _DSM on IPMI sounds like a purely ACPI problem. Is
there no mechanism in ACPI to express that dependency?

If _DSM claims the function is supported before the IPMI driver is
ready, that sounds like a BIOS defect to me.

If we're stuck with this, maybe the comment can be reworded. "Lazy
loading" in a paragraph that also mentions initcalls and the
"ACPI_IPMI" module makes it sound like we're talking about loading the
*module* lazily, not just (IIUC) reading the LED status lazily.

Maybe it could also explicitly say that the GET_STATE_DSM function
depends on IPMI.

I'm unhappy that we're getting our arm twisted here. If functionality
depends on IPMI, there really needs to be a way for OSPM to manage
that dependency. If we're working around a firmware defect, we need
to be clear about that.

> > > +void pci_npem_create(struct pci_dev *dev)
> > > +{
> > > +	const struct npem_ops *ops = &npem_ops;
> > > +	int pos = 0, ret;
> > > +	u32 cap;
> > > +
> > > +	if (!npem_has_dsm(dev)) {
> > > +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> > > +		if (pos == 0)
> > > +			return;
> > > +
> > > +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> > > +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> > > +			return;
> > > +	} else {
> > > +		/*
> > > +		 * OS should use the DSM for LED control if it is available
> > > +		 * PCI Firmware Spec r3.3 sec 4.7.
> > > +		 */
> > > +		return;
> > > +	}
> >
> > I know this is sort of a transient state since the next patch adds
> > full _DSM support, but I do think (a) the fact that NPEM will stop
> > working simply because firmware adds _DSM support is unexpected
> > behavior, and (b) npem_has_dsm() and the other ACPI-related stuff
> > would fit better in the next patch. It's a little strange to have
> > them mixed here.
>
> PCI Firmware Spec r3.3 sec 4.7 says:
>
>   "OSPM should use this _DSM when available. If this _DSM is not
>    available, OSPM should use Native PCIe Enclosure Management (NPEM)
>    or SCSI Enclosure Services (SES) instead, if available."
>
> I realize that a "should" is not a "must", so Linux would in principle
> be allowed to use direct register access despite presence of the _DSM.
>
> However that doesn't feel safe. If the _DSM is present, I think it's
> fair to assume that the platform firmware wants to control at least
> a portion of the LEDs itself. Accessing those LEDs directly, behind the
> platform firmware's back, may cause issues. Not exposing the LEDs
> to the user in the _DSM case therefore seems safer.
>
> Which is why the ACPI stuff to query for _DSM presence is already in
> this patch instead of the succeeding one.

The spec is regrettably vague about this, but that assumption isn't
unreasonable. It does deserve a more explicit callout in the commit
log and probably a dmesg note about why NPEM used to work but no
longer does.

Bjorn

On Thu, 15 Aug 2024 12:42:48 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Thu, Aug 15, 2024 at 07:45:09AM +0200, Lukas Wunner wrote:
> > On Wed, Aug 14, 2024 at 04:49:30PM -0500, Bjorn Helgaas wrote:
> > > On Wed, Aug 14, 2024 at 02:28:59PM +0200, Mariusz Tkaczyk wrote:
> > > > +	/*
> > > > +	 * Use lazy loading for active_indications to not play with
> > > > initcalls.
> > > > +	 * It is needed to allow _DSM initialization on DELL
> > > > platforms, where
> > > > +	 * ACPI_IPMI must be loaded first.
> > > > +	 */
> > > > +	unsigned int active_inds_initialized:1;
> > >
> > > What's going on here? I hope we can at least move this to the _DSM
> > > patch since it seems related to that, not to the NPEM capability. I
> > > don't understand the initcall reference or what "lazy loading" means.
> >
> > In previous iterations of this series, the status of all LEDs was
> > read on PCI device enumeration. That was done so that when user space
> > reads the brightness in sysfs, it gets the correct value. The value
> > is cached, it's not re-read from the register on every brightness read.
> >
> > (It's not guaranteed that all LEDs are off on enumeration. E.g. boot
> > firmware may have fiddled with them, or the enclosure itself may have
> > turned some of them on by itself, typically the "ok" LED.)
> >
> > However Stuart reported issues when the _DSM interface is used on
> > Dell servers, because the _DSM requires IPMI drivers to access the
> > NPEM registers. He got a ton of errors when LED status was read on
> > enumeration because that was simply too early.
>
> The dependency of _DSM on IPMI sounds like a purely ACPI problem. Is
> there no mechanism in ACPI to express that dependency?
>
> If _DSM claims the function is supported before the IPMI driver is
> ready, that sounds like a BIOS defect to me.
>
> If we're stuck with this, maybe the comment can be reworded. "Lazy
> loading" in a paragraph that also mentions initcalls and the
> "ACPI_IPMI" module makes it sound like we're talking about loading the
> *module* lazily, not just (IIUC) reading the LED status lazily.
>
> Maybe it could also explicitly say that the GET_STATE_DSM function
> depends on IPMI.
>
> I'm unhappy that we're getting our arm twisted here. If functionality
> depends on IPMI, there really needs to be a way for OSPM to manage
> that dependency. If we're working around a firmware defect, we need
> to be clear about that.

Hi,
I will move active_inds_initialized:1 to the DSM commit and I will add a
better justification. For the NPEM commit, get_active_indications() will be
called once in pci_npem_init() to avoid referring to _DSM-specific issues in
the NPEM commit.

>
> > > > +void pci_npem_create(struct pci_dev *dev)
> > > > +{
> > > > +	const struct npem_ops *ops = &npem_ops;
> > > > +	int pos = 0, ret;
> > > > +	u32 cap;
> > > > +
> > > > +	if (!npem_has_dsm(dev)) {
> > > > +		pos = pci_find_ext_capability(dev,
> > > > PCI_EXT_CAP_ID_NPEM);
> > > > +		if (pos == 0)
> > > > +			return;
> > > > +
> > > > +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP,
> > > > &cap) != 0 ||
> > > > +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> > > > +			return;
> > > > +	} else {
> > > > +		/*
> > > > +		 * OS should use the DSM for LED control if it is
> > > > available
> > > > +		 * PCI Firmware Spec r3.3 sec 4.7.
> > > > +		 */
> > > > +		return;
> > > > +	}
> > >
> > > I know this is sort of a transient state since the next patch adds
> > > full _DSM support, but I do think (a) the fact that NPEM will stop
> > > working simply because firmware adds _DSM support is unexpected
> > > behavior, and (b) npem_has_dsm() and the other ACPI-related stuff
> > > would fit better in the next patch. It's a little strange to have
> > > them mixed here.
> >
> > PCI Firmware Spec r3.3 sec 4.7 says:
> >
> >   "OSPM should use this _DSM when available. If this _DSM is not
> >    available, OSPM should use Native PCIe Enclosure Management (NPEM)
> >    or SCSI Enclosure Services (SES) instead, if available."
> >
> > I realize that a "should" is not a "must", so Linux would in principle
> > be allowed to use direct register access despite presence of the _DSM.
> >
> > However that doesn't feel safe. If the _DSM is present, I think it's
> > fair to assume that the platform firmware wants to control at least
> > a portion of the LEDs itself. Accessing those LEDs directly, behind the
> > platform firmware's back, may cause issues. Not exposing the LEDs
> > to the user in the _DSM case therefore seems safer.
> >
> > Which is why the ACPI stuff to query for _DSM presence is already in
> > this patch instead of the succeeding one.
>
> The spec is regrettably vague about this, but that assumption isn't
> unreasonable. It does deserve a more explicit callout in the commit
> log and probably a dmesg note about why NPEM used to work but no
> longer does.
>

In fact, there is a theoretical case where, after a firmware update, the DSM
is no longer available and NPEM is chosen. Given that, I will log the chosen
backend instead of trying to predict a change. It is easier to implement it
this way. The user can compare working and non-working dmesg logs to see the
difference, so printing the backend in use is enough, I think.

Thanks,
Mariusz

On Thu, Aug 15, 2024 at 12:42:48PM -0500, Bjorn Helgaas wrote:
> The dependency of _DSM on IPMI sounds like a purely ACPI problem. Is
> there no mechanism in ACPI to express that dependency?

Unfortunately there doesn't seem to be one. :(

> If _DSM claims the function is supported before the IPMI driver is
> ready, that sounds like a BIOS defect to me.
>
> If we're stuck with this, maybe the comment can be reworded. "Lazy
> loading" in a paragraph that also mentions initcalls and the
> "ACPI_IPMI" module makes it sound like we're talking about loading the
> *module* lazily, not just (IIUC) reading the LED status lazily.
>
> Maybe it could also explicitly say that the GET_STATE_DSM function
> depends on IPMI.
>
> I'm unhappy that we're getting our arm twisted here. If functionality
> depends on IPMI, there really needs to be a way for OSPM to manage
> that dependency. If we're working around a firmware defect, we need
> to be clear about that.

AFAICS lazy initialization of active indications was architected
such that it is retried on every LED access until it succeeds:

npem->active_inds_initialized is only set to true once
npem->ops->get_active_indications() returns successfully.

I'm assuming that the DSM method fails as it should on inaccessibility
of the IPMI OpRegion.

So users may see errors in dmesg when they access LEDs if IPMI drivers
have not been loaded yet, but once they're loaded, those errors will
go away and LED access should start working flawlessly.

This way of lazily initializing the cached active_indications bit mask
doesn't cost us much as far as code complexity is concerned, but should
make things work 99.999% of the time on quirky platforms.

Thanks,

Lukas

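The retry-until-success behaviour described in this reply can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the driver: `struct led_cache`, `fake_backend_read()`, `cache_init_lazily()` and `demo_sequence()` are hypothetical names, and the `backend_ready` flag merely simulates "the IPMI drivers have been loaded". The only property carried over from the patch is that the initialized flag is set solely after a successful backend read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct led_cache {
	uint32_t active;     /* cached bit mask of active indications */
	bool initialized;    /* set only after the first successful read */
	bool backend_ready;  /* simulation only: "IPMI drivers are loaded" */
	int reads;           /* simulation only: number of backend reads */
};

/* Simulated backend read: fails until the backend is ready. */
static int fake_backend_read(struct led_cache *c, uint32_t *inds)
{
	c->reads++;
	if (!c->backend_ready)
		return -ENODEV;

	*inds = 0x4; /* pretend the "ok" indication is lit */
	return 0;
}

/*
 * Lazy initialization with retry: a failed read leaves the flag clear,
 * so the next LED access tries again. Once a read succeeds, later
 * accesses use the cached value and skip the backend.
 */
static int cache_init_lazily(struct led_cache *c)
{
	int ret;

	if (c->initialized)
		return 0;

	ret = fake_backend_read(c, &c->active);
	if (ret)
		return ret;

	c->initialized = true;
	return 0;
}

/* Walks through the scenario; returns how often the backend was read. */
static int demo_sequence(void)
{
	struct led_cache c = { 0 };

	/* First access happens too early: error, nothing cached. */
	assert(cache_init_lazily(&c) == -ENODEV);
	assert(!c.initialized);

	/* Backend becomes available; the next access succeeds. */
	c.backend_ready = true;
	assert(cache_init_lazily(&c) == 0);
	assert(c.initialized && c.active == 0x4);

	/* Further accesses are served from the cache. */
	assert(cache_init_lazily(&c) == 0);

	return c.reads;
}
```

In this model the early access costs one failed read and an error return, which corresponds to the transient dmesg errors mentioned above; no state is poisoned, so the feature recovers on its own once the backend responds.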
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ecf47559f495..a2768b24678e 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -500,3 +500,66 @@ Description:
 		console drivers from the device. Raw users of pci-sysfs
 		resourceN attributes must be terminated prior to resizing.
 		Success of the resizing operation is not guaranteed.
+
+What:		/sys/bus/pci/devices/.../leds/*:enclosure:*/brightness
+What:		/sys/class/leds/*:enclosure:*/brightness
+Date:		August 2024
+KernelVersion:	6.12
+Description:
+		LED indications on PCIe storage enclosures which are controlled
+		through the NPEM interface (Native PCIe Enclosure Management,
+		PCIe r6.1 sec 6.28) are accessible as LED class devices, both
+		below /sys/class/leds and below NPEM-capable PCI devices.
+
+		Although these LED class devices could be manipulated manually,
+		in practice they are typically manipulated automatically by an
+		application such as ledmon(8).
+
+		The name of a LED class device is as follows:
+		<bdf>:enclosure:<indication>
+		where:
+
+		- <bdf> is the domain, bus, device and function number
+		  (e.g. 10000:02:05.0)
+		- <indication> is a short description of the LED indication
+
+		Valid indications per PCIe r6.1 table 6-27 are:
+
+		- ok (drive is functioning normally)
+		- locate (drive is being identified by an admin)
+		- fail (drive is not functioning properly)
+		- rebuild (drive is part of an array that is rebuilding)
+		- pfa (drive is predicted to fail soon)
+		- hotspare (drive is marked to be used as a replacement)
+		- ica (drive is part of an array that is degraded)
+		- ifa (drive is part of an array that is failed)
+		- idt (drive is not the right type for the connector)
+		- disabled (drive is disabled, removal is safe)
+		- specific0 to specific7 (enclosure-specific indications)
+
+		Broadly, the indications fall into one of these categories:
+
+		- to signify drive state (ok, locate, fail, idt, disabled)
+		- to signify drive role or state in a software RAID array
+		  (rebuild, pfa, hotspare, ica, ifa)
+		- to signify any other role or state (specific0 to specific7)
+
+		Mandatory indications per PCIe r6.1 sec 7.9.19.2 comprise:
+		ok, locate, fail, rebuild. All others are optional.
+		An LED class device is only visible if the corresponding
+		indication is supported by the device.
+
+		To manipulate the indications, write 0 (LED_OFF) or 1 (LED_ON)
+		to the "brightness" file. Note that manipulating an indication
+		may implicitly manipulate other indications at the vendor's
+		discretion. E.g. when the user lights up the "ok" indication,
+		the vendor may choose to automatically turn off the "fail"
+		indication. The current state of an indication can be
+		retrieved by reading its "brightness" file.
+
+		The PCIe Base Specification allows vendors leeway to choose
+		different colors or blinking patterns for the indications,
+		but they typically follow the IBPI standard. E.g. the "locate"
+		indication is usually presented as one or two LEDs blinking at
+		4 Hz frequency:
+		https://en.wikipedia.org/wiki/International_Blinking_Pattern_Interpretation
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index aa4d1833f442..94beb0dd996d 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -143,6 +143,15 @@ config PCI_IOV
 
 	  If unsure, say N.
 
+config PCI_NPEM
+	bool "Native PCIe Enclosure Management"
+	depends on LEDS_CLASS=y
+	help
+	  Support for Native PCIe Enclosure Management. It allows managing LED
+	  indications in storage enclosures. Enclosure must support following
+	  indications: OK, Locate, Fail, Rebuild. Other indications are
+	  optional.
+
 config PCI_PRI
 	bool "PCI PRI support"
 	select PCI_ATS
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 8ddad57934a6..374c5c06d92f 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -35,6 +35,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
 obj-$(CONFIG_PCI_DOE)		+= doe.o
 obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
+obj-$(CONFIG_PCI_NPEM)		+= npem.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c
new file mode 100644
index 000000000000..cd1c18774747
--- /dev/null
+++ b/drivers/pci/npem.c
@@ -0,0 +1,449 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCIe Enclosure management driver created for LED interfaces based on
+ * indications. It says *what indications* blink but does not specify *how*
+ * they blink - it is hardware defined.
+ *
+ * The driver name refers to Native PCIe Enclosure Management. It is
+ * first indication oriented standard with specification.
+ *
+ * Native PCIe Enclosure Management (NPEM)
+ *	PCIe Base Specification r6.1 sec 6.28
+ *	PCIe Base Specification r6.1 sec 7.9.19
+ *
+ * Copyright (c) 2023-2024 Intel Corporation
+ *	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/iopoll.h>
+#include <linux/leds.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci_regs.h>
+#include <linux/types.h>
+#include <linux/uleds.h>
+
+#include "pci.h"
+
+struct indication {
+	u32 bit;
+	const char *name;
+};
+
+static const struct indication npem_indications[] = {
+	{PCI_NPEM_IND_OK,	"enclosure:ok"},
+	{PCI_NPEM_IND_LOCATE,	"enclosure:locate"},
+	{PCI_NPEM_IND_FAIL,	"enclosure:fail"},
+	{PCI_NPEM_IND_REBUILD,	"enclosure:rebuild"},
+	{PCI_NPEM_IND_PFA,	"enclosure:pfa"},
+	{PCI_NPEM_IND_HOTSPARE,	"enclosure:hotspare"},
+	{PCI_NPEM_IND_ICA,	"enclosure:ica"},
+	{PCI_NPEM_IND_IFA,	"enclosure:ifa"},
+	{PCI_NPEM_IND_IDT,	"enclosure:idt"},
+	{PCI_NPEM_IND_DISABLED,	"enclosure:disabled"},
+	{PCI_NPEM_IND_SPEC_0,	"enclosure:specific_0"},
+	{PCI_NPEM_IND_SPEC_1,	"enclosure:specific_1"},
+	{PCI_NPEM_IND_SPEC_2,	"enclosure:specific_2"},
+	{PCI_NPEM_IND_SPEC_3,	"enclosure:specific_3"},
+	{PCI_NPEM_IND_SPEC_4,	"enclosure:specific_4"},
+	{PCI_NPEM_IND_SPEC_5,	"enclosure:specific_5"},
+	{PCI_NPEM_IND_SPEC_6,	"enclosure:specific_6"},
+	{PCI_NPEM_IND_SPEC_7,	"enclosure:specific_7"},
+	{0,			NULL}
+};
+
+#define for_each_indication(ind, inds) \
+	for (ind = inds; ind->bit; ind++)
+
+/*
+ * The driver has internal list of supported indications. Ideally, the driver
+ * should not touch bits that are not defined and for which LED devices are
+ * not exposed but in reality, it needs to turn them off.
+ *
+ * Otherwise, there will be no possibility to turn off indications turned on by
+ * other utilities or turned on by default and it leads to bad user experience.
+ *
+ * Additionally, it excludes NPEM commands like RESET or ENABLE.
+ */
+static u32 reg_to_indications(u32 caps, const struct indication *inds)
+{
+	const struct indication *ind;
+	u32 supported_indications = 0;
+
+	for_each_indication(ind, inds)
+		supported_indications |= ind->bit;
+
+	return caps & supported_indications;
+}
+
+/**
+ * struct npem_led - LED details
+ * @indication: indication details
+ * @npem: npem device
+ * @name: LED name
+ * @led: LED device
+ */
+struct npem_led {
+	const struct indication *indication;
+	struct npem *npem;
+	char name[LED_MAX_NAME_SIZE];
+	struct led_classdev led;
+};
+
+/**
+ * struct npem_ops - backend specific callbacks
+ * @inds: supported indications array, set of indications is backend specific
+ * @get_active_indications: get active indications
+ *	npem: npem device
+ *	inds: response buffer
+ * @set_active_indications: set new indications
+ *	npem: npem device
+ *	inds: bit mask to set
+ */
+struct npem_ops {
+	const struct indication *inds;
+	int (*get_active_indications)(struct npem *npem, u32 *inds);
+	int (*set_active_indications)(struct npem *npem, u32 inds);
+};
+
+/**
+ * struct npem - NPEM device properties
+ * @dev: PCIe device this driver is attached to
+ * @ops: Backend specific callbacks
+ * @lock: serialized accessing npem device from multiple LED devices
+ * @pos: NPEM backed only, NPEM capability offset
+ * @supported_indications: bit mask of supported indications
+ *	non-indication and reserved bits are cleared
+ * @active_indications: bit mask of active indications
+ *	non-indication and reserved bits are cleared
+ * @active_inds_initialized: if set then active_indications are initialized
+ * @led_cnt: Supported LEDs count
+ * @leds: supported LEDs
+ */
+struct npem {
+	struct pci_dev *dev;
+	const struct npem_ops *ops;
+	struct mutex lock;
+	u16 pos;
+	u32 supported_indications;
+	u32 active_indications;
+
+	/*
+	 * Use lazy loading for active_indications to not play with initcalls.
+	 * It is needed to allow _DSM initialization on DELL platforms, where
+	 * ACPI_IPMI must be loaded first.
+	 */
+	unsigned int active_inds_initialized:1;
+
+	int led_cnt;
+	struct npem_led leds[];
+};
+
+static int npem_read_reg(struct npem *npem, u16 reg, u32 *val)
+{
+	int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val);
+
+	return pcibios_err_to_errno(ret);
+}
+
+static int npem_write_ctrl(struct npem *npem, u32 reg)
+{
+	int pos = npem->pos + PCI_NPEM_CTRL;
+	int ret = pci_write_config_dword(npem->dev, pos, reg);
+
+	return pcibios_err_to_errno(ret);
+}
+
+static int npem_get_active_indications(struct npem *npem, u32 *inds)
+{
+	u32 ctrl;
+	int ret;
+
+	lockdep_assert_held(&npem->lock);
+
+	ret = npem_read_reg(npem, PCI_NPEM_CTRL, &ctrl);
+	if (ret)
+		return ret;
+
+	/* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */
+	if (!(ctrl & PCI_NPEM_CTRL_ENABLE)) {
+		*inds = 0;
+		return 0;
+	}
+
+	*inds = ctrl & npem->supported_indications;
+
+	return 0;
+}
+
+static int npem_set_active_indications(struct npem *npem, u32 inds)
+{
+	int ctrl, ret, ret_val;
+	u32 cc_status;
+
+	lockdep_assert_held(&npem->lock);
+
+	/* This bit is always required */
+	ctrl = inds | PCI_NPEM_CTRL_ENABLE;
+
+	ret = npem_write_ctrl(npem, ctrl);
+	if (ret)
+		return ret;
+
+	/*
+	 * For the case where a NPEM command has not completed immediately,
+	 * it is recommended that software not continuously “spin” on polling
+	 * the status register, but rather poll under interrupt at a reduced
+	 * rate; for example at 10 ms intervals.
+	 *
+	 * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM
+	 * Command Completed"
+	 */
+	ret = read_poll_timeout(npem_read_reg, ret_val,
+				ret_val || (cc_status & PCI_NPEM_STATUS_CC),
+				10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem,
+				PCI_NPEM_STATUS, &cc_status);
+	if (ret)
+		return ret;
+	if (ret_val)
+		return ret_val;
+
+	/*
+	 * All writes to control register, including writes that do not change
+	 * the register value, are NPEM commands and should eventually result
+	 * in a command completion indication in the NPEM Status Register.
+	 *
+	 * PCIe Base Specification r6.1 sec 7.9.19.3
+	 *
+	 * Register may not be updated, or other conflicting bits may be
+	 * cleared. Spec is not strict here. Read NPEM Control register after
+	 * write to keep cache in-sync.
+	 */
+	return npem_get_active_indications(npem, &npem->active_indications);
+}
+
+static const struct npem_ops npem_ops = {
+	.inds = npem_indications,
+	.get_active_indications = npem_get_active_indications,
+	.set_active_indications = npem_set_active_indications,
+};
+
+#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b, 0x8c, 0xb7, 0x74, 0x7e,\
+			   0xd5, 0x1e, 0x19, 0x4d)
+#define GET_SUPPORTED_STATES_DSM	1
+#define GET_STATE_DSM			2
+#define SET_STATE_DSM			3
+
+static const guid_t dsm_guid = DSM_GUID;
+
+static bool npem_has_dsm(struct pci_dev *pdev)
+{
+	acpi_handle handle;
+
+	handle = ACPI_HANDLE(&pdev->dev);
+	if (!handle)
+		return false;
+
+	return acpi_check_dsm(handle, &dsm_guid, 0x1,
+			      BIT(GET_SUPPORTED_STATES_DSM) |
+			      BIT(GET_STATE_DSM) | BIT(SET_STATE_DSM));
+}
+
+static int npem_initialize_active_indications(struct npem *npem)
+{
+	int ret;
+
+	lockdep_assert_held(&npem->lock);
+
+	if (npem->active_inds_initialized)
+		return 0;
+
+	ret = npem->ops->get_active_indications(npem,
+						&npem->active_indications);
+	if (ret)
+		return ret;
+
+	npem->active_inds_initialized = true;
+	return 0;
+}
+
+/*
+ * The status of each indicator is cached on first brightness_ get/set time and
+ * updated at write time.
+ * brightness_get() is only responsible for reflecting the last written/cached
+ * value.
+ */
+static enum led_brightness brightness_get(struct led_classdev *led)
+{
+	struct npem_led *nled = container_of(led, struct npem_led, led);
+	struct npem *npem = nled->npem;
+	int ret, val = 0;
+
+	ret = mutex_lock_interruptible(&npem->lock);
+	if (ret)
+		return ret;
+
+	ret = npem_initialize_active_indications(npem);
+	if (ret)
+		goto out;
+
+	if (npem->active_indications & nled->indication->bit)
+		val = 1;
+
+out:
+	mutex_unlock(&npem->lock);
+	return val;
+}
+
+static int brightness_set(struct led_classdev *led,
+			  enum led_brightness brightness)
+{
+	struct npem_led *nled = container_of(led, struct npem_led, led);
+	struct npem *npem = nled->npem;
+	u32 indications;
+	int ret;
+
+	ret = mutex_lock_interruptible(&npem->lock);
+	if (ret)
+		return ret;
+
+	ret = npem_initialize_active_indications(npem);
+	if (ret)
+		goto out;
+
+	if (brightness == 0)
+		indications = npem->active_indications & ~(nled->indication->bit);
+	else
+		indications = npem->active_indications | nled->indication->bit;
+
+	ret = npem->ops->set_active_indications(npem, indications);
+
+out:
+	mutex_unlock(&npem->lock);
+	return ret;
+}
+
+static void npem_free(struct npem *npem)
+{
+	struct npem_led *nled;
+	int cnt;
+
+	if (!npem)
+		return;
+
+	for (cnt = 0; cnt < npem->led_cnt; cnt++) {
+		nled = &npem->leds[cnt];
+
+		if (nled->name[0])
+			led_classdev_unregister(&nled->led);
+	}
+
+	mutex_destroy(&npem->lock);
+	kfree(npem);
+}
+
+static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled)
+{
+	struct led_classdev *led = &nled->led;
+	struct led_init_data init_data = {};
+	char *name = nled->name;
+	int ret;
+
+	init_data.devicename = pci_name(npem->dev);
+	init_data.default_label = nled->indication->name;
+
+	ret = led_compose_name(&npem->dev->dev, &init_data, name);
+	if (ret)
+		return ret;
+
+	led->name = name;
+	led->brightness_set_blocking = brightness_set;
+	led->brightness_get = brightness_get;
+	led->max_brightness = 1;
+	led->default_trigger = "none";
+	led->flags = 0;
+
+	ret = led_classdev_register(&npem->dev->dev, led);
+	if (ret)
+		/* Clear the name to indicate that it is not registered. */
+		name[0] = 0;
+	return ret;
+}
+
+static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
+			 int pos, u32 caps)
+{
+	u32 supported = reg_to_indications(caps, ops->inds);
+	int supported_cnt = hweight32(supported);
+	const struct indication *indication;
+	struct npem_led *nled;
+	struct npem *npem;
+	int led_idx = 0;
+	int ret;
+
+	npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL);
+	if (!npem)
+		return -ENOMEM;
+
+	npem->supported_indications = supported;
+	npem->led_cnt = supported_cnt;
+	npem->pos = pos;
+	npem->dev = dev;
+	npem->ops = ops;
+
+	mutex_init(&npem->lock);
+
+	for_each_indication(indication, npem_indications) {
+		if (!(npem->supported_indications & indication->bit))
+			continue;
+
+		nled = &npem->leds[led_idx++];
+		nled->indication = indication;
+		nled->npem = npem;
+
+		ret = pci_npem_set_led_classdev(npem, nled);
+		if (ret) {
+			npem_free(npem);
+			return ret;
+		}
+	}
+
+	dev->npem = npem;
+	return 0;
+}
+
+void pci_npem_remove(struct pci_dev *dev)
+{
+	npem_free(dev->npem);
+}
+
+void pci_npem_create(struct pci_dev *dev)
+{
+	const struct npem_ops *ops = &npem_ops;
+	int pos = 0, ret;
+	u32 cap;
+
+	if (!npem_has_dsm(dev)) {
+		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
+		if (pos == 0)
+			return;
+
+		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
+		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
+			return;
+	} else {
+		/*
+		 * OS should use the DSM for LED control if it is available
+		 * PCI Firmware Spec r3.3 sec 4.7.
+ */ + return; + } + + ret = pci_npem_init(dev, ops, pos, cap); + if (ret) + pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n", + ret); +} diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 79c8398f3938..554fd9dfe25c 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -398,6 +398,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { } static inline void pci_doe_disconnected(struct pci_dev *pdev) { } #endif +#ifdef CONFIG_PCI_NPEM +void pci_npem_create(struct pci_dev *dev); +void pci_npem_remove(struct pci_dev *dev); +#else +static inline void pci_npem_create(struct pci_dev *dev) { } +static inline void pci_npem_remove(struct pci_dev *dev) { } +#endif + /** * pci_dev_set_io_state - Set the new error state if possible. * diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index b14b9876c030..17ee559c31a4 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2593,6 +2593,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) dev->match_driver = false; ret = device_add(&dev->dev); WARN_ON(ret < 0); + + pci_npem_create(dev); } struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 910387e5bdbf..da9629c3a688 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -34,6 +34,8 @@ static void pci_destroy_dev(struct pci_dev *dev) if (!dev->dev.kobj.parent) return; + pci_npem_remove(dev); + device_del(&dev->dev); down_write(&pci_bus_sem); diff --git a/include/linux/pci.h b/include/linux/pci.h index 4cf89a4b4cbc..c9db853e269f 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -516,6 +516,9 @@ struct pci_dev { #endif #ifdef CONFIG_PCI_DOE struct xarray doe_mbs; /* Data Object Exchange mailboxes */ +#endif +#ifdef CONFIG_PCI_NPEM + struct npem *npem; /* Native PCIe Enclosure Management */ #endif u16 acs_cap; /* ACS Capability offset */ phys_addr_t rom; /* Physical address if not from BAR */ diff --git 
a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index 94c00996e633..c5e1b0573ff8 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -740,6 +740,7 @@ #define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ +#define PCI_EXT_CAP_ID_NPEM 0x29 /* Native PCIe Enclosure Management */ #define PCI_EXT_CAP_ID_PL_32GT 0x2A /* Physical Layer 32.0 GT/s */ #define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE @@ -1121,6 +1122,40 @@ #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 +/* Native PCIe Enclosure Management */ +#define PCI_NPEM_CAP 0x04 /* NPEM capability register */ +#define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */ + +#define PCI_NPEM_CTRL 0x08 /* NPEM control register */ +#define PCI_NPEM_CTRL_ENABLE 0x00000001 /* NPEM Enable */ + +/* + * Native PCIe Enclosure Management indication bits and Reset command bit + * are corresponding for capability and control registers. 
+ */ +#define PCI_NPEM_CMD_RESET 0x00000002 /* NPEM Reset Command */ +#define PCI_NPEM_IND_OK 0x00000004 /* NPEM indication OK */ +#define PCI_NPEM_IND_LOCATE 0x00000008 /* NPEM indication Locate */ +#define PCI_NPEM_IND_FAIL 0x00000010 /* NPEM indication Fail */ +#define PCI_NPEM_IND_REBUILD 0x00000020 /* NPEM indication Rebuild */ +#define PCI_NPEM_IND_PFA 0x00000040 /* NPEM indication Predicted Failure Analysis */ +#define PCI_NPEM_IND_HOTSPARE 0x00000080 /* NPEM indication Hot Spare */ +#define PCI_NPEM_IND_ICA 0x00000100 /* NPEM indication In Critical Array */ +#define PCI_NPEM_IND_IFA 0x00000200 /* NPEM indication In Failed Array */ +#define PCI_NPEM_IND_IDT 0x00000400 /* NPEM indication Invalid Device Type */ +#define PCI_NPEM_IND_DISABLED 0x00000800 /* NPEM indication Disabled */ +#define PCI_NPEM_IND_SPEC_0 0x01000000 +#define PCI_NPEM_IND_SPEC_1 0x02000000 +#define PCI_NPEM_IND_SPEC_2 0x04000000 +#define PCI_NPEM_IND_SPEC_3 0x08000000 +#define PCI_NPEM_IND_SPEC_4 0x10000000 +#define PCI_NPEM_IND_SPEC_5 0x20000000 +#define PCI_NPEM_IND_SPEC_6 0x40000000 +#define PCI_NPEM_IND_SPEC_7 0x80000000 + +#define PCI_NPEM_STATUS 0x0c /* NPEM status register */ +#define PCI_NPEM_STATUS_CC 0x00000001 /* NPEM Command completed */ + /* Data Object Exchange */ #define PCI_DOE_CAP 0x04 /* DOE Capabilities Register */ #define PCI_DOE_CAP_INT_SUP 0x00000001 /* Interrupt Support */
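Note for readers of the patch: brightness_set() does a read-modify-write on the cached indication mask, clearing the LED's bit when brightness is 0 and setting it otherwise, before handing the result to set_active_indications(). A minimal userspace sketch of just that mask arithmetic follows. It assumes one bit per LED and copies three bit values from the pci_regs.h hunk; npem_next_indications() is a hypothetical helper introduced here for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from the pci_regs.h hunk of the patch. */
#define PCI_NPEM_IND_OK		0x00000004
#define PCI_NPEM_IND_LOCATE	0x00000008
#define PCI_NPEM_IND_FAIL	0x00000010

/*
 * Hypothetical helper (not in the patch): derive the mask to write from
 * the cached active indications and one LED's indication bit, mirroring
 * the if/else in brightness_set().
 */
static uint32_t npem_next_indications(uint32_t active, uint32_t bit,
				      unsigned int brightness)
{
	/* Assumption: each LED maps to exactly one indication bit. */
	assert(bit != 0 && (bit & (bit - 1)) == 0);

	if (brightness == 0)
		return active & ~bit;	/* LED off: clear only this bit */
	return active | bit;		/* LED on: set only this bit */
}
```

Other indications already active in the cached mask are preserved in both branches, which is why the driver re-reads the control register after each write to keep the cache in sync.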