Message ID | 20240528131940.16924-3-mariusz.tkaczyk@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCIe Enclosure LED Management | expand |
On Tue, 28 May 2024, Mariusz Tkaczyk wrote: > Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows > managing LED in storage enclosures. NPEM is indication oriented > and it does not give direct access to LED. Although each of > the indications *could* represent an individual LED, multiple > indications could also be represented as a single, > multi-color LED or a single LED blinking in a specific interval. > The specification leaves that open. > > Each enabled indication (capability register bit on) is represented as a > ledclass_dev which can be controlled through sysfs. For every ledclass > device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0). > It is corresponding to NPEM control register (Indication bit on/off). > > Ledclass devices appear in sysfs as child devices (subdirectory) of PCI > device which has an NPEM Extended Capability and indication is enabled > in NPEM capability register. For example, these are leds created for > pcieport "10000:02:05.0" on my setup: > > leds/ > ├── 10000:02:05.0:enclosure:fail > ├── 10000:02:05.0:enclosure:locate > ├── 10000:02:05.0:enclosure:ok > └── 10000:02:05.0:enclosure:rebuild > > They can be also found in "/sys/class/leds" directory. Parent PCIe device > bdf is used to guarantee uniqueness across leds subsystem. > > To enable/disable fail indication "brightness" file can be edited: > echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness > echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness > > PCIe r6.1, sec 7.9.19.2 defines the possible indications. > > Multiple indications for same parent PCIe device can conflict and > hardware may update them when processing new request. To avoid issues, > driver refresh all indications by reading back control register. > > Driver is projected to be exclusive NPEM extended capability manager. > It waits up to 1 second after imposing new request, it doesn't verify if > controller is busy before write, assuming that mutex lock gives protection > from concurrent updates. Driver is not registered if _DSM LED management > is available. > > NPEM is a PCIe extended capability so it should be registered in > pcie_init_capabilities() but it is not possible due to LED dependency. > Parent pci_device must be added earlier for led_classdev_register() > to be successful. NPEM does not require configuration on kernel side, it > is safe to register LED devices later. > > Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1] > Suggested-by: Lukas Wunner <lukas@wunner.de> > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > --- > drivers/pci/Kconfig | 9 + > drivers/pci/Makefile | 1 + > drivers/pci/npem.c | 410 ++++++++++++++++++++++++++++++++++ > drivers/pci/pci.h | 8 + > drivers/pci/probe.c | 2 + > drivers/pci/remove.c | 2 + > include/linux/pci.h | 3 + > include/uapi/linux/pci_regs.h | 34 +++ > 8 files changed, 469 insertions(+) > create mode 100644 drivers/pci/npem.c > > diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig > index d35001589d88..e696e69ad516 100644 > --- a/drivers/pci/Kconfig > +++ b/drivers/pci/Kconfig > @@ -143,6 +143,15 @@ config PCI_IOV > > If unsure, say N. > > +config PCI_NPEM > + bool "Native PCIe Enclosure Management" > + depends on LEDS_CLASS=y > + help > + Support for Native PCIe Enclosure Management. It allows managing LED > + indications in storage enclosures. Enclosure must support following > + indications: OK, Locate, Fail, Rebuild, other indications are > + optional. > + > config PCI_PRI > bool "PCI PRI support" > select PCI_ATS > diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile > index 175302036890..cd5f655d4be9 100644 > --- a/drivers/pci/Makefile > +++ b/drivers/pci/Makefile > @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o > obj-$(CONFIG_VGA_ARB) += vgaarb.o > obj-$(CONFIG_PCI_DOE) += doe.o > obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o > +obj-$(CONFIG_PCI_NPEM) += npem.o > > # Endpoint library must be initialized before its users > obj-$(CONFIG_PCI_ENDPOINT) += endpoint/ > diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c > new file mode 100644 > index 000000000000..a76a2044dab2 > --- /dev/null > +++ b/drivers/pci/npem.c > @@ -0,0 +1,410 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * PCIe Enclosure management driver created for LED interfaces based on > + * indications. It says *what indications* blink but does not specify *how* > + * they blink - it is hardware defined. > + * > + * The driver name refers to Native PCIe Enclosure Management. It is > + * first indication oriented standard with specification. > + * > + * Native PCIe Enclosure Management (NPEM) > + * PCIe Base Specification r6.1 sec 6.28 > + * PCIe Base Specification r6.1 sec 7.9.19 > + * > + * Copyright (c) 2023-2024 Intel Corporation > + * Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > + */ > + > +#include <linux/acpi.h> > +#include <linux/bitops.h> > +#include <linux/errno.h> > +#include <linux/iopoll.h> > +#include <linux/leds.h> > +#include <linux/mutex.h> > +#include <linux/pci.h> > +#include <linux/pci_regs.h> > +#include <linux/types.h> > +#include <linux/uleds.h> > + > +#include "pci.h" > + > +struct indication { > + u32 bit; > + const char *name; > +}; > + > +static const struct indication npem_indications[] = { > + {PCI_NPEM_IND_OK, "enclosure:ok"}, > + {PCI_NPEM_IND_LOCATE, "enclosure:locate"}, > + {PCI_NPEM_IND_FAIL, "enclosure:fail"}, > + {PCI_NPEM_IND_REBUILD, "enclosure:rebuild"}, > + {PCI_NPEM_IND_PFA, "enclosure:pfa"}, > + {PCI_NPEM_IND_HOTSPARE, "enclosure:hotspare"}, > + {PCI_NPEM_IND_ICA, "enclosure:ica"}, > + {PCI_NPEM_IND_IFA, "enclosure:ifa"}, > + {PCI_NPEM_IND_IDT, "enclosure:idt"}, > + {PCI_NPEM_IND_DISABLED, "enclosure:disabled"}, > + {PCI_NPEM_IND_SPEC_0, "enclosure:specific_0"}, > + {PCI_NPEM_IND_SPEC_1, "enclosure:specific_1"}, > + {PCI_NPEM_IND_SPEC_2, "enclosure:specific_2"}, > + {PCI_NPEM_IND_SPEC_3, "enclosure:specific_3"}, > + {PCI_NPEM_IND_SPEC_4, "enclosure:specific_4"}, > + {PCI_NPEM_IND_SPEC_5, "enclosure:specific_5"}, > + {PCI_NPEM_IND_SPEC_6, "enclosure:specific_6"}, > + {PCI_NPEM_IND_SPEC_7, "enclosure:specific_7"}, > + {0, NULL} > +}; > + > +#define for_each_indication(ind, inds) \ > + for (ind = inds; ind->bit; ind++) > + > +/* To avoid confusion, do not keep any special bits in indications */ > +static u32 reg_to_indications(u32 caps, const struct indication *inds) > +{ > + const struct indication *ind; > + u32 supported_indications = 0; > + > + for_each_indication(ind, inds) > + supported_indications |= ind->bit; > + > + return caps & supported_indications; > +} > + > +/** > + * struct npem_led - LED details > + * @indication: indication details > + * @npem: npem device > + * @name: LED name > + * @led: LED device > + */ > +struct npem_led { > + const struct indication *indication; > + struct npem *npem; > + char name[LED_MAX_NAME_SIZE]; > + struct led_classdev led; > +}; > + > +/** > + * struct npem_ops - backend specific callbacks > + * @inds: supported indications array > + * @get_active_indications: get active indications > + * npem: npem device > + * buf: response buffer > + * @set_active_indications: set new indications > + * npem: npem device > + * val: bit mask to set > + * > + * Handle communication with hardware. set_active_indications updates > + * npem->active_indications. > + */ > +struct npem_ops { > + const struct indication *inds; > + int (*get_active_indications)(struct npem *npem, u32 *buf); > + int (*set_active_indications)(struct npem *npem, u32 val); > +}; > + > +/** > + * struct npem - NPEM device properties > + * @dev: PCIe device this driver is attached to > + * @ops: Backend specific callbacks > + * @npem_lock: to keep concurrent updates serialized > + * @pos: NPEM capability offset (only relevant for NPEM direct register access, > + * not DSM access method) > + * @supported_indications: bit mask of supported indications > + * non-indication and reserved bits are cleared > + * @active_indications: bit mask of active indications > + * non-indication and reserved bits are cleared > + * @led_cnt: Supported LEDs count > + * @leds: supported LEDs > + */ > +struct npem { > + struct pci_dev *dev; > + const struct npem_ops *ops; > + struct mutex npem_lock; > + u16 pos; > + u32 supported_indications; > + u32 active_indications; > + int led_cnt; > + struct npem_led leds[]; > +}; > + > +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val) > +{ > + int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val); > + > + return pcibios_err_to_errno(ret); > +} > + > +static int npem_write_ctrl(struct npem *npem, u32 reg) > +{ > + int pos = npem->pos + PCI_NPEM_CTRL; > + int ret = pci_write_config_dword(npem->dev, pos, reg); > + > + return pcibios_err_to_errno(ret); > +} > + > +static int npem_get_active_indications(struct npem *npem, u32 *buf) > +{ > + int ret; > + > + ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf); > + if (ret) > + return ret; > + > + /* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */ > + if (!(*buf & PCI_NPEM_CTRL_ENABLE)) > + *buf = 0; > + > + /* Filter out not supported indications in response */ > + *buf &= npem->supported_indications; > + return 0; > +} > + > +static int npem_set_active_indications(struct npem *npem, u32 val) > +{ > + int ret, ret_val; > + u32 cc_status; > + > + lockdep_assert_held(&npem->npem_lock); > + > + /* This bit is always required */ > + val |= PCI_NPEM_CTRL_ENABLE; > + ret = npem_write_ctrl(npem, val); > + if (ret) > + return ret; > + > + /* > + * For the case where a NPEM command has not completed immediately, > + * it is recommended that software not continuously “spin” on polling > + * the status register, but rather poll under interrupt at a reduced > + * rate; for example at 10 ms intervals. > + * > + * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM > + * Command Completed" > + */ > + ret = read_poll_timeout(npem_read_reg, ret_val, > + ret_val || (cc_status & PCI_NPEM_STATUS_CC), > + 10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem, > + PCI_NPEM_STATUS, &cc_status); > + if (ret) > + return ret_val; > + > + /* > + * All writes to control register, including writes that do not change > + * the register value, are NPEM commands and should eventually result > + * in a command completion indication in the NPEM Status Register. > + * > + * PCIe Base Specification r6.1 sec 7.9.19.3 > + * > + * Register may not be updated, or other conflicting bits may be > + * cleared. Spec is not strict here. Read NPEM Control register after > + * write to keep cache in-sync. > + */ > + return npem_get_active_indications(npem, &npem->active_indications); > +} > + > +static const struct npem_ops npem_ops = { > + .inds = npem_indications, > + .get_active_indications = npem_get_active_indications, > + .set_active_indications = npem_set_active_indications, > +}; > + > +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b, 0x8c, 0xb7, 0x74, 0x7e,\ > + 0xd5, 0x1e, 0x19, 0x4d) > +#define GET_SUPPORTED_STATES_DSM BIT(1) > +#define GET_STATE_DSM BIT(2) > +#define SET_STATE_DSM BIT(3) > + > +static const guid_t dsm_guid = DSM_GUID; > + > +static bool npem_has_dsm(struct pci_dev *pdev) > +{ > + acpi_handle handle; > + > + handle = ACPI_HANDLE(&pdev->dev); > + if (!handle) > + return false; > + > + return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM | > + GET_STATE_DSM | SET_STATE_DSM); > +} > + > +/* > + * This function does not use ops->get_active_indications(). > + * It returns cached value, npem->npem_lock is held and it is safe. > + */ > +static enum led_brightness brightness_get(struct led_classdev *led) > +{ > + struct npem_led *nled = container_of(led, struct npem_led, led); > + struct npem *npem = nled->npem; > + int ret, val = LED_OFF; > + > + ret = mutex_lock_interruptible(&npem->npem_lock); > + if (ret) > + return ret; > + > + if (npem->active_indications & nled->indication->bit) > + val = LED_ON; > + > + mutex_unlock(&npem->npem_lock); > + > + return val; > +} > + > +static int brightness_set(struct led_classdev *led, > + enum led_brightness brightness) > +{ > + struct npem_led *nled = container_of(led, struct npem_led, led); > + struct npem *npem = nled->npem; > + u32 indications; > + int ret; > + > + ret = mutex_lock_interruptible(&npem->npem_lock); > + if (ret) > + return ret; > + > + if (brightness == LED_OFF) > + indications = npem->active_indications & ~(nled->indication->bit); > + else > + indications = npem->active_indications | nled->indication->bit; > + > + ret = npem->ops->set_active_indications(npem, indications); > + > + mutex_unlock(&npem->npem_lock); > + return ret; > +} > + > +static void npem_free(struct npem *npem) > +{ > + struct npem_led *nled; > + int cnt; > + > + for (cnt = 0; cnt < npem->led_cnt; cnt++) { > + nled = &npem->leds[cnt]; > + > + if (nled->name[0]) > + led_classdev_unregister(&nled->led); > + } > + > + mutex_destroy(&npem->npem_lock); > + kfree(npem); > +} > + > +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled) > +{ > + struct led_classdev *led = &nled->led; > + struct led_init_data init_data = {}; > + char *name = nled->name; > + int ret; > + > + init_data.devicename = pci_name(npem->dev); > + init_data.default_label = nled->indication->name; > + > + ret = led_compose_name(&npem->dev->dev, &init_data, name); > + if (ret) > + return ret; > + > + led->name = name; > + led->brightness_set_blocking = brightness_set; > + led->brightness_get = brightness_get; > + led->max_brightness = LED_ON; > + led->default_trigger = "none"; > + led->flags = 0; > + > + ret = led_classdev_register(&npem->dev->dev, led); > + if (ret) > + /* Clear the name to indicate that it is not registered. */ > + name[0] = 0; > + return ret; > +} > + > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, > + int pos, u32 caps) > +{ > + u32 supported = reg_to_indications(caps, ops->inds); > + int supported_cnt = hweight32(supported); > + const struct indication *indication; > + struct npem_led *nled; > + struct npem *npem; > + int led_idx = 0; > + u32 active; > + int ret; > + > + npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL); > + > + if (!npem) > + return -ENOMEM; Don't leave empty line between func and it's error handling. > + > + npem->supported_indications = supported; > + npem->led_cnt = supported_cnt; > + npem->pos = pos; > + npem->dev = dev; > + npem->ops = ops; > + > + ret = ops->get_active_indications(npem, &active); > + if (ret) { > + npem_free(npem); Just call kfree() directly here. As is, you're calling mutex_destroy() before mutex_init(). > + return -EACCES; > + } > + > + npem->active_indications = reg_to_indications(active, ops->inds); > + > + /* > + * Do not take npem->npem_lock, get_brightness() is called on > + * registration path. > + */ > + mutex_init(&npem->npem_lock); > + > + for_each_indication(indication, npem_indications) { > + if ((npem->supported_indications & indication->bit) == 0) if (!(...)) > + /* Do not register unsupported indication */ This sounds quite obvious comment, I'd drop it. > + continue; > + > + nled = &npem->leds[led_idx++]; > + nled->indication = indication; > + nled->npem = npem; > + > + ret = pci_npem_set_led_classdev(npem, nled); > + if (ret) { > + npem_free(npem); > + return ret; > + } > + } > + > + dev->npem = npem; > + return 0; > +} > + > +void pci_npem_remove(struct pci_dev *dev) > +{ > + if (dev->npem) > + npem_free(dev->npem); > +} > + > +void pci_npem_create(struct pci_dev *dev) > +{ > + const struct npem_ops *ops = &npem_ops; > + int pos = 0, ret; > + u32 cap; > + > + if (!npem_has_dsm(dev)) { > + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM); > + if (pos == 0) > + return; > + > + if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 || > + (cap & PCI_NPEM_CAP_CAPABLE) == 0) > + return; > + } else { > + /* > + * OS should use the DSM for LED control if it is available > + * PCI Firmware Spec r3.3 sec 4.7. > + */ > + return; > + } > + > + ret = pci_npem_init(dev, ops, pos, cap); > + if (ret) > + pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n", > + ret); > +} > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fd44565c4756..9dea8c7353ab 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { } > static inline void pci_doe_disconnected(struct pci_dev *pdev) { } > #endif > > +#ifdef CONFIG_PCI_NPEM > +void pci_npem_create(struct pci_dev *dev); > +void pci_npem_remove(struct pci_dev *dev); > +#else > +static inline void pci_npem_create(struct pci_dev *dev) { } > +static inline void pci_npem_remove(struct pci_dev *dev) { } > +#endif > + > /** > * pci_dev_set_io_state - Set the new error state if possible. > * > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 8e696e547565..b8ea6353e27a 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) > dev->match_driver = false; > ret = device_add(&dev->dev); > WARN_ON(ret < 0); > + > + pci_npem_create(dev); > } > > struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index d749ea8250d6..1436f9cf1fea 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev) > if (!dev->dev.kobj.parent) > return; > > + pci_npem_remove(dev); > + > device_del(&dev->dev); > > down_write(&pci_bus_sem); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index fb004fd4e889..c327c2dd4527 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -517,6 +517,9 @@ struct pci_dev { > #endif > #ifdef CONFIG_PCI_DOE > struct xarray doe_mbs; /* Data Object Exchange mailboxes */ > +#endif > +#ifdef CONFIG_PCI_NPEM > + struct npem *npem; /* Native PCIe Enclosure Management */ > #endif > u16 acs_cap; /* ACS Capability offset */ > phys_addr_t rom; /* Physical address if not from BAR */ > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index 94c00996e633..97e8b7ed9998 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -740,6 +740,7 @@ > #define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > +#define PCI_EXT_CAP_ID_NPEM 0x29 /* Native PCIe Enclosure Management */ > #define PCI_EXT_CAP_ID_PL_32GT 0x2A /* Physical Layer 32.0 GT/s */ > #define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE > @@ -1121,6 +1122,39 @@ > #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 > #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 > > +/* Native PCIe Enclosure Management */ > +#define PCI_NPEM_CAP 0x04 /* NPEM capability register */ > +#define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */ > + > +#define PCI_NPEM_CTRL 0x08 /* NPEM control register */ > +#define PCI_NPEM_CTRL_ENABLE 0x00000001 /* NPEM Enable */ I think the rest of the bits would belong here or after PCI_NPEM_CAP_CAPABLE. > +#define PCI_NPEM_STATUS 0x0c /* NPEM status register */ > +#define PCI_NPEM_STATUS_CC 0x00000001 /* NPEM Command completed */ > +/* > + * Native PCIe Enclosure Management indication bits and Reset command bit > + * are corresponding for capability and control registers. > + */ > +#define PCI_NPEM_CMD_RESET 0x00000002 /* NPEM Reset Command */ > +#define PCI_NPEM_IND_OK 0x00000004 /* NPEM indication OK */ > +#define PCI_NPEM_IND_LOCATE 0x00000008 /* NPEM indication Locate */ > +#define PCI_NPEM_IND_FAIL 0x00000010 /* NPEM indication Fail */ > +#define PCI_NPEM_IND_REBUILD 0x00000020 /* NPEM indication Rebuild */ > +#define PCI_NPEM_IND_PFA 0x00000040 /* NPEM indication Predicted Failure Analysis */ > +#define PCI_NPEM_IND_HOTSPARE 0x00000080 /* NPEM indication Hot Spare */ > +#define PCI_NPEM_IND_ICA 0x00000100 /* NPEM indication In Critical Array */ > +#define PCI_NPEM_IND_IFA 0x00000200 /* NPEM indication In Failed Array */ > +#define PCI_NPEM_IND_IDT 0x00000400 /* NPEM indication Invalid Device Type */ > +#define PCI_NPEM_IND_DISABLED 0x00000800 /* NPEM indication Disabled */ > +#define PCI_NPEM_IND_SPEC_0 0x00800000 > +#define PCI_NPEM_IND_SPEC_1 0x01000000 > +#define PCI_NPEM_IND_SPEC_2 0x02000000 > +#define PCI_NPEM_IND_SPEC_3 0x04000000 > +#define PCI_NPEM_IND_SPEC_4 0x08000000 > +#define PCI_NPEM_IND_SPEC_5 0x10000000 > +#define PCI_NPEM_IND_SPEC_6 0x20000000 > +#define PCI_NPEM_IND_SPEC_7 0x40000000 Are these off-by-one, shouldn't that field go all the way to the last bit which is 0x80000000 (the field is 31:24)??
Mariusz Tkaczyk wrote: > Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows > managing LED in storage enclosures. NPEM is indication oriented > and it does not give direct access to LED. Although each of > the indications *could* represent an individual LED, multiple > indications could also be represented as a single, > multi-color LED or a single LED blinking in a specific interval. > The specification leaves that open. > > Each enabled indication (capability register bit on) is represented as a > ledclass_dev which can be controlled through sysfs. For every ledclass > device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0). > It is corresponding to NPEM control register (Indication bit on/off). > > Ledclass devices appear in sysfs as child devices (subdirectory) of PCI > device which has an NPEM Extended Capability and indication is enabled > in NPEM capability register. For example, these are leds created for > pcieport "10000:02:05.0" on my setup: > > leds/ > ├── 10000:02:05.0:enclosure:fail > ├── 10000:02:05.0:enclosure:locate > ├── 10000:02:05.0:enclosure:ok > └── 10000:02:05.0:enclosure:rebuild > > They can be also found in "/sys/class/leds" directory. Parent PCIe device > bdf is used to guarantee uniqueness across leds subsystem. > > To enable/disable fail indication "brightness" file can be edited: > echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness > echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness > > PCIe r6.1, sec 7.9.19.2 defines the possible indications. > > Multiple indications for same parent PCIe device can conflict and > hardware may update them when processing new request. To avoid issues, > driver refresh all indications by reading back control register. > > Driver is projected to be exclusive NPEM extended capability manager. > It waits up to 1 second after imposing new request, it doesn't verify if > controller is busy before write, assuming that mutex lock gives protection > from concurrent updates. Driver is not registered if _DSM LED management > is available. > > NPEM is a PCIe extended capability so it should be registered in > pcie_init_capabilities() but it is not possible due to LED dependency. > Parent pci_device must be added earlier for led_classdev_register() > to be successful. NPEM does not require configuration on kernel side, it > is safe to register LED devices later. > > Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1] > Suggested-by: Lukas Wunner <lukas@wunner.de> > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > --- > drivers/pci/Kconfig | 9 + > drivers/pci/Makefile | 1 + > drivers/pci/npem.c | 410 ++++++++++++++++++++++++++++++++++ > drivers/pci/pci.h | 8 + > drivers/pci/probe.c | 2 + > drivers/pci/remove.c | 2 + > include/linux/pci.h | 3 + > include/uapi/linux/pci_regs.h | 34 +++ > 8 files changed, 469 insertions(+) > create mode 100644 drivers/pci/npem.c > > diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig > index d35001589d88..e696e69ad516 100644 > --- a/drivers/pci/Kconfig > +++ b/drivers/pci/Kconfig > @@ -143,6 +143,15 @@ config PCI_IOV > > If unsure, say N. > > +config PCI_NPEM > + bool "Native PCIe Enclosure Management" > + depends on LEDS_CLASS=y I would have expected depends on NEW_LEDS select LEDS_CLASS > + help > + Support for Native PCIe Enclosure Management. It allows managing LED > + indications in storage enclosures. Enclosure must support following > + indications: OK, Locate, Fail, Rebuild, other indications are > + optional. > + > config PCI_PRI > bool "PCI PRI support" > select PCI_ATS > diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile > index 175302036890..cd5f655d4be9 100644 > --- a/drivers/pci/Makefile > +++ b/drivers/pci/Makefile > @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o > obj-$(CONFIG_VGA_ARB) += vgaarb.o > obj-$(CONFIG_PCI_DOE) += doe.o > obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o > +obj-$(CONFIG_PCI_NPEM) += npem.o > > # Endpoint library must be initialized before its users > obj-$(CONFIG_PCI_ENDPOINT) += endpoint/ > diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c > new file mode 100644 > index 000000000000..a76a2044dab2 > --- /dev/null > +++ b/drivers/pci/npem.c > @@ -0,0 +1,410 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * PCIe Enclosure management driver created for LED interfaces based on > + * indications. It says *what indications* blink but does not specify *how* > + * they blink - it is hardware defined. > + * > + * The driver name refers to Native PCIe Enclosure Management. It is > + * first indication oriented standard with specification. > + * > + * Native PCIe Enclosure Management (NPEM) > + * PCIe Base Specification r6.1 sec 6.28 > + * PCIe Base Specification r6.1 sec 7.9.19 > + * > + * Copyright (c) 2023-2024 Intel Corporation > + * Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > + */ > + > +#include <linux/acpi.h> > +#include <linux/bitops.h> > +#include <linux/errno.h> > +#include <linux/iopoll.h> > +#include <linux/leds.h> > +#include <linux/mutex.h> > +#include <linux/pci.h> > +#include <linux/pci_regs.h> > +#include <linux/types.h> > +#include <linux/uleds.h> > + > +#include "pci.h" > + > +struct indication { > + u32 bit; > + const char *name; > +}; > + > +static const struct indication npem_indications[] = { > + {PCI_NPEM_IND_OK, "enclosure:ok"}, > + {PCI_NPEM_IND_LOCATE, "enclosure:locate"}, > + {PCI_NPEM_IND_FAIL, "enclosure:fail"}, > + {PCI_NPEM_IND_REBUILD, "enclosure:rebuild"}, > + {PCI_NPEM_IND_PFA, "enclosure:pfa"}, > + {PCI_NPEM_IND_HOTSPARE, "enclosure:hotspare"}, > + {PCI_NPEM_IND_ICA, "enclosure:ica"}, > + {PCI_NPEM_IND_IFA, "enclosure:ifa"}, > + {PCI_NPEM_IND_IDT, "enclosure:idt"}, > + {PCI_NPEM_IND_DISABLED, "enclosure:disabled"}, > + {PCI_NPEM_IND_SPEC_0, "enclosure:specific_0"}, > + {PCI_NPEM_IND_SPEC_1, "enclosure:specific_1"}, > + {PCI_NPEM_IND_SPEC_2, "enclosure:specific_2"}, > + {PCI_NPEM_IND_SPEC_3, "enclosure:specific_3"}, > + {PCI_NPEM_IND_SPEC_4, "enclosure:specific_4"}, > + {PCI_NPEM_IND_SPEC_5, "enclosure:specific_5"}, > + {PCI_NPEM_IND_SPEC_6, "enclosure:specific_6"}, > + {PCI_NPEM_IND_SPEC_7, "enclosure:specific_7"}, > + {0, NULL} > +}; > + > +#define for_each_indication(ind, inds) \ > + for (ind = inds; ind->bit; ind++) > + > +/* To avoid confusion, do not keep any special bits in indications */ I am confused by this comment. What "special bits" is this referring to? Make sure comments are something that can help you remember 5 to 10 years what this function is doing. > +static u32 reg_to_indications(u32 caps, const struct indication *inds) > +{ > + const struct indication *ind; > + u32 supported_indications = 0; > + > + for_each_indication(ind, inds) > + supported_indications |= ind->bit; > + > + return caps & supported_indications; > +} > + > +/** > + * struct npem_led - LED details > + * @indication: indication details > + * @npem: npem device > + * @name: LED name > + * @led: LED device > + */ > +struct npem_led { > + const struct indication *indication; > + struct npem *npem; > + char name[LED_MAX_NAME_SIZE]; > + struct led_classdev led; > +}; > + > +/** > + * struct npem_ops - backend specific callbacks > + * @inds: supported indications array > + * @get_active_indications: get active indications > + * npem: npem device > + * buf: response buffer > + * @set_active_indications: set new indications > + * npem: npem device > + * val: bit mask to set > + * > + * Handle communication with hardware. set_active_indications updates > + * npem->active_indications. > + */ > +struct npem_ops { > + const struct indication *inds; @inds is not an operation, it feels like something that belongs as another member in 'struct npem'. What drove this data to join 'struct npem_ops'? > + int (*get_active_indications)(struct npem *npem, u32 *buf); > + int (*set_active_indications)(struct npem *npem, u32 val); > +}; > + > +/** > + * struct npem - NPEM device properties > + * @dev: PCIe device this driver is attached to > + * @ops: Backend specific callbacks > + * @npem_lock: to keep concurrent updates serialized > + * @pos: NPEM capability offset (only relevant for NPEM direct register access, > + * not DSM access method) > + * @supported_indications: bit mask of supported indications > + * non-indication and reserved bits are cleared > + * @active_indications: bit mask of active indications > + * non-indication and reserved bits are cleared > + * @led_cnt: Supported LEDs count > + * @leds: supported LEDs It would help to describe the locking model here since the comment for @npem_lock is not adding anything useful. > + */ > +struct npem { > + struct pci_dev *dev; > + const struct npem_ops *ops; > + struct mutex npem_lock; > + u16 pos; > + u32 supported_indications; > + u32 active_indications; > + int led_cnt; > + struct npem_led leds[]; > +}; > + > +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val) > +{ > + int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val); > + > + return pcibios_err_to_errno(ret); > +} > + > +static int npem_write_ctrl(struct npem *npem, u32 reg) > +{ > + int pos = npem->pos + PCI_NPEM_CTRL; > + int ret = pci_write_config_dword(npem->dev, pos, reg); > + > + return pcibios_err_to_errno(ret); > +} > + > +static int npem_get_active_indications(struct npem *npem, u32 *buf) > +{ > + int ret; > + > + ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf); > + if (ret) > + return ret; > + > + /* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */ > + if (!(*buf & PCI_NPEM_CTRL_ENABLE)) It feels odd to reuse @buf like this. I would have a local @ctrl variable and before returning do: *buf = ctrl; > + *buf = 0; > + > + /* Filter out not supported indications in response */ > + *buf &= npem->supported_indications; > + return 0; > +} > + > +static int npem_set_active_indications(struct npem *npem, u32 val) > +{ > + int ret, ret_val; > + u32 cc_status; > + > + lockdep_assert_held(&npem->npem_lock); > + > + /* This bit is always required */ > + val |= PCI_NPEM_CTRL_ENABLE; > + ret = npem_write_ctrl(npem, val); @val is too generic a name when it is known that this the control register value, how about call it @ctrl directly? > + if (ret) > + return ret; > + > + /* > + * For the case where a NPEM command has not completed immediately, > + * it is recommended that software not continuously “spin” on polling > + * the status register, but rather poll under interrupt at a reduced > + * rate; for example at 10 ms intervals. I think the use of read_poll_timeout() obviates the need for this comment. I.e. read_poll_timeout() is self documenting. > + * > + * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM > + * Command Completed" > + */ > + ret = read_poll_timeout(npem_read_reg, ret_val, > + ret_val || (cc_status & PCI_NPEM_STATUS_CC), > + 10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem, > + PCI_NPEM_STATUS, &cc_status); > + if (ret) > + return ret_val; > + > + /* > + * All writes to control register, including writes that do not change > + * the register value, are NPEM commands and should eventually result > + * in a command completion indication in the NPEM Status Register. > + * > + * PCIe Base Specification r6.1 sec 7.9.19.3 > + * > + * Register may not be updated, or other conflicting bits may be > + * cleared. Spec is not strict here. Read NPEM Control register after > + * write to keep cache in-sync. > + */ > + return npem_get_active_indications(npem, &npem->active_indications); > +} > + > +static const struct npem_ops npem_ops = { > + .inds = npem_indications, > + .get_active_indications = npem_get_active_indications, > + .set_active_indications = npem_set_active_indications, > +}; > + > +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b, 0x8c, 0xb7, 0x74, 0x7e,\ > + 0xd5, 0x1e, 0x19, 0x4d) > +#define GET_SUPPORTED_STATES_DSM BIT(1) > +#define GET_STATE_DSM BIT(2) > +#define SET_STATE_DSM BIT(3) > + > +static const guid_t dsm_guid = DSM_GUID; > + > +static bool npem_has_dsm(struct pci_dev *pdev) > +{ > + acpi_handle handle; > + > + handle = ACPI_HANDLE(&pdev->dev); > + if (!handle) > + return false; > + > + return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM | > + GET_STATE_DSM | SET_STATE_DSM); > +} > + > +/* > + * This function does not use ops->get_active_indications(). > + * It returns cached value, npem->npem_lock is held and it is safe. What about this function would make a reader think it was unsafe? It is evident that this function does not call ops->get_active_indications(), just as it is evident that it does not call kfree(). Would a better description be: "The status of each indicator is cached at init time and only updated at write time. brightness_get() is only responsible for reflecting the last written/cached value." > + */ > +static enum led_brightness brightness_get(struct led_classdev *led) > +{ > + struct npem_led *nled = container_of(led, struct npem_led, led); > + struct npem *npem = nled->npem; > + int ret, val = LED_OFF; > + > + ret = mutex_lock_interruptible(&npem->npem_lock); > + if (ret) > + return ret; > + > + if (npem->active_indications & nled->indication->bit) > + val = LED_ON; > + > + mutex_unlock(&npem->npem_lock); > + > + return val; > +} > + > +static int brightness_set(struct led_classdev *led, > + enum led_brightness brightness) > +{ > + struct npem_led *nled = container_of(led, struct npem_led, led); > + struct npem *npem = nled->npem; > + u32 indications; > + int ret; > + > + ret = mutex_lock_interruptible(&npem->npem_lock); > + if (ret) > + return ret; > + > + if (brightness == LED_OFF) > + indications = npem->active_indications & ~(nled->indication->bit); > + else > + indications = npem->active_indications | nled->indication->bit; > + > + ret = npem->ops->set_active_indications(npem, indications); > + > + mutex_unlock(&npem->npem_lock); > + return ret; > +} > + > +static void npem_free(struct npem *npem) > +{ > + struct npem_led *nled; > + int cnt; > + > + for (cnt = 0; cnt < npem->led_cnt; cnt++) { > + nled = &npem->leds[cnt]; > + > + if (nled->name[0]) > + led_classdev_unregister(&nled->led); > + } > + > + mutex_destroy(&npem->npem_lock); > + kfree(npem); > +} > + > +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled) > +{ > + struct led_classdev *led = &nled->led; > + struct led_init_data init_data = {}; > + char *name = nled->name; > + int ret; > + > + init_data.devicename = pci_name(npem->dev); > + init_data.default_label = nled->indication->name; > + > + ret = led_compose_name(&npem->dev->dev, &init_data, name); > + if (ret) > + return ret; > + > + led->name = name; > + led->brightness_set_blocking = brightness_set; > + led->brightness_get = brightness_get; > + led->max_brightness = LED_ON; > + led->default_trigger = "none"; > + led->flags = 0; > + > + ret = led_classdev_register(&npem->dev->dev, led); > + if (ret) > + /* Clear the name to indicate that it is not registered. */ > + name[0] = 0; > + return ret; > +} > + > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, > + int pos, u32 caps) > +{ > + u32 supported = reg_to_indications(caps, ops->inds); > + int supported_cnt = hweight32(supported); > + const struct indication *indication; > + struct npem_led *nled; > + struct npem *npem; > + int led_idx = 0; > + u32 active; > + int ret; > + > + npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL); > + > + if (!npem) > + return -ENOMEM; > + > + npem->supported_indications = supported; > + npem->led_cnt = supported_cnt; > + npem->pos = pos; > + npem->dev = dev; > + npem->ops = ops; > + > + ret = ops->get_active_indications(npem, &active); > + if (ret) { > + npem_free(npem); > + return -EACCES; get_active_indications() returns an errno, why translate to "Permission denied" which is likely to make someone go cross-eyed wondering why root is getting "Permission denied". > + } > + > + npem->active_indications = reg_to_indications(active, ops->inds); > + > + /* > + * Do not take npem->npem_lock, get_brightness() is called on > + * registration path. > + */ Delete this comment. Just put a lockdep_assert_held() in get_active_indications() and take the lock for caching the initial state. > + mutex_init(&npem->npem_lock); > + > + for_each_indication(indication, npem_indications) { > + if ((npem->supported_indications & indication->bit) == 0) > + /* Do not register unsupported indication */ > + continue; > + > + nled = &npem->leds[led_idx++]; > + nled->indication = indication; > + nled->npem = npem; > + > + ret = pci_npem_set_led_classdev(npem, nled); > + if (ret) { > + npem_free(npem); > + return ret; > + } > + } > + > + dev->npem = npem; > + return 0; > +} > + > +void pci_npem_remove(struct pci_dev *dev) > +{ > + if (dev->npem) > + npem_free(dev->npem); Just put a NULL check in npem_free(). > +} > + > +void pci_npem_create(struct pci_dev *dev) > +{ > + const struct npem_ops *ops = &npem_ops; > + int pos = 0, ret; > + u32 cap; > + > + if (!npem_has_dsm(dev)) { > + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM); > + if (pos == 0) > + return; > + > + if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 || > + (cap & PCI_NPEM_CAP_CAPABLE) == 0) > + return; > + } else { > + /* > + * OS should use the DSM for LED control if it is available > + * PCI Firmware Spec r3.3 sec 4.7. > + */ > + return; I assume this is a TODO for the next patch that add DSM support? > + } > + > + ret = pci_npem_init(dev, ops, pos, cap); > + if (ret) > + pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n", > + ret); > +} > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fd44565c4756..9dea8c7353ab 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { } > static inline void pci_doe_disconnected(struct pci_dev *pdev) { } > #endif > > +#ifdef CONFIG_PCI_NPEM > +void pci_npem_create(struct pci_dev *dev); > +void pci_npem_remove(struct pci_dev *dev); > +#else > +static inline void pci_npem_create(struct pci_dev *dev) { } > +static inline void pci_npem_remove(struct pci_dev *dev) { } > +#endif > + > /** > * pci_dev_set_io_state - Set the new error state if possible. > * > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 8e696e547565..b8ea6353e27a 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) > dev->match_driver = false; > ret = device_add(&dev->dev); > WARN_ON(ret < 0); > + > + pci_npem_create(dev); > } > > struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index d749ea8250d6..1436f9cf1fea 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev) > if (!dev->dev.kobj.parent) > return; > > + pci_npem_remove(dev); > + > device_del(&dev->dev); > > down_write(&pci_bus_sem); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index fb004fd4e889..c327c2dd4527 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -517,6 +517,9 @@ struct pci_dev { > #endif > #ifdef CONFIG_PCI_DOE > struct xarray doe_mbs; /* Data Object Exchange mailboxes */ > +#endif > +#ifdef CONFIG_PCI_NPEM > + struct npem *npem; /* Native PCIe Enclosure Management */ > #endif > u16 acs_cap; /* ACS Capability offset */ > phys_addr_t rom; /* Physical address if not from BAR */ > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index 94c00996e633..97e8b7ed9998 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -740,6 +740,7 @@ > #define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > +#define PCI_EXT_CAP_ID_NPEM 0x29 /* Native PCIe Enclosure Management */ > #define PCI_EXT_CAP_ID_PL_32GT 0x2A /* Physical Layer 32.0 GT/s */ > #define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE > @@ -1121,6 +1122,39 @@ > #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 > #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 > > +/* Native PCIe Enclosure Management */ > +#define PCI_NPEM_CAP 0x04 /* NPEM capability register */ > +#define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */ > + > +#define PCI_NPEM_CTRL 0x08 /* NPEM control register */ > +#define PCI_NPEM_CTRL_ENABLE 0x00000001 /* NPEM Enable */ > + > +#define PCI_NPEM_STATUS 0x0c /* NPEM status register */ > +#define PCI_NPEM_STATUS_CC 0x00000001 /* NPEM Command completed */ > +/* > + * Native PCIe Enclosure Management indication bits and Reset command bit > + * are corresponding for capability and control registers. > + */ > +#define PCI_NPEM_CMD_RESET 0x00000002 /* NPEM Reset Command */ > +#define PCI_NPEM_IND_OK 0x00000004 /* NPEM indication OK */ > +#define PCI_NPEM_IND_LOCATE 0x00000008 /* NPEM indication Locate */ > +#define PCI_NPEM_IND_FAIL 0x00000010 /* NPEM indication Fail */ > +#define PCI_NPEM_IND_REBUILD 0x00000020 /* NPEM indication Rebuild */ > +#define PCI_NPEM_IND_PFA 0x00000040 /* NPEM indication Predicted Failure Analysis */ > +#define PCI_NPEM_IND_HOTSPARE 0x00000080 /* NPEM indication Hot Spare */ > +#define PCI_NPEM_IND_ICA 0x00000100 /* NPEM indication In Critical Array */ > +#define PCI_NPEM_IND_IFA 0x00000200 /* NPEM indication In Failed Array */ > +#define PCI_NPEM_IND_IDT 0x00000400 /* NPEM indication Invalid Device Type */ > +#define PCI_NPEM_IND_DISABLED 0x00000800 /* NPEM indication Disabled */ > +#define PCI_NPEM_IND_SPEC_0 0x00800000 > +#define PCI_NPEM_IND_SPEC_1 0x01000000 > +#define PCI_NPEM_IND_SPEC_2 0x02000000 > +#define PCI_NPEM_IND_SPEC_3 0x04000000 > +#define PCI_NPEM_IND_SPEC_4 0x08000000 > +#define PCI_NPEM_IND_SPEC_5 0x10000000 > +#define PCI_NPEM_IND_SPEC_6 0x20000000 > +#define PCI_NPEM_IND_SPEC_7 0x40000000 Given no other driver needs this, I would define them locally in drivers/pci/npem.c.
On Tue, May 28, 2024 at 10:21:10PM -0700, Dan Williams wrote: > Mariusz Tkaczyk wrote: > > +config PCI_NPEM > > + bool "Native PCIe Enclosure Management" > > + depends on LEDS_CLASS=y > > I would have expected > > depends on NEW_LEDS > select LEDS_CLASS Hm, a quick "git grep -C 2 'depends on NEW_LEDS'" shows that noone else does that. Everyone else either selects both NEW_LEDS and LEDS_CLASS or depends on both or depends on just LEDS_CLASS. (Since LEDS_CLASS is constrained to "if NEW_LEDS", depending on both seems pointless, so I'm not sure why some people do that.) I guess it would be good to get guidance from leds maintainers what the preferred modus operandi is. > > +#define for_each_indication(ind, inds) \ > > + for (ind = inds; ind->bit; ind++) > > + > > +/* To avoid confusion, do not keep any special bits in indications */ > > I am confused by this comment. What "special bits" is this referring to? I think it's referring to bit 0 in the Status and Control register, which is a master "NPEM Capable" and "NPEM Enable" bit. > > +struct npem_ops { > > + const struct indication *inds; > > @inds is not an operation, it feels like something that belongs as > another member in 'struct npem'. What drove this data to join 'struct > npem_ops'? The native NPEM register interface supports enclosure-specific indications which the DSM interface does not support. So those indications are present in the native npem_ops->inds and not present in the DSM npem_ops->inds. > > --- a/include/uapi/linux/pci_regs.h > > +++ b/include/uapi/linux/pci_regs.h [...] > > +#define PCI_NPEM_IND_SPEC_0 0x00800000 > > +#define PCI_NPEM_IND_SPEC_1 0x01000000 > > +#define PCI_NPEM_IND_SPEC_2 0x02000000 > > +#define PCI_NPEM_IND_SPEC_3 0x04000000 > > +#define PCI_NPEM_IND_SPEC_4 0x08000000 > > +#define PCI_NPEM_IND_SPEC_5 0x10000000 > > +#define PCI_NPEM_IND_SPEC_6 0x20000000 > > +#define PCI_NPEM_IND_SPEC_7 0x40000000 > > Given no other driver needs this, I would define them locally in > drivers/pci/npem.c. This is a uapi header, so could be used not just by other drivers but by user space. It's common to add spec-defined register bits to this header file even if they're only used by a single source file in the kernel. Thanks, Lukas
Hi, Thanks for feedback Dan! On Wed, 29 May 2024 11:38:12 +0200 Lukas Wunner <lukas@wunner.de> wrote: > On Tue, May 28, 2024 at 10:21:10PM -0700, Dan Williams wrote: > > Mariusz Tkaczyk wrote: > > > +config PCI_NPEM > > > + bool "Native PCIe Enclosure Management" > > > + depends on LEDS_CLASS=y > > > > I would have expected > > > > depends on NEW_LEDS > > select LEDS_CLASS > > Hm, a quick "git grep -C 2 'depends on NEW_LEDS'" shows that noone else > does that. Everyone else either selects both NEW_LEDS and LEDS_CLASS > or depends on both or depends on just LEDS_CLASS. > > (Since LEDS_CLASS is constrained to "if NEW_LEDS", depending on both > seems pointless, so I'm not sure why some people do that.) > > I guess it would be good to get guidance from leds maintainers what > the preferred modus operandi is. Pavel, could you please advice? I have no clue which way I should take so I prefer to keep current approach. > > > > > +#define for_each_indication(ind, inds) \ > > > + for (ind = inds; ind->bit; ind++) > > > + > > > +/* To avoid confusion, do not keep any special bits in indications */ > > > > I am confused by this comment. What "special bits" is this referring to? > > I think it's referring to bit 0 in the Status and Control register, > which is a master "NPEM Capable" and "NPEM Enable" bit. Yes, there are 2 special bits for capability/control NPEM_CAP_CAPABLE/NPEM_ENABLE and NPEM_CAP_RESET/NPEM_RESET. I wanted to highlight that these bits are not included in the cache. I will try to make it more precise in v3. > > > > > +struct npem_ops { > > > + const struct indication *inds; > > > > @inds is not an operation, it feels like something that belongs as > > another member in 'struct npem'. What drove this data to join 'struct > > npem_ops'? > > The native NPEM register interface supports enclosure-specific indications > which the DSM interface does not support. So those indications are > present in the native npem_ops->inds and not present in the DSM > npem_ops->inds. Yes, I need to differentiate DSM and NPEM indications. DSM has own indications list. > > > > > --- a/include/uapi/linux/pci_regs.h > > > +++ b/include/uapi/linux/pci_regs.h > [...] > > > +#define PCI_NPEM_IND_SPEC_0 0x00800000 > > > +#define PCI_NPEM_IND_SPEC_1 0x01000000 > > > +#define PCI_NPEM_IND_SPEC_2 0x02000000 > > > +#define PCI_NPEM_IND_SPEC_3 0x04000000 > > > +#define PCI_NPEM_IND_SPEC_4 0x08000000 > > > +#define PCI_NPEM_IND_SPEC_5 0x10000000 > > > +#define PCI_NPEM_IND_SPEC_6 0x20000000 > > > +#define PCI_NPEM_IND_SPEC_7 0x40000000 > > > > Given no other driver needs this, I would define them locally in > > drivers/pci/npem.c. > > This is a uapi header, so could be used not just by other drivers > but by user space. > > It's common to add spec-defined register bits to this header file > even if they're only used by a single source file in the kernel. > I will stay with current state while waiting for Bjorn's voice here. I will send v3 with fixes requested by Dan and Ilpo but I still need Stuart feedback on DSM patch. Thanks, Mariusz
On 6/12/2024 6:40 AM, Mariusz Tkaczyk wrote: > Hi, > Thanks for feedback Dan! > > On Wed, 29 May 2024 11:38:12 +0200 > Lukas Wunner <lukas@wunner.de> wrote: > >> On Tue, May 28, 2024 at 10:21:10PM -0700, Dan Williams wrote: >>> Mariusz Tkaczyk wrote: >>>> +config PCI_NPEM >>>> + bool "Native PCIe Enclosure Management" >>>> + depends on LEDS_CLASS=y >>> >>> I would have expected >>> >>> depends on NEW_LEDS >>> select LEDS_CLASS >> >> Hm, a quick "git grep -C 2 'depends on NEW_LEDS'" shows that noone else >> does that. Everyone else either selects both NEW_LEDS and LEDS_CLASS >> or depends on both or depends on just LEDS_CLASS. >> >> (Since LEDS_CLASS is constrained to "if NEW_LEDS", depending on both >> seems pointless, so I'm not sure why some people do that.) >> >> I guess it would be good to get guidance from leds maintainers what >> the preferred modus operandi is. > > Pavel, could you please advice? > I have no clue which way I should take so I prefer to keep current approach. > >> >> >>>> +#define for_each_indication(ind, inds) \ >>>> + for (ind = inds; ind->bit; ind++) >>>> + >>>> +/* To avoid confusion, do not keep any special bits in indications */ >>> >>> I am confused by this comment. What "special bits" is this referring to? >> >> I think it's referring to bit 0 in the Status and Control register, >> which is a master "NPEM Capable" and "NPEM Enable" bit. > > Yes, there are 2 special bits for capability/control > NPEM_CAP_CAPABLE/NPEM_ENABLE and NPEM_CAP_RESET/NPEM_RESET. > > I wanted to highlight that these bits are not included in the cache. I will try > to make it more precise in v3. > >> >> >>>> +struct npem_ops { >>>> + const struct indication *inds; >>> >>> @inds is not an operation, it feels like something that belongs as >>> another member in 'struct npem'. What drove this data to join 'struct >>> npem_ops'? >> >> The native NPEM register interface supports enclosure-specific indications >> which the DSM interface does not support. So those indications are >> present in the native npem_ops->inds and not present in the DSM >> npem_ops->inds. > > Yes, I need to differentiate DSM and NPEM indications. DSM has own indications > list. > >> >> >>>> --- a/include/uapi/linux/pci_regs.h >>>> +++ b/include/uapi/linux/pci_regs.h >> [...] >>>> +#define PCI_NPEM_IND_SPEC_0 0x00800000 >>>> +#define PCI_NPEM_IND_SPEC_1 0x01000000 >>>> +#define PCI_NPEM_IND_SPEC_2 0x02000000 >>>> +#define PCI_NPEM_IND_SPEC_3 0x04000000 >>>> +#define PCI_NPEM_IND_SPEC_4 0x08000000 >>>> +#define PCI_NPEM_IND_SPEC_5 0x10000000 >>>> +#define PCI_NPEM_IND_SPEC_6 0x20000000 >>>> +#define PCI_NPEM_IND_SPEC_7 0x40000000 >>> >>> Given no other driver needs this, I would define them locally in >>> drivers/pci/npem.c. >> >> This is a uapi header, so could be used not just by other drivers >> but by user space. >> >> It's common to add spec-defined register bits to this header file >> even if they're only used by a single source file in the kernel. >> > > I will stay with current state while waiting for Bjorn's voice here. > > I will send v3 with fixes requested by Dan and Ilpo but I still need Stuart > feedback on DSM patch. > > Thanks, > Mariusz I'm working on testing this now, sorry for the delay. Thanks! Stuart
On Tue, 28 May 2024 16:55:16 +0300 (EEST) Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > > +#define PCI_NPEM_IND_SPEC_0 0x00800000 > > +#define PCI_NPEM_IND_SPEC_1 0x01000000 > > +#define PCI_NPEM_IND_SPEC_2 0x02000000 > > +#define PCI_NPEM_IND_SPEC_3 0x04000000 > > +#define PCI_NPEM_IND_SPEC_4 0x08000000 > > +#define PCI_NPEM_IND_SPEC_5 0x10000000 > > +#define PCI_NPEM_IND_SPEC_6 0x20000000 > > +#define PCI_NPEM_IND_SPEC_7 0x40000000 > > Are these off-by-one, shouldn't that field go all the way to the last bit > which is 0x80000000 (the field is 31:24)?? Yes, I made a mistake. Good catch! Mariusz
On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote: > Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows > managing LED in storage enclosures. NPEM is indication oriented > and it does not give direct access to LED. Although each of > the indications *could* represent an individual LED, multiple > indications could also be represented as a single, > multi-color LED or a single LED blinking in a specific interval. > The specification leaves that open. > > Each enabled indication (capability register bit on) is represented as a > ledclass_dev which can be controlled through sysfs. For every ledclass > device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0). > It is corresponding to NPEM control register (Indication bit on/off). > > Ledclass devices appear in sysfs as child devices (subdirectory) of PCI > device which has an NPEM Extended Capability and indication is enabled > in NPEM capability register. For example, these are leds created for > pcieport "10000:02:05.0" on my setup: > > leds/ > ├── 10000:02:05.0:enclosure:fail > ├── 10000:02:05.0:enclosure:locate > ├── 10000:02:05.0:enclosure:ok > └── 10000:02:05.0:enclosure:rebuild > > They can be also found in "/sys/class/leds" directory. Parent PCIe device > bdf is used to guarantee uniqueness across leds subsystem. > > To enable/disable fail indication "brightness" file can be edited: > echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness > echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness > > PCIe r6.1, sec 7.9.19.2 defines the possible indications. > > Multiple indications for same parent PCIe device can conflict and > hardware may update them when processing new request. To avoid issues, > driver refresh all indications by reading back control register. > > Driver is projected to be exclusive NPEM extended capability manager. > It waits up to 1 second after imposing new request, it doesn't verify if > controller is busy before write, assuming that mutex lock gives protection > from concurrent updates. Driver is not registered if _DSM LED management > is available. > > NPEM is a PCIe extended capability so it should be registered in > pcie_init_capabilities() but it is not possible due to LED dependency. > Parent pci_device must be added earlier for led_classdev_register() > to be successful. NPEM does not require configuration on kernel side, it > is safe to register LED devices later. > > Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1] > Suggested-by: Lukas Wunner <lukas@wunner.de> > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > --- > drivers/pci/Kconfig | 9 + > drivers/pci/Makefile | 1 + > drivers/pci/npem.c | 410 ++++++++++++++++++++++++++++++++++ > drivers/pci/pci.h | 8 + > drivers/pci/probe.c | 2 + > drivers/pci/remove.c | 2 + > include/linux/pci.h | 3 + > include/uapi/linux/pci_regs.h | 34 +++ > 8 files changed, 469 insertions(+) > create mode 100644 drivers/pci/npem.c > > diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig > index d35001589d88..e696e69ad516 100644 > --- a/drivers/pci/Kconfig > +++ b/drivers/pci/Kconfig > @@ -143,6 +143,15 @@ config PCI_IOV > > If unsure, say N. > > +config PCI_NPEM > + bool "Native PCIe Enclosure Management" > + depends on LEDS_CLASS=y > + help > + Support for Native PCIe Enclosure Management. It allows managing LED > + indications in storage enclosures. Enclosure must support following > + indications: OK, Locate, Fail, Rebuild, other indications are > + optional. > + > config PCI_PRI > bool "PCI PRI support" > select PCI_ATS > diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile > index 175302036890..cd5f655d4be9 100644 > --- a/drivers/pci/Makefile > +++ b/drivers/pci/Makefile > @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o > obj-$(CONFIG_VGA_ARB) += vgaarb.o > obj-$(CONFIG_PCI_DOE) += doe.o > obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o > +obj-$(CONFIG_PCI_NPEM) += npem.o > > # Endpoint library must be initialized before its users > obj-$(CONFIG_PCI_ENDPOINT) += endpoint/ > diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c > new file mode 100644 > index 000000000000..a76a2044dab2 > --- /dev/null > +++ b/drivers/pci/npem.c > @@ -0,0 +1,410 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * PCIe Enclosure management driver created for LED interfaces based on > + * indications. It says *what indications* blink but does not specify *how* > + * they blink - it is hardware defined. > + * > + * The driver name refers to Native PCIe Enclosure Management. It is > + * first indication oriented standard with specification. > + * > + * Native PCIe Enclosure Management (NPEM) > + * PCIe Base Specification r6.1 sec 6.28 > + * PCIe Base Specification r6.1 sec 7.9.19 > + * > + * Copyright (c) 2023-2024 Intel Corporation > + * Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > + */ > + > +#include <linux/acpi.h> > +#include <linux/bitops.h> > +#include <linux/errno.h> > +#include <linux/iopoll.h> > +#include <linux/leds.h> > +#include <linux/mutex.h> > +#include <linux/pci.h> > +#include <linux/pci_regs.h> > +#include <linux/types.h> > +#include <linux/uleds.h> > + > +#include "pci.h" > + > +struct indication { > + u32 bit; > + const char *name; > +}; > + > +static const struct indication npem_indications[] = { > + {PCI_NPEM_IND_OK, "enclosure:ok"}, > + {PCI_NPEM_IND_LOCATE, "enclosure:locate"}, > + {PCI_NPEM_IND_FAIL, "enclosure:fail"}, > + {PCI_NPEM_IND_REBUILD, "enclosure:rebuild"}, > + {PCI_NPEM_IND_PFA, "enclosure:pfa"}, > + {PCI_NPEM_IND_HOTSPARE, "enclosure:hotspare"}, > + {PCI_NPEM_IND_ICA, "enclosure:ica"}, > + {PCI_NPEM_IND_IFA, "enclosure:ifa"}, > + {PCI_NPEM_IND_IDT, "enclosure:idt"}, > + {PCI_NPEM_IND_DISABLED, "enclosure:disabled"}, > + {PCI_NPEM_IND_SPEC_0, "enclosure:specific_0"}, > + {PCI_NPEM_IND_SPEC_1, "enclosure:specific_1"}, > + {PCI_NPEM_IND_SPEC_2, "enclosure:specific_2"}, > + {PCI_NPEM_IND_SPEC_3, "enclosure:specific_3"}, > + {PCI_NPEM_IND_SPEC_4, "enclosure:specific_4"}, > + {PCI_NPEM_IND_SPEC_5, "enclosure:specific_5"}, > + {PCI_NPEM_IND_SPEC_6, "enclosure:specific_6"}, > + {PCI_NPEM_IND_SPEC_7, "enclosure:specific_7"}, > + {0, NULL} > +}; > + > +#define for_each_indication(ind, inds) \ > + for (ind = inds; ind->bit; ind++) > + > +/* To avoid confusion, do not keep any special bits in indications */ > +static u32 reg_to_indications(u32 caps, const struct indication *inds) > +{ > + const struct indication *ind; > + u32 supported_indications = 0; > + > + for_each_indication(ind, inds) > + supported_indications |= ind->bit; > + > + return caps & supported_indications; > +} > + > +/** > + * struct npem_led - LED details > + * @indication: indication details > + * @npem: npem device > + * @name: LED name > + * @led: LED device > + */ > +struct npem_led { > + const struct indication *indication; > + struct npem *npem; > + char name[LED_MAX_NAME_SIZE]; > + struct led_classdev led; > +}; > + > +/** > + * struct npem_ops - backend specific callbacks > + * @inds: supported indications array > + * @get_active_indications: get active indications > + * npem: npem device > + * buf: response buffer > + * @set_active_indications: set new indications > + * npem: npem device > + * val: bit mask to set > + * > + * Handle communication with hardware. set_active_indications updates > + * npem->active_indications. > + */ > +struct npem_ops { > + const struct indication *inds; > + int (*get_active_indications)(struct npem *npem, u32 *buf); > + int (*set_active_indications)(struct npem *npem, u32 val); > +}; > + > +/** > + * struct npem - NPEM device properties > + * @dev: PCIe device this driver is attached to > + * @ops: Backend specific callbacks > + * @npem_lock: to keep concurrent updates serialized > + * @pos: NPEM capability offset (only relevant for NPEM direct register access, > + * not DSM access method) > + * @supported_indications: bit mask of supported indications > + * non-indication and reserved bits are cleared > + * @active_indications: bit mask of active indications > + * non-indication and reserved bits are cleared > + * @led_cnt: Supported LEDs count > + * @leds: supported LEDs > + */ > +struct npem { > + struct pci_dev *dev; > + const struct npem_ops *ops; > + struct mutex npem_lock; > + u16 pos; > + u32 supported_indications; > + u32 active_indications; > + int led_cnt; > + struct npem_led leds[]; > +}; > + > +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val) > +{ > + int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val); > + > + return pcibios_err_to_errno(ret); > +} > + > +static int npem_write_ctrl(struct npem *npem, u32 reg) > +{ > + int pos = npem->pos + PCI_NPEM_CTRL; > + int ret = pci_write_config_dword(npem->dev, pos, reg); > + > + return pcibios_err_to_errno(ret); > +} > + > +static int npem_get_active_indications(struct npem *npem, u32 *buf) > +{ > + int ret; > + > + ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf); > + if (ret) > + return ret; > + > + /* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */ > + if (!(*buf & PCI_NPEM_CTRL_ENABLE)) > + *buf = 0; > + > + /* Filter out not supported indications in response */ > + *buf &= npem->supported_indications; > + return 0; > +} > + > +static int npem_set_active_indications(struct npem *npem, u32 val) > +{ > + int ret, ret_val; > + u32 cc_status; > + > + lockdep_assert_held(&npem->npem_lock); > + > + /* This bit is always required */ > + val |= PCI_NPEM_CTRL_ENABLE; > + ret = npem_write_ctrl(npem, val); > + if (ret) > + return ret; > + > + /* > + * For the case where a NPEM command has not completed immediately, > + * it is recommended that software not continuously “spin” on polling > + * the status register, but rather poll under interrupt at a reduced > + * rate; for example at 10 ms intervals. > + * > + * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM > + * Command Completed" > + */ > + ret = read_poll_timeout(npem_read_reg, ret_val, > + ret_val || (cc_status & PCI_NPEM_STATUS_CC), > + 10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem, > + PCI_NPEM_STATUS, &cc_status); > + if (ret) > + return ret_val; > + > + /* > + * All writes to control register, including writes that do not change > + * the register value, are NPEM commands and should eventually result > + * in a command completion indication in the NPEM Status Register. > + * > + * PCIe Base Specification r6.1 sec 7.9.19.3 > + * > + * Register may not be updated, or other conflicting bits may be > + * cleared. Spec is not strict here. Read NPEM Control register after > + * write to keep cache in-sync. > + */ > + return npem_get_active_indications(npem, &npem->active_indications); > +} > + > +static const struct npem_ops npem_ops = { > + .inds = npem_indications, > + .get_active_indications = npem_get_active_indications, > + .set_active_indications = npem_set_active_indications, > +}; > + > +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b, 0x8c, 0xb7, 0x74, 0x7e,\ > + 0xd5, 0x1e, 0x19, 0x4d) > +#define GET_SUPPORTED_STATES_DSM BIT(1) > +#define GET_STATE_DSM BIT(2) > +#define SET_STATE_DSM BIT(3) > + > +static const guid_t dsm_guid = DSM_GUID; > + > +static bool npem_has_dsm(struct pci_dev *pdev) > +{ > + acpi_handle handle; > + > + handle = ACPI_HANDLE(&pdev->dev); > + if (!handle) > + return false; > + > + return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM | > + GET_STATE_DSM | SET_STATE_DSM); > +} > + I had to change the DSM definitions to be 1/2/3 rather than BIT(1)/BIT(2)/BIT(3), since the numbers 1/2/3 (not 2/4/8) need to be passed to acpi_evaluate_dsm_typed()... I just changed it to: #define GET_SUPPORTED_STATES_DSM 1 #define GET_STATE_DSM 2 #define SET_STATE_DSM 3 ...and return acpi_check_dsm(handle, &dsm_guid, 0x1, BIT(GET_SUPPORTED_STATES_DSM) | BIT(GET_STATE_DSM) | BIT(SET_STATE_DSM)); After I changed this (and one other thing, below), it worked great. > +/* > + * This function does not use ops->get_active_indications(). > + * It returns cached value, npem->npem_lock is held and it is safe. > + */ > +static enum led_brightness brightness_get(struct led_classdev *led) > +{ > + struct npem_led *nled = container_of(led, struct npem_led, led); > + struct npem *npem = nled->npem; > + int ret, val = LED_OFF; > + > + ret = mutex_lock_interruptible(&npem->npem_lock); > + if (ret) > + return ret; > + > + if (npem->active_indications & nled->indication->bit) > + val = LED_ON; > + > + mutex_unlock(&npem->npem_lock); > + > + return val; > +} > + > +static int brightness_set(struct led_classdev *led, > + enum led_brightness brightness) > +{ > + struct npem_led *nled = container_of(led, struct npem_led, led); > + struct npem *npem = nled->npem; > + u32 indications; > + int ret; > + > + ret = mutex_lock_interruptible(&npem->npem_lock); > + if (ret) > + return ret; > + > + if (brightness == LED_OFF) > + indications = npem->active_indications & ~(nled->indication->bit); > + else > + indications = npem->active_indications | nled->indication->bit; > + > + ret = npem->ops->set_active_indications(npem, indications); > + > + mutex_unlock(&npem->npem_lock); > + return ret; > +} > + > +static void npem_free(struct npem *npem) > +{ > + struct npem_led *nled; > + int cnt; > + > + for (cnt = 0; cnt < npem->led_cnt; cnt++) { > + nled = &npem->leds[cnt]; > + > + if (nled->name[0]) > + led_classdev_unregister(&nled->led); > + } > + > + mutex_destroy(&npem->npem_lock); > + kfree(npem); > +} > + > +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled) > +{ > + struct led_classdev *led = &nled->led; > + struct led_init_data init_data = {}; > + char *name = nled->name; > + int ret; > + > + init_data.devicename = pci_name(npem->dev); > + init_data.default_label = nled->indication->name; > + > + ret = led_compose_name(&npem->dev->dev, &init_data, name); > + if (ret) > + return ret; > + > + led->name = name; > + led->brightness_set_blocking = brightness_set; > + led->brightness_get = brightness_get; > + led->max_brightness = LED_ON; > + led->default_trigger = "none"; > + led->flags = 0; > + > + ret = led_classdev_register(&npem->dev->dev, led); > + if (ret) > + /* Clear the name to indicate that it is not registered. */ > + name[0] = 0; > + return ret; > +} > + > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, > + int pos, u32 caps) > +{ > + u32 supported = reg_to_indications(caps, ops->inds); > + int supported_cnt = hweight32(supported); > + const struct indication *indication; > + struct npem_led *nled; > + struct npem *npem; > + int led_idx = 0; > + u32 active; > + int ret; > + > + npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL); > + > + if (!npem) > + return -ENOMEM; > + > + npem->supported_indications = supported; > + npem->led_cnt = supported_cnt; > + npem->pos = pos; > + npem->dev = dev; > + npem->ops = ops; > + > + ret = ops->get_active_indications(npem, &active); > + if (ret) { > + npem_free(npem); > + return -EACCES; > + } Failing pci_npem_init() if this ops->get_active_indications() fails will keep this from working on most (all?) Dell servers, because the _DSM get/set functions use an IPMI operation region to get/set the active LEDs, and this is getting run before the IPMI drivers and acpi_ipmi module (which provides ACPI access to IPMI operation regions) get loaded. (GET_SUPPORTED_STATES works without IPMI.) For testing, I just changed this to ignore the error returned from get_active_indications() if it is using DSM ops. That still results in a string of errors ("ACPI Error: Region IPMI (ID=7) has no handler" etc), but once the system is up and I can log in, everything is loaded, and i can see & change brightness in sysfs. I'm not sure if ignoring the error is the best fix for that. Solutions I can think of are: (1) ignoring an error result from get_active_indications during init (as mentioned) (2) providing a mechanism to trigger this driver to rescan a PCI device later from user space (3) don't cache the active LEDs--just get the active states using NPEM/DSM when brightness is read, and do a get/modify/set when setting the brightness... then get_active_indications() wouldn't need to be called during init. Thank you for including DSM support, Mariusz! > + > + npem->active_indications = reg_to_indications(active, ops->inds); > + > + /* > + * Do not take npem->npem_lock, get_brightness() is called on > + * registration path. > + */ > + mutex_init(&npem->npem_lock); > + > + for_each_indication(indication, npem_indications) { > + if ((npem->supported_indications & indication->bit) == 0) > + /* Do not register unsupported indication */ > + continue; > + > + nled = &npem->leds[led_idx++]; > + nled->indication = indication; > + nled->npem = npem; > + > + ret = pci_npem_set_led_classdev(npem, nled); > + if (ret) { > + npem_free(npem); > + return ret; > + } > + } > + > + dev->npem = npem; > + return 0; > +} > + > +void pci_npem_remove(struct pci_dev *dev) > +{ > + if (dev->npem) > + npem_free(dev->npem); > +} > + > +void pci_npem_create(struct pci_dev *dev) > +{ > + const struct npem_ops *ops = &npem_ops; > + int pos = 0, ret; > + u32 cap; > + > + if (!npem_has_dsm(dev)) { > + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM); > + if (pos == 0) > + return; > + > + if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 || > + (cap & PCI_NPEM_CAP_CAPABLE) == 0) > + return; > + } else { > + /* > + * OS should use the DSM for LED control if it is available > + * PCI Firmware Spec r3.3 sec 4.7. > + */ > + return; > + } > + > + ret = pci_npem_init(dev, ops, pos, cap); > + if (ret) > + pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n", > + ret); > +} > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fd44565c4756..9dea8c7353ab 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { } > static inline void pci_doe_disconnected(struct pci_dev *pdev) { } > #endif > > +#ifdef CONFIG_PCI_NPEM > +void pci_npem_create(struct pci_dev *dev); > +void pci_npem_remove(struct pci_dev *dev); > +#else > +static inline void pci_npem_create(struct pci_dev *dev) { } > +static inline void pci_npem_remove(struct pci_dev *dev) { } > +#endif > + > /** > * pci_dev_set_io_state - Set the new error state if possible. > * > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 8e696e547565..b8ea6353e27a 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) > dev->match_driver = false; > ret = device_add(&dev->dev); > WARN_ON(ret < 0); > + > + pci_npem_create(dev); > } > > struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index d749ea8250d6..1436f9cf1fea 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev) > if (!dev->dev.kobj.parent) > return; > > + pci_npem_remove(dev); > + > device_del(&dev->dev); > > down_write(&pci_bus_sem); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index fb004fd4e889..c327c2dd4527 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -517,6 +517,9 @@ struct pci_dev { > #endif > #ifdef CONFIG_PCI_DOE > struct xarray doe_mbs; /* Data Object Exchange mailboxes */ > +#endif > +#ifdef CONFIG_PCI_NPEM > + struct npem *npem; /* Native PCIe Enclosure Management */ > #endif > u16 acs_cap; /* ACS Capability offset */ > phys_addr_t rom; /* Physical address if not from BAR */ > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index 94c00996e633..97e8b7ed9998 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -740,6 +740,7 @@ > #define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > +#define PCI_EXT_CAP_ID_NPEM 0x29 /* Native PCIe Enclosure Management */ > #define PCI_EXT_CAP_ID_PL_32GT 0x2A /* Physical Layer 32.0 GT/s */ > #define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE > @@ -1121,6 +1122,39 @@ > #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 > #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 > > +/* Native PCIe Enclosure Management */ > +#define PCI_NPEM_CAP 0x04 /* NPEM capability register */ > +#define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */ > + > +#define PCI_NPEM_CTRL 0x08 /* NPEM control register */ > +#define PCI_NPEM_CTRL_ENABLE 0x00000001 /* NPEM Enable */ > + > +#define PCI_NPEM_STATUS 0x0c /* NPEM status register */ > +#define PCI_NPEM_STATUS_CC 0x00000001 /* NPEM Command completed */ > +/* > + * Native PCIe Enclosure Management indication bits and Reset command bit > + * are corresponding for capability and control registers. > + */ > +#define PCI_NPEM_CMD_RESET 0x00000002 /* NPEM Reset Command */ > +#define PCI_NPEM_IND_OK 0x00000004 /* NPEM indication OK */ > +#define PCI_NPEM_IND_LOCATE 0x00000008 /* NPEM indication Locate */ > +#define PCI_NPEM_IND_FAIL 0x00000010 /* NPEM indication Fail */ > +#define PCI_NPEM_IND_REBUILD 0x00000020 /* NPEM indication Rebuild */ > +#define PCI_NPEM_IND_PFA 0x00000040 /* NPEM indication Predicted Failure Analysis */ > +#define PCI_NPEM_IND_HOTSPARE 0x00000080 /* NPEM indication Hot Spare */ > +#define PCI_NPEM_IND_ICA 0x00000100 /* NPEM indication In Critical Array */ > +#define PCI_NPEM_IND_IFA 0x00000200 /* NPEM indication In Failed Array */ > +#define PCI_NPEM_IND_IDT 0x00000400 /* NPEM indication Invalid Device Type */ > +#define PCI_NPEM_IND_DISABLED 0x00000800 /* NPEM indication Disabled */ > +#define PCI_NPEM_IND_SPEC_0 0x00800000 > +#define PCI_NPEM_IND_SPEC_1 0x01000000 > +#define PCI_NPEM_IND_SPEC_2 0x02000000 > +#define PCI_NPEM_IND_SPEC_3 0x04000000 > +#define PCI_NPEM_IND_SPEC_4 0x08000000 > +#define PCI_NPEM_IND_SPEC_5 0x10000000 > +#define PCI_NPEM_IND_SPEC_6 0x20000000 > +#define PCI_NPEM_IND_SPEC_7 0x40000000 > + > /* Data Object Exchange */ > #define PCI_DOE_CAP 0x04 /* DOE Capabilities Register */ > #define PCI_DOE_CAP_INT_SUP 0x00000001 /* Interrupt Support */
On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote: > On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote: > > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, > > + int pos, u32 caps) > > +{ [...] > > + ret = ops->get_active_indications(npem, &active); > > + if (ret) { > > + npem_free(npem); > > + return -EACCES; > > + } > > Failing pci_npem_init() if this ops->get_active_indications() fails > will keep this from working on most (all?) Dell servers, because the > _DSM get/set functions use an IPMI operation region to get/set the > active LEDs, and this is getting run before the IPMI drivers and > acpi_ipmi module (which provides ACPI access to IPMI operation > regions) get loaded. (GET_SUPPORTED_STATES works without IPMI.) CONFIG_ACPI_IPMI is tristate. Even if it's built-in, the module_initcall() becomes a device_initcall(). PCI enumeration happens from a subsys_initcall(), way earlier than device_initcall(). If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away? > (2) providing a mechanism to trigger this driver to rescan a PCI > device later from user space If this was a regular device driver and -EPROBE_DEFER was returned if IPMI drivers aren't loaded yet, then this would be easy to solve. But neither is the case here. Of course it's possible to exercise the "remove" and "rescan" attributes in sysfs to re-enumerate the device but that's not a great solution. > (3) don't cache the active LEDs--just get the active states using > NPEM/DSM when brightness is read, and do a get/modify/set when > setting the brightness... then get_active_indications() wouldn't > need to be called during init. Not good. The LEDs are published in sysfs from a subsys_initcall(). Brightness changes through sysfs could in theory immediately happen once they're published. If acpi_ipmi is a module and gets loaded way later, we'd still have to cope with Set State or Get State DSMs going nowhere. Thanks, Lukas
On Sat, 15 Jun 2024 12:33:45 +0200 Lukas Wunner <lukas@wunner.de> wrote: > On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote: > > On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote: > > > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, > > > + int pos, u32 caps) > > > +{ > [...] > > > + ret = ops->get_active_indications(npem, &active); > > > + if (ret) { > > > + npem_free(npem); > > > + return -EACCES; > > > + } > > > > Failing pci_npem_init() if this ops->get_active_indications() fails > > will keep this from working on most (all?) Dell servers, because the > > _DSM get/set functions use an IPMI operation region to get/set the > > active LEDs, and this is getting run before the IPMI drivers and > > acpi_ipmi module (which provides ACPI access to IPMI operation > > regions) get loaded. (GET_SUPPORTED_STATES works without IPMI.) > > CONFIG_ACPI_IPMI is tristate. Even if it's built-in, the > module_initcall() becomes a device_initcall(). > > PCI enumeration happens from a subsys_initcall(), way earlier > than device_initcall(). > > If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in > drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away? That seems to be the best option. Please test Lukas proposal and let me know. Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment about initcall)? +config PCI_NPEM + bool "Native PCIe Enclosure Management" + depends on LEDS_CLASS=y + depends on ACPI_IPMI=y > > > > (2) providing a mechanism to trigger this driver to rescan a PCI > > device later from user space > > If this was a regular device driver and -EPROBE_DEFER was returned if > IPMI drivers aren't loaded yet, then this would be easy to solve. > But neither is the case here. > > Of course it's possible to exercise the "remove" and "rescan" attributes > in sysfs to re-enumerate the device but that's not a great solution. We cannot expect from users to know and do that. If we cannot configure driver, we should not register it. We have to guarantee that IMPI commands are available at the point we are using them. There is not better place to add _DSM device than its enumeration and I have a feeling than sooner or later someone else will reach this problem so it would be better for community to solve it now. > > > > (3) don't cache the active LEDs--just get the active states using > > NPEM/DSM when brightness is read, and do a get/modify/set when > > setting the brightness... then get_active_indications() wouldn't > > need to be called during init. > > Not good. The LEDs are published in sysfs from a subsys_initcall(). > Brightness changes through sysfs could in theory immediately happen > once they're published. If acpi_ipmi is a module and gets loaded way > later, we'd still have to cope with Set State or Get State DSMs going > nowhere. > Agree. I can do it and it should be safe but it is not addressing the issue. We would limit time race window but we will not close it. Thanks, Mariusz
On Tue, Jun 18, 2024 at 10:56:53AM +0200, Mariusz Tkaczyk wrote: > On Sat, 15 Jun 2024 12:33:45 +0200 Lukas Wunner <lukas@wunner.de> wrote: >> > On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote: > > > Failing pci_npem_init() if this ops->get_active_indications() fails > > > will keep this from working on most (all?) Dell servers, because the > > > _DSM get/set functions use an IPMI operation region to get/set the > > > active LEDs, and this is getting run before the IPMI drivers and > > > acpi_ipmi module (which provides ACPI access to IPMI operation > > > regions) get loaded. (GET_SUPPORTED_STATES works without IPMI.) > > > > CONFIG_ACPI_IPMI is tristate. Even if it's built-in, the > > module_initcall() becomes a device_initcall(). > > > > PCI enumeration happens from a subsys_initcall(), way earlier > > than device_initcall(). > > > > If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in > > drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away? > > That seems to be the best option. Please test Lukas proposal and let me know. > Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment > about initcall)? > > +config PCI_NPEM > + bool "Native PCIe Enclosure Management" > + depends on LEDS_CLASS=y > + depends on ACPI_IPMI=y This would effectively disallow NPEM on non-ACPI systems. I think what you want instead is to allow either ACPI_IPMI=y or ACPI_IPMI=n, but not ACPI_IPMI=m, so: depends on ACPI_IPMI!=m Thanks, Lukas
On 6/18/2024 3:56 AM, Mariusz Tkaczyk wrote: > On Sat, 15 Jun 2024 12:33:45 +0200 > Lukas Wunner <lukas@wunner.de> wrote: > >> On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote: >>> On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote: >>>> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, >>>> + int pos, u32 caps) >>>> +{ >> [...] >>>> + ret = ops->get_active_indications(npem, &active); >>>> + if (ret) { >>>> + npem_free(npem); >>>> + return -EACCES; >>>> + } >>> >>> Failing pci_npem_init() if this ops->get_active_indications() fails >>> will keep this from working on most (all?) Dell servers, because the >>> _DSM get/set functions use an IPMI operation region to get/set the >>> active LEDs, and this is getting run before the IPMI drivers and >>> acpi_ipmi module (which provides ACPI access to IPMI operation >>> regions) get loaded. (GET_SUPPORTED_STATES works without IPMI.) >> >> CONFIG_ACPI_IPMI is tristate. Even if it's built-in, the >> module_initcall() becomes a device_initcall(). >> >> PCI enumeration happens from a subsys_initcall(), way earlier >> than device_initcall(). >> >> If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in >> drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away? > > That seems to be the best option. Please test Lukas proposal and let me know. > Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment > about initcall)? > > +config PCI_NPEM > + bool "Native PCIe Enclosure Management" > + depends on LEDS_CLASS=y > + depends on ACPI_IPMI=y > I tried it just to be sure, but changing the module_initcall() to an arch_initcall() in acpi_ipmi.c (and compiling it into the kernel) doesn't help... acpi_ipmi is loaded before npem, but the underlying IPMI drivers that acpi_ipmi uses to provide the IPMI operation region in ACPI aren't loaded until later... acpi_ipmi needs the IPMI device. I'll try to think of another solution. I don't see how to get the IPMI drivers to load before PCI devices are enumerated, so it seems to me that the only way to get it to work from the moment the LEDs show up in sysfs is to somehow delay the npem driver initialization of the LEDs (at least for _DSM) or use something to trigger a rescan later. I notice that acpi_ipmi provides a function to wait for it to have an IPMI device registered (acpi_wait_for_acpi_ipmi()), which is used by acpi_power_meter.c. It would be kind of awkward to use that... it just just waits for a completion (with a 2 second timeout)--it isn't a notifier or callback. On my system, the npem driver failed to run a _DSM at 0.72 seconds, and ipmi_si driver didn't initialize the IPMI controller until 9.54 seconds. >> >>> (2) providing a mechanism to trigger this driver to rescan a PCI >>> device later from user space >> >> If this was a regular device driver and -EPROBE_DEFER was returned if >> IPMI drivers aren't loaded yet, then this would be easy to solve. >> But neither is the case here. >> >> Of course it's possible to exercise the "remove" and "rescan" attributes >> in sysfs to re-enumerate the device but that's not a great solution. > > We cannot expect from users to know and do that. If we cannot configure driver, > we should not register it. We have to guarantee that IMPI commands are > available at the point we are using them. > > There is not better place to add _DSM device than its enumeration and I have a > feeling than sooner or later someone else will reach this problem so it would > be better for community to solve it now. > >> >> >>> (3) don't cache the active LEDs--just get the active states using >>> NPEM/DSM when brightness is read, and do a get/modify/set when >>> setting the brightness... then get_active_indications() wouldn't >>> need to be called during init. >> >> Not good. The LEDs are published in sysfs from a subsys_initcall(). >> Brightness changes through sysfs could in theory immediately happen >> once they're published. If acpi_ipmi is a module and gets loaded way >> later, we'd still have to cope with Set State or Get State DSMs going >> nowhere. >> > > Agree. I can do it and it should be safe but it is not addressing the issue. > We would limit time race window but we will not close it. > > Thanks, > Mariusz
stuart hayes wrote: > > > On 6/18/2024 3:56 AM, Mariusz Tkaczyk wrote: > > On Sat, 15 Jun 2024 12:33:45 +0200 > > Lukas Wunner <lukas@wunner.de> wrote: > > > >> On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote: > >>> On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote: > >>>> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, > >>>> + int pos, u32 caps) > >>>> +{ > >> [...] > >>>> + ret = ops->get_active_indications(npem, &active); > >>>> + if (ret) { > >>>> + npem_free(npem); > >>>> + return -EACCES; > >>>> + } > >>> > >>> Failing pci_npem_init() if this ops->get_active_indications() fails > >>> will keep this from working on most (all?) Dell servers, because the > >>> _DSM get/set functions use an IPMI operation region to get/set the > >>> active LEDs, and this is getting run before the IPMI drivers and > >>> acpi_ipmi module (which provides ACPI access to IPMI operation > >>> regions) get loaded. (GET_SUPPORTED_STATES works without IPMI.) > >> > >> CONFIG_ACPI_IPMI is tristate. Even if it's built-in, the > >> module_initcall() becomes a device_initcall(). > >> > >> PCI enumeration happens from a subsys_initcall(), way earlier > >> than device_initcall(). > >> > >> If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in > >> drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away? > > > > That seems to be the best option. Please test Lukas proposal and let me know. > > Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment > > about initcall)? > > > > +config PCI_NPEM > > + bool "Native PCIe Enclosure Management" > > + depends on LEDS_CLASS=y > > + depends on ACPI_IPMI=y > > > > I tried it just to be sure, but changing the module_initcall() to an > arch_initcall() in acpi_ipmi.c (and compiling it into the kernel) doesn't > help... acpi_ipmi is loaded before npem, but the underlying IPMI drivers > that acpi_ipmi uses to provide the IPMI operation region in ACPI aren't > loaded until later... acpi_ipmi needs the IPMI device. > > I'll try to think of another solution. I don't see how to get the IPMI > drivers to load before PCI devices are enumerated, so it seems to me that > the only way to get it to work from the moment the LEDs show up in sysfs > is to somehow delay the npem driver initialization of the LEDs (at least > for _DSM) or use something to trigger a rescan later. > > I notice that acpi_ipmi provides a function to wait for it to have an > IPMI device registered (acpi_wait_for_acpi_ipmi()), which is used by > acpi_power_meter.c. It would be kind of awkward to use that... it just > just waits for a completion (with a 2 second timeout)--it isn't a notifier > or callback. On my system, the npem driver failed to run a _DSM at > 0.72 seconds, and ipmi_si driver didn't initialize the IPMI controller > until 9.54 seconds. It strikes me that playing these initcall games is a losing battle and that this case would be best served by late loading of NPEM functionality. Something similar is happening with PCI device security where the enabling depends on a third-party driver for a platform "security-manager" (TSM) to arrive. The approach there is to make the functionality independent of device-discovery vs TSM driver load order. So, if the TSM driver is loaded early then pci_init_capabilities() can immediately enable the functionality. If the TSM driver is loaded *after* some devices have already gone through pci_init_capabilities(), then that event is responsible for doing for_each_pci_dev() to catch up on devices that missed their initial chance to turn on device security details. So, for NPEM, the thought would be to implement the same rendezvous flow, i.e. s/TSM/NPEM/. I am an overdue for a refresh of the TSM patches, but you can see the last posting here: http://lore.kernel.org/171291190324.3532867.13480405752065082171.stgit@dwillia2-xfh.jf.intel.com
On Tue, Jun 18, 2024 at 12:32:33PM -0700, Dan Williams wrote: > It strikes me that playing these initcall games is a losing battle and > that this case would be best served by late loading of NPEM > functionality. > > Something similar is happening with PCI device security where the > enabling depends on a third-party driver for a platform > "security-manager" (TSM) to arrive. > > The approach there is to make the functionality independent of > device-discovery vs TSM driver load order. So, if the TSM driver is > loaded early then pci_init_capabilities() can immediately enable the > functionality. If the TSM driver is loaded *after* some devices have already > gone through pci_init_capabilities(), then that event is responsible for > doing for_each_pci_dev() to catch up on devices that missed their > initial chance to turn on device security details. > > So, for NPEM, the thought would be to implement the same rendezvous > flow, i.e. s/TSM/NPEM/. A different viewpoint is that these issues are caused by the "division of labor" between OS kernel and platform firmware. In the NPEM case, Dell servers require the OS to call firmware to change LEDs. But before OS can do that, OS has to initialize a certain other interface with firmware. In the TSM case, Intel TDX Connect or AMD SEV-TIO require OS to ask firmware to perform certain authentication steps with devices, wherefore OS has to provide another interface to facilitate communication with the device. It's a complexity nightmare exacerbated by vendor-specific quirks. Which is why I'm arguing that firmware functionality (e.g. TDX module) should be constrained to the absolute minimum and the OS should be in control of as much as possible. That's the approach Apple has been following as it's the only way to achieve their close interplay between hardware and software without making things too complex. It seems what's keeping this series from working on Dell servers is primarily that the driver wants to read out LED status on probe. So I've recommended to Mariusz off-list to do that lazily if possible, i.e. on first read of a LED's status. Then if users do try to read or write LED status on Dell servers without loading IPMI modules first, they get to keep the pieces, sorry. :( > I am an overdue for a refresh of the TSM patches No hurry, there's a refresh of the OS-owned PCI device authentication coming up before end of this month. I'm taking my "TDX Connect heretic" hat off now. :) Thanks, Lukas
On Wed, 19 Jun 2024 11:08:26 +0200 Lukas Wunner <lukas@wunner.de> wrote: > On Tue, Jun 18, 2024 at 12:32:33PM -0700, Dan Williams wrote: > > It strikes me that playing these initcall games is a losing battle and > > that this case would be best served by late loading of NPEM > > functionality. > > > > Something similar is happening with PCI device security where the > > enabling depends on a third-party driver for a platform > > "security-manager" (TSM) to arrive. > > > > The approach there is to make the functionality independent of > > device-discovery vs TSM driver load order. So, if the TSM driver is > > loaded early then pci_init_capabilities() can immediately enable the > > functionality. If the TSM driver is loaded *after* some devices have already > > gone through pci_init_capabilities(), then that event is responsible for > > doing for_each_pci_dev() to catch up on devices that missed their > > initial chance to turn on device security details. > > > > So, for NPEM, the thought would be to implement the same rendezvous > > flow, i.e. s/TSM/NPEM/. > > A different viewpoint is that these issues are caused by the > "division of labor" between OS kernel and platform firmware. > > In the NPEM case, Dell servers require the OS to call firmware > to change LEDs. But before OS can do that, OS has to initialize > a certain other interface with firmware. > > In the TSM case, Intel TDX Connect or AMD SEV-TIO require OS to > ask firmware to perform certain authentication steps with devices, > wherefore OS has to provide another interface to facilitate > communication with the device. > > It's a complexity nightmare exacerbated by vendor-specific quirks. > > Which is why I'm arguing that firmware functionality (e.g. TDX module) > should be constrained to the absolute minimum and the OS should be > in control of as much as possible. That's the approach Apple has > been following as it's the only way to achieve their close interplay > between hardware and software without making things too complex. > > It seems what's keeping this series from working on Dell servers is > primarily that the driver wants to read out LED status on probe. > So I've recommended to Mariusz off-list to do that lazily if possible, > i.e. on first read of a LED's status. > > Then if users do try to read or write LED status on Dell servers without > loading IPMI modules first, they get to keep the pieces, sorry. :( > Initially, I thought that Dan suggestion is the best option but after taking into account use cases of the driver and times provided by Stuart - lazy loading wins. As a led application maintainer, I can accept fact that I cannot impose led for a while and errors will be reported, that is fine. I can left a hint why it is happening to user. I would be a nightmare to get new LED controller after some time if LED interface appearance is delayed. It is much worse from user perspective because no device means that I have no information in userland. I cannot determine if something is going to be up soon so I will report disks as not supported - unnecessary maintenance hell. I may receive a lot of issues. Stuart, please give me some time to apply suggestions and introduce lazy approach. I'm working on it! Thanks, Mariusz
On Tue, 28 May 2024 22:21:10 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > > + > > + /* > > + * For the case where a NPEM command has not completed immediately, > > + * it is recommended that software not continuously “spin” on > > polling > > + * the status register, but rather poll under interrupt at a > > reduced > > + * rate; for example at 10 ms intervals. > > I think the use of read_poll_timeout() obviates the need for this > comment. I.e. read_poll_timeout() is self documenting. > This is not describing how read_poll_timeout works but why I need it at all. I don't think that there is and ever will be a controller that needs a time to complete the NPEM request. From my experience this is immediate action. This is explanation why I have to keep it this way even if it has almost no sense :) Mariusz > > + * > > + * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of > > NPEM > > + * Command Completed" > > + */ > > + ret = read_poll_timeout(npem_read_reg, ret_val, > > + ret_val || (cc_status & > > PCI_NPEM_STATUS_CC), > > + 10 * USEC_PER_MSEC, USEC_PER_SEC, false, > > npem, > > + PCI_NPEM_STATUS, &cc_status); > > + if (ret) > > + return ret_val;
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index d35001589d88..e696e69ad516 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -143,6 +143,15 @@ config PCI_IOV If unsure, say N. +config PCI_NPEM + bool "Native PCIe Enclosure Management" + depends on LEDS_CLASS=y + help + Support for Native PCIe Enclosure Management. It allows managing LED + indications in storage enclosures. Enclosure must support following + indications: OK, Locate, Fail, Rebuild, other indications are + optional. + config PCI_PRI bool "PCI PRI support" select PCI_ATS diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index 175302036890..cd5f655d4be9 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o obj-$(CONFIG_VGA_ARB) += vgaarb.o obj-$(CONFIG_PCI_DOE) += doe.o obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o +obj-$(CONFIG_PCI_NPEM) += npem.o # Endpoint library must be initialized before its users obj-$(CONFIG_PCI_ENDPOINT) += endpoint/ diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c new file mode 100644 index 000000000000..a76a2044dab2 --- /dev/null +++ b/drivers/pci/npem.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * PCIe Enclosure management driver created for LED interfaces based on + * indications. It says *what indications* blink but does not specify *how* + * they blink - it is hardware defined. + * + * The driver name refers to Native PCIe Enclosure Management. It is + * first indication oriented standard with specification. + * + * Native PCIe Enclosure Management (NPEM) + * PCIe Base Specification r6.1 sec 6.28 + * PCIe Base Specification r6.1 sec 7.9.19 + * + * Copyright (c) 2023-2024 Intel Corporation + * Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> + */ + +#include <linux/acpi.h> +#include <linux/bitops.h> +#include <linux/errno.h> +#include <linux/iopoll.h> +#include <linux/leds.h> +#include <linux/mutex.h> +#include <linux/pci.h> +#include <linux/pci_regs.h> +#include <linux/types.h> +#include <linux/uleds.h> + +#include "pci.h" + +struct indication { + u32 bit; + const char *name; +}; + +static const struct indication npem_indications[] = { + {PCI_NPEM_IND_OK, "enclosure:ok"}, + {PCI_NPEM_IND_LOCATE, "enclosure:locate"}, + {PCI_NPEM_IND_FAIL, "enclosure:fail"}, + {PCI_NPEM_IND_REBUILD, "enclosure:rebuild"}, + {PCI_NPEM_IND_PFA, "enclosure:pfa"}, + {PCI_NPEM_IND_HOTSPARE, "enclosure:hotspare"}, + {PCI_NPEM_IND_ICA, "enclosure:ica"}, + {PCI_NPEM_IND_IFA, "enclosure:ifa"}, + {PCI_NPEM_IND_IDT, "enclosure:idt"}, + {PCI_NPEM_IND_DISABLED, "enclosure:disabled"}, + {PCI_NPEM_IND_SPEC_0, "enclosure:specific_0"}, + {PCI_NPEM_IND_SPEC_1, "enclosure:specific_1"}, + {PCI_NPEM_IND_SPEC_2, "enclosure:specific_2"}, + {PCI_NPEM_IND_SPEC_3, "enclosure:specific_3"}, + {PCI_NPEM_IND_SPEC_4, "enclosure:specific_4"}, + {PCI_NPEM_IND_SPEC_5, "enclosure:specific_5"}, + {PCI_NPEM_IND_SPEC_6, "enclosure:specific_6"}, + {PCI_NPEM_IND_SPEC_7, "enclosure:specific_7"}, + {0, NULL} +}; + +#define for_each_indication(ind, inds) \ + for (ind = inds; ind->bit; ind++) + +/* To avoid confusion, do not keep any special bits in indications */ +static u32 reg_to_indications(u32 caps, const struct indication *inds) +{ + const struct indication *ind; + u32 supported_indications = 0; + + for_each_indication(ind, inds) + supported_indications |= ind->bit; + + return caps & supported_indications; +} + +/** + * struct npem_led - LED details + * @indication: indication details + * @npem: npem device + * @name: LED name + * @led: LED device + */ +struct npem_led { + const struct indication *indication; + struct npem *npem; + char name[LED_MAX_NAME_SIZE]; + struct led_classdev led; +}; + +/** + * struct npem_ops - backend specific callbacks + * @inds: supported indications array + * @get_active_indications: get active indications + * npem: npem device + * buf: response buffer + * @set_active_indications: set new indications + * npem: npem device + * val: bit mask to set + * + * Handle communication with hardware. set_active_indications updates + * npem->active_indications. + */ +struct npem_ops { + const struct indication *inds; + int (*get_active_indications)(struct npem *npem, u32 *buf); + int (*set_active_indications)(struct npem *npem, u32 val); +}; + +/** + * struct npem - NPEM device properties + * @dev: PCIe device this driver is attached to + * @ops: Backend specific callbacks + * @npem_lock: to keep concurrent updates serialized + * @pos: NPEM capability offset (only relevant for NPEM direct register access, + * not DSM access method) + * @supported_indications: bit mask of supported indications + * non-indication and reserved bits are cleared + * @active_indications: bit mask of active indications + * non-indication and reserved bits are cleared + * @led_cnt: Supported LEDs count + * @leds: supported LEDs + */ +struct npem { + struct pci_dev *dev; + const struct npem_ops *ops; + struct mutex npem_lock; + u16 pos; + u32 supported_indications; + u32 active_indications; + int led_cnt; + struct npem_led leds[]; +}; + +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val) +{ + int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val); + + return pcibios_err_to_errno(ret); +} + +static int npem_write_ctrl(struct npem *npem, u32 reg) +{ + int pos = npem->pos + PCI_NPEM_CTRL; + int ret = pci_write_config_dword(npem->dev, pos, reg); + + return pcibios_err_to_errno(ret); +} + +static int npem_get_active_indications(struct npem *npem, u32 *buf) +{ + int ret; + + ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf); + if (ret) + return ret; + + /* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */ + if (!(*buf & PCI_NPEM_CTRL_ENABLE)) + *buf = 0; + + /* Filter out not supported indications in response */ + *buf &= npem->supported_indications; + return 0; +} + +static int npem_set_active_indications(struct npem *npem, u32 val) +{ + int ret, ret_val; + u32 cc_status; + + lockdep_assert_held(&npem->npem_lock); + + /* This bit is always required */ + val |= PCI_NPEM_CTRL_ENABLE; + ret = npem_write_ctrl(npem, val); + if (ret) + return ret; + + /* + * For the case where a NPEM command has not completed immediately, + * it is recommended that software not continuously “spin” on polling + * the status register, but rather poll under interrupt at a reduced + * rate; for example at 10 ms intervals. + * + * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM + * Command Completed" + */ + ret = read_poll_timeout(npem_read_reg, ret_val, + ret_val || (cc_status & PCI_NPEM_STATUS_CC), + 10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem, + PCI_NPEM_STATUS, &cc_status); + if (ret) + return ret_val; + + /* + * All writes to control register, including writes that do not change + * the register value, are NPEM commands and should eventually result + * in a command completion indication in the NPEM Status Register. + * + * PCIe Base Specification r6.1 sec 7.9.19.3 + * + * Register may not be updated, or other conflicting bits may be + * cleared. Spec is not strict here. Read NPEM Control register after + * write to keep cache in-sync. + */ + return npem_get_active_indications(npem, &npem->active_indications); +} + +static const struct npem_ops npem_ops = { + .inds = npem_indications, + .get_active_indications = npem_get_active_indications, + .set_active_indications = npem_set_active_indications, +}; + +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b, 0x8c, 0xb7, 0x74, 0x7e,\ + 0xd5, 0x1e, 0x19, 0x4d) +#define GET_SUPPORTED_STATES_DSM BIT(1) +#define GET_STATE_DSM BIT(2) +#define SET_STATE_DSM BIT(3) + +static const guid_t dsm_guid = DSM_GUID; + +static bool npem_has_dsm(struct pci_dev *pdev) +{ + acpi_handle handle; + + handle = ACPI_HANDLE(&pdev->dev); + if (!handle) + return false; + + return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM | + GET_STATE_DSM | SET_STATE_DSM); +} + +/* + * This function does not use ops->get_active_indications(). + * It returns cached value, npem->npem_lock is held and it is safe. + */ +static enum led_brightness brightness_get(struct led_classdev *led) +{ + struct npem_led *nled = container_of(led, struct npem_led, led); + struct npem *npem = nled->npem; + int ret, val = LED_OFF; + + ret = mutex_lock_interruptible(&npem->npem_lock); + if (ret) + return ret; + + if (npem->active_indications & nled->indication->bit) + val = LED_ON; + + mutex_unlock(&npem->npem_lock); + + return val; +} + +static int brightness_set(struct led_classdev *led, + enum led_brightness brightness) +{ + struct npem_led *nled = container_of(led, struct npem_led, led); + struct npem *npem = nled->npem; + u32 indications; + int ret; + + ret = mutex_lock_interruptible(&npem->npem_lock); + if (ret) + return ret; + + if (brightness == LED_OFF) + indications = npem->active_indications & ~(nled->indication->bit); + else + indications = npem->active_indications | nled->indication->bit; + + ret = npem->ops->set_active_indications(npem, indications); + + mutex_unlock(&npem->npem_lock); + return ret; +} + +static void npem_free(struct npem *npem) +{ + struct npem_led *nled; + int cnt; + + for (cnt = 0; cnt < npem->led_cnt; cnt++) { + nled = &npem->leds[cnt]; + + if (nled->name[0]) + led_classdev_unregister(&nled->led); + } + + mutex_destroy(&npem->npem_lock); + kfree(npem); +} + +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled) +{ + struct led_classdev *led = &nled->led; + struct led_init_data init_data = {}; + char *name = nled->name; + int ret; + + init_data.devicename = pci_name(npem->dev); + init_data.default_label = nled->indication->name; + + ret = led_compose_name(&npem->dev->dev, &init_data, name); + if (ret) + return ret; + + led->name = name; + led->brightness_set_blocking = brightness_set; + led->brightness_get = brightness_get; + led->max_brightness = LED_ON; + led->default_trigger = "none"; + led->flags = 0; + + ret = led_classdev_register(&npem->dev->dev, led); + if (ret) + /* Clear the name to indicate that it is not registered. */ + name[0] = 0; + return ret; +} + +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops, + int pos, u32 caps) +{ + u32 supported = reg_to_indications(caps, ops->inds); + int supported_cnt = hweight32(supported); + const struct indication *indication; + struct npem_led *nled; + struct npem *npem; + int led_idx = 0; + u32 active; + int ret; + + npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL); + + if (!npem) + return -ENOMEM; + + npem->supported_indications = supported; + npem->led_cnt = supported_cnt; + npem->pos = pos; + npem->dev = dev; + npem->ops = ops; + + ret = ops->get_active_indications(npem, &active); + if (ret) { + npem_free(npem); + return -EACCES; + } + + npem->active_indications = reg_to_indications(active, ops->inds); + + /* + * Do not take npem->npem_lock, get_brightness() is called on + * registration path. + */ + mutex_init(&npem->npem_lock); + + for_each_indication(indication, npem_indications) { + if ((npem->supported_indications & indication->bit) == 0) + /* Do not register unsupported indication */ + continue; + + nled = &npem->leds[led_idx++]; + nled->indication = indication; + nled->npem = npem; + + ret = pci_npem_set_led_classdev(npem, nled); + if (ret) { + npem_free(npem); + return ret; + } + } + + dev->npem = npem; + return 0; +} + +void pci_npem_remove(struct pci_dev *dev) +{ + if (dev->npem) + npem_free(dev->npem); +} + +void pci_npem_create(struct pci_dev *dev) +{ + const struct npem_ops *ops = &npem_ops; + int pos = 0, ret; + u32 cap; + + if (!npem_has_dsm(dev)) { + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM); + if (pos == 0) + return; + + if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 || + (cap & PCI_NPEM_CAP_CAPABLE) == 0) + return; + } else { + /* + * OS should use the DSM for LED control if it is available + * PCI Firmware Spec r3.3 sec 4.7. + */ + return; + } + + ret = pci_npem_init(dev, ops, pos, cap); + if (ret) + pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n", + ret); +} diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index fd44565c4756..9dea8c7353ab 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { } static inline void pci_doe_disconnected(struct pci_dev *pdev) { } #endif +#ifdef CONFIG_PCI_NPEM +void pci_npem_create(struct pci_dev *dev); +void pci_npem_remove(struct pci_dev *dev); +#else +static inline void pci_npem_create(struct pci_dev *dev) { } +static inline void pci_npem_remove(struct pci_dev *dev) { } +#endif + /** * pci_dev_set_io_state - Set the new error state if possible. * diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 8e696e547565..b8ea6353e27a 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) dev->match_driver = false; ret = device_add(&dev->dev); WARN_ON(ret < 0); + + pci_npem_create(dev); } struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index d749ea8250d6..1436f9cf1fea 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev) if (!dev->dev.kobj.parent) return; + pci_npem_remove(dev); + device_del(&dev->dev); down_write(&pci_bus_sem); diff --git a/include/linux/pci.h b/include/linux/pci.h index fb004fd4e889..c327c2dd4527 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -517,6 +517,9 @@ struct pci_dev { #endif #ifdef CONFIG_PCI_DOE struct xarray doe_mbs; /* Data Object Exchange mailboxes */ +#endif +#ifdef CONFIG_PCI_NPEM + struct npem *npem; /* Native PCIe Enclosure Management */ #endif u16 acs_cap; /* ACS Capability offset */ phys_addr_t rom; /* Physical address if not from BAR */ diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index 94c00996e633..97e8b7ed9998 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -740,6 +740,7 @@ #define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ +#define PCI_EXT_CAP_ID_NPEM 0x29 /* Native PCIe Enclosure Management */ #define PCI_EXT_CAP_ID_PL_32GT 0x2A /* Physical Layer 32.0 GT/s */ #define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE @@ -1121,6 +1122,39 @@ #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 +/* Native PCIe Enclosure Management */ +#define PCI_NPEM_CAP 0x04 /* NPEM capability register */ +#define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */ + +#define PCI_NPEM_CTRL 0x08 /* NPEM control register */ +#define PCI_NPEM_CTRL_ENABLE 0x00000001 /* NPEM Enable */ + +#define PCI_NPEM_STATUS 0x0c /* NPEM status register */ +#define PCI_NPEM_STATUS_CC 0x00000001 /* NPEM Command completed */ +/* + * Native PCIe Enclosure Management indication bits and Reset command bit + * are corresponding for capability and control registers. + */ +#define PCI_NPEM_CMD_RESET 0x00000002 /* NPEM Reset Command */ +#define PCI_NPEM_IND_OK 0x00000004 /* NPEM indication OK */ +#define PCI_NPEM_IND_LOCATE 0x00000008 /* NPEM indication Locate */ +#define PCI_NPEM_IND_FAIL 0x00000010 /* NPEM indication Fail */ +#define PCI_NPEM_IND_REBUILD 0x00000020 /* NPEM indication Rebuild */ +#define PCI_NPEM_IND_PFA 0x00000040 /* NPEM indication Predicted Failure Analysis */ +#define PCI_NPEM_IND_HOTSPARE 0x00000080 /* NPEM indication Hot Spare */ +#define PCI_NPEM_IND_ICA 0x00000100 /* NPEM indication In Critical Array */ +#define PCI_NPEM_IND_IFA 0x00000200 /* NPEM indication In Failed Array */ +#define PCI_NPEM_IND_IDT 0x00000400 /* NPEM indication Invalid Device Type */ +#define PCI_NPEM_IND_DISABLED 0x00000800 /* NPEM indication Disabled */ +#define PCI_NPEM_IND_SPEC_0 0x00800000 +#define PCI_NPEM_IND_SPEC_1 0x01000000 +#define PCI_NPEM_IND_SPEC_2 0x02000000 +#define PCI_NPEM_IND_SPEC_3 0x04000000 +#define PCI_NPEM_IND_SPEC_4 0x08000000 +#define PCI_NPEM_IND_SPEC_5 0x10000000 +#define PCI_NPEM_IND_SPEC_6 0x20000000 +#define PCI_NPEM_IND_SPEC_7 0x40000000 + /* Data Object Exchange */ #define PCI_DOE_CAP 0x04 /* DOE Capabilities Register */ #define PCI_DOE_CAP_INT_SUP 0x00000001 /* Interrupt Support */
Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows managing LED in storage enclosures. NPEM is indication oriented and it does not give direct access to LED. Although each of the indications *could* represent an individual LED, multiple indications could also be represented as a single, multi-color LED or a single LED blinking in a specific interval. The specification leaves that open. Each enabled indication (capability register bit on) is represented as a ledclass_dev which can be controlled through sysfs. For every ledclass device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0). It is corresponding to NPEM control register (Indication bit on/off). Ledclass devices appear in sysfs as child devices (subdirectory) of PCI device which has an NPEM Extended Capability and indication is enabled in NPEM capability register. For example, these are leds created for pcieport "10000:02:05.0" on my setup: leds/ ├── 10000:02:05.0:enclosure:fail ├── 10000:02:05.0:enclosure:locate ├── 10000:02:05.0:enclosure:ok └── 10000:02:05.0:enclosure:rebuild They can be also found in "/sys/class/leds" directory. Parent PCIe device bdf is used to guarantee uniqueness across leds subsystem. To enable/disable fail indication "brightness" file can be edited: echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness PCIe r6.1, sec 7.9.19.2 defines the possible indications. Multiple indications for same parent PCIe device can conflict and hardware may update them when processing new request. To avoid issues, driver refresh all indications by reading back control register. Driver is projected to be exclusive NPEM extended capability manager. It waits up to 1 second after imposing new request, it doesn't verify if controller is busy before write, assuming that mutex lock gives protection from concurrent updates. Driver is not registered if _DSM LED management is available. NPEM is a PCIe extended capability so it should be registered in pcie_init_capabilities() but it is not possible due to LED dependency. Parent pci_device must be added earlier for led_classdev_register() to be successful. NPEM does not require configuration on kernel side, it is safe to register LED devices later. Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1] Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> --- drivers/pci/Kconfig | 9 + drivers/pci/Makefile | 1 + drivers/pci/npem.c | 410 ++++++++++++++++++++++++++++++++++ drivers/pci/pci.h | 8 + drivers/pci/probe.c | 2 + drivers/pci/remove.c | 2 + include/linux/pci.h | 3 + include/uapi/linux/pci_regs.h | 34 +++ 8 files changed, 469 insertions(+) create mode 100644 drivers/pci/npem.c