Message ID | 1470687542-30155-2-git-send-email-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Mon, Aug 08, 2016 at 02:19:02PM -0600, Keith Busch wrote: > This adds a quirk for certain devices that require ignoring attention > and power indicators. > > Signed-off-by: Keith Busch <keith.busch@intel.com> > --- > drivers/pci/quirks.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index b69321c..b5bd7a0 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -4427,3 +4427,18 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev) > } > } > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap); > + > +/* > + * The PCIe slot capabilities for Intel compatible Hot-swap backplane advertise > + * attention and power indicators, but will do the wrong thing if used in a > + * standard way. Ignore these. > + */ Hmm. So I guess you're saying these devices are defective? Is there an erratum we can reference? What exactly does "do the wrong thing" mean? These are indicators, so the only thing we really do is turn them on and off. I think we do that with pcie_write_cmd_nowait(), and all the synchronization there is a little messy. Maybe we got that wrong somehow? It's hard to believe something as simple as controlling an LED is broken. If it *is* broken, I would think the breakage would be platform-dependent, not just device-dependent, i.e., I would suspect something wrong with motherboard wiring or firmware. > +static void quirk_hsbp(struct pci_dev *pdev) > +{ > + pdev->ignore_aip = 1; > + pdev->ignore_pip = 1; > +} > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_hsbp); > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_hsbp); > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_hsbp); > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_hsbp); > -- > 2.7.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 15, 2016 at 12:40:02PM -0500, Bjorn Helgaas wrote: > On Mon, Aug 08, 2016 at 02:19:02PM -0600, Keith Busch wrote: > > +/* > > + * The PCIe slot capabilities for Intel compatible Hot-swap backplane advertise > > + * attention and power indicators, but will do the wrong thing if used in a > > + * standard way. Ignore these. > > + */ > > Hmm. So I guess you're saying these devices are defective? Is there > an erratum we can reference? > > What exactly does "do the wrong thing" mean? These are indicators, so > the only thing we really do is turn them on and off. I think we do > that with pcie_write_cmd_nowait(), and all the synchronization there > is a little messy. Maybe we got that wrong somehow? > > It's hard to believe something as simple as controlling an LED is > broken. If it *is* broken, I would think the breakage would be > platform-dependent, not just device-dependent, i.e., I would suspect > something wrong with motherboard wiring or firmware. This is actually a "feature". The devices listed in the patch re-purpose the spec defined capability and control bits for Attention and Power indicators. The control values match IBPI (International Blinking Pattern Interpretation) rather than the spec definition. Since these operate in a non-standard way, we'd just as soon not let the kernel know about them (an incorrect LED pattern will definitely occur). The LEDs are to be set from user space by 'ledmon' instead. Had I my way, the hardware wouldn't advertise the capability in the first place. I rarely get my way, so I instead get to publicly defend the quirk. :) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 15, 2016 at 03:23:16PM -0400, Keith Busch wrote: > On Mon, Aug 15, 2016 at 12:40:02PM -0500, Bjorn Helgaas wrote: > > On Mon, Aug 08, 2016 at 02:19:02PM -0600, Keith Busch wrote: > > > +/* > > > + * The PCIe slot capabilities for Intel compatible Hot-swap backplane advertise > > > + * attention and power indicators, but will do the wrong thing if used in a > > > + * standard way. Ignore these. > > > + */ > > > > Hmm. So I guess you're saying these devices are defective? Is there > > an erratum we can reference? > > > > What exactly does "do the wrong thing" mean? These are indicators, so > > the only thing we really do is turn them on and off. I think we do > > that with pcie_write_cmd_nowait(), and all the synchronization there > > is a little messy. Maybe we got that wrong somehow? > > > > It's hard to believe something as simple as controlling an LED is > > broken. If it *is* broken, I would think the breakage would be > > platform-dependent, not just device-dependent, i.e., I would suspect > > something wrong with motherboard wiring or firmware. > > This is actually a "feature". The devices listed in the patch re-purpose > the spec defined capability and control bits for Attention and Power > indicators. The control values match IBPI (International Blinking Pattern > Interpretation) rather than the spec definition. > > Since these operate in a non-standard way, we'd just as soon not let > the kernel know about them (an incorrect LED pattern will definitely > occur). The LEDs are to be set from user space by 'ledmon' instead. What? This is August 15, not April 1. That makes no sense whatsoever. The device is advertising that it supports the hotplug model in the PCIe spec. If you don't like that model, you should define a new model with a different capability ID. Or you could propose a PCIe spec update with a new capability bit so software can tell that this device supports different functionality. > Had I my way, the hardware wouldn't advertise the capability in the > first place. I rarely get my way, so I instead get to publicly defend > the quirk. :) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 15, 2016 at 02:50:58PM -0500, Bjorn Helgaas wrote: > What? This is August 15, not April 1. That makes no sense > whatsoever. Yes, I agree, and not advocating this was a good idea. I'm just the messenger in this case. That said, it looks like this can fall under the intended usage for the quirk framework if we can consider this a work-around for standard non-compliance. Or are your thoughts that we've gone too far if it's not a true erratum? -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 15, 2016 at 06:35:03PM -0400, Keith Busch wrote: > On Mon, Aug 15, 2016 at 02:50:58PM -0500, Bjorn Helgaas wrote: > > What? This is August 15, not April 1. That makes no sense > > whatsoever. > > Yes, I agree, and not advocating this was a good idea. I'm just the > messenger in this case. > > That said, it looks like this can fall under the intended usage for > the quirk framework if we can consider this a work-around for standard > non-compliance. Or are your thoughts that we've gone too far if it's > not a true erratum? Well, I don't want to add random quirks for everybody who decides to make their own non-standard hardware. The reason we have the PCI specs is for interoperability, and this is not it. The whole purpose of capability bits is for the hardware to tell the OS that it has certain *capabilities* that work in certain ways. It's idiotic to set a capability bit that says "I have this indicator" when it doesn't work the way it's supposed to. The hardware should just NOT SET THE BIT. How hard is that? I can't conceive of the reasoning that concluded this was useful. I don't care if you want to make the LEDs do something else, but for Pete's sake, don't claim they work according to spec when they don't. I assume there will be more broken devices like this in the future. How long will we be adding quirks? Do you have a plan to resolve this the correct way eventually? I still don't know what it means that the devices "do the wrong thing." Do the LEDs not work? Do the LEDs work according to the PCIe spec, but the patterns don't match the International Blinken Lights spec? Does the hotplug controller lock up? Sigh. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 15, 2016 at 03:23:16PM -0400, Keith Busch wrote: > On Mon, Aug 15, 2016 at 12:40:02PM -0500, Bjorn Helgaas wrote: > > On Mon, Aug 08, 2016 at 02:19:02PM -0600, Keith Busch wrote: > > > +/* > > > + * The PCIe slot capabilities for Intel compatible Hot-swap backplane advertise > > > + * attention and power indicators, but will do the wrong thing if used in a > > > + * standard way. Ignore these. > > > + */ > > > > Hmm. So I guess you're saying these devices are defective? Is there > > an erratum we can reference? > > > > What exactly does "do the wrong thing" mean? These are indicators, so > > the only thing we really do is turn them on and off. I think we do > > that with pcie_write_cmd_nowait(), and all the synchronization there > > is a little messy. Maybe we got that wrong somehow? > > > > It's hard to believe something as simple as controlling an LED is > > broken. If it *is* broken, I would think the breakage would be > > platform-dependent, not just device-dependent, i.e., I would suspect > > something wrong with motherboard wiring or firmware. > > This is actually a "feature". The devices listed in the patch re-purpose > the spec defined capability and control bits for Attention and Power > indicators. The control values match IBPI (International Blinking Pattern > Interpretation) rather than the spec definition. > > Since these operate in a non-standard way, we'd just as soon not let > the kernel know about them (an incorrect LED pattern will definitely > occur). The LEDs are to be set from user space by 'ledmon' instead. > > Had I my way, the hardware wouldn't advertise the capability in the > first place. I rarely get my way, so I instead get to publicly defend > the quirk. :) Usually when I think something is totally stupid, it's because I don't know the whole story. So it might make more sense and lead to a better solution if you could tell us more about your intent here. According to the Linux PCI database, the devices you want to quirk are: 2030 Sky Lake-E PCI Express Root Port 1A 2031 Sky Lake-E PCI Express Root Port 1B 2032 Sky Lake-E PCI Express Root Port 1C 2033 Sky Lake-E PCI Express Root Port 1D So are you saying that on every platform that uses Sky Lake-E, these indicators are non-standard in this way? IBPI looks like it's targeted at storage arrays, since it has states for "drive not present", "fail", "rebuild", "hotspare", etc. Maybe there's some sense for Sky Lake-E platforms with directly-attached storage. But if somebody built a Sky Lake-E platform with one of these Root Ports leading to a plain hotplug PCIe slot with regular indicators, your quirk would break them, wouldn't it? Or are you imposing constraints on how those Root Ports can be used? How does 'ledmon' manage the indicators? The kernel (pciehp) uses the Slot Control register, which is not completely trivial because of the Command Completed synchronization required. I'm hoping ledmon isn't going to mess up that synchronization. How does this work for other OSes? Are you proposing similar changes to Windows? What's your plan for backwards compatibility? Just accept that old OSes won't be able to operate the indicators correctly until they're patched with this quirk? You must have set that capability bit for some reason. You don't want the OS to consume it, so who *do* you expect to consume it, and how (direct PCI config access, lspci, etc.), and what are they supposed to do with it? Still scratching my head, Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 17, 2016 at 04:37:45PM -0500, Bjorn Helgaas wrote: > Usually when I think something is totally stupid, it's because I don't > know the whole story. So it might make more sense and lead to a > better solution if you could tell us more about your intent here. Definitely. I did not provide all the details because I didn't think knowing the full context would help toward understanding the function of the patch, but I see now that skimping on the details did not help our cause. This came about from wanting a simple SGPIO-like LED management solution for PCIe SSDs. The Intel group who made this, not considering the more broad impact on standarization, chose to reuse the hot plug serial SMBus in the Intel CPUs (aka VPP) that already carried the Slot Control register bits out of the CPU. We would love to have been able to disable the capability present bits, but the hardware logic that would disable those bits would also have disabled LED control entirely, and we can't change the hardware now. We have to rely on software to work around this limitation for this generation of hardware. The next generation of these devices will pursue standards compliant methods by engaging with PCI-SIG and the NVMe-MI standards bodies. This current generation of devices is the only one set this way that requires this work-around. > According to the Linux PCI database, the devices you want to quirk are: > > 2030 Sky Lake-E PCI Express Root Port 1A > 2031 Sky Lake-E PCI Express Root Port 1B > 2032 Sky Lake-E PCI Express Root Port 1C > 2033 Sky Lake-E PCI Express Root Port 1D > > > So are you saying that on every platform that uses Sky Lake-E, these > indicators are non-standard in this way? Yes (more details below), but I may need to refine this to specific subsystem IDs (clarifying that internally now). Also, just for future note, the device list that I provided with the quirk may need to be augmented for other vendor devices that implement it this way as well. > IBPI looks like it's targeted at storage arrays, since it has states > for "drive not present", "fail", "rebuild", "hotspare", etc. Maybe > there's some sense for Sky Lake-E platforms with directly-attached > storage. > > But if somebody built a Sky Lake-E platform with one of these Root > Ports leading to a plain hotplug PCIe slot with regular indicators, > your quirk would break them, wouldn't it? Or are you imposing > constraints on how those Root Ports can be used? Yes, you totally nailed it on this being for storage. To go even further, we constrain these to be in Sky Lake's VMD domains. We do not support using VMD for anything but PCIe storage. > How does 'ledmon' manage the indicators? The kernel (pciehp) uses the > Slot Control register, which is not completely trivial because of the > Command Completed synchronization required. I'm hoping ledmon isn't > going to mess up that synchronization. We've augmented 'ledmon' with libpci and toggles these indicators very similar to how 'setpci' can change PCI config registers. It is done only after the storage device is up and registered, which is well outside the time when pciehp is actively using the slot control. > How does this work for other OSes? Are you proposing similar changes > to Windows? Aha, the mystery to that part might be clarified knowing this is tied to VMD. For other OSes including Windows, the VMD driver does not expose the domain to the rest of the operating system. These VMD drivers implement their own pci bus and port service drivers, nvme storage drivers, and interrupt chaining, so they don't have to worry about anyone using these devices incorrectly. Those drivers also provide VMD specific management interfaces (special ioctls, for example) to manage its domain. For Linux, we did not think it appropriate or upstreamable to re-write all the drivers and services into a monolithic VMD driver, then implement new user tooling for things that 'lspci' and 'setpci' already provide. Instead, we leverage all the existing good code by implementing VMD as a PCI host-bridge driver. The down side is we have to get our quirks fixed in generic code. > What's your plan for backwards compatibility? Just accept that old > OSes won't be able to operate the indicators correctly until they're > patched with this quirk? You may have surmised this already based on the new details, but I'll just add that since it's tied to VMD, and that being a very new driver, backward compatibility is not a concern. > You must have set that capability bit for some reason. You don't want > the OS to consume it, so who *do* you expect to consume it, and how > (direct PCI config access, lspci, etc.), and what are they supposed to > do with it? Just restating what I mentioned above, this is an artifact on how the h/w gates were wired. Suppressing the capability bit automatically suppresses the LED control, and we need LED control. This is being addressed in the next generation to do provide the desired LED control in a standard compliant manner. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 17, 2016 at 07:09:51PM -0400, Keith Busch wrote: > On Wed, Aug 17, 2016 at 04:37:45PM -0500, Bjorn Helgaas wrote: > > Usually when I think something is totally stupid, it's because I don't > > know the whole story. So it might make more sense and lead to a > > better solution if you could tell us more about your intent here. > > Definitely. I did not provide all the details because I didn't think > knowing the full context would help toward understanding the function > of the patch, but I see now that skimping on the details did not help > our cause. > > This came about from wanting a simple SGPIO-like LED management solution > for PCIe SSDs. The Intel group who made this, not considering the > more broad impact on standarization, chose to reuse the hot plug > serial SMBus in the Intel CPUs (aka VPP) that already carried the Slot > Control register bits out of the CPU. > > We would love to have been able to disable the capability present > bits, but the hardware logic that would disable those bits would also > have disabled LED control entirely, and we can't change the hardware > now. We have to rely on software to work around this limitation for this > generation of hardware. > > The next generation of these devices will pursue standards compliant > methods by engaging with PCI-SIG and the NVMe-MI standards bodies. > This current generation of devices is the only one set this way that > requires this work-around. That's good news. > > How does 'ledmon' manage the indicators? The kernel (pciehp) uses the > > Slot Control register, which is not completely trivial because of the > > Command Completed synchronization required. I'm hoping ledmon isn't > > going to mess up that synchronization. > > We've augmented 'ledmon' with libpci and toggles these indicators very > similar to how 'setpci' can change PCI config registers. It is done only > after the storage device is up and registered, which is well outside > the time when pciehp is actively using the slot control. I assume this means ledmon writes Slot Control to manage the LEDs. That sounds pretty scary to me because it's impossible to enforce the assumption that ledmon only uses Slot Control when pciehp isn't using it. Hotplug includes surprise events, and I think pciehp is entitled to assume it is the sole owner of Slot Control. I wonder if there's some way to implement the LED control in pciehp, e.g., by enhancing pciehp_set_attention_status(). I assume the desired indicator state is coming from some in-kernel storage driver, and ledmon learns about that somehow, then fiddles with Slot Control behind the kernel's back? That looping from kernel to user and back to kernel sounds a little dodgy and fragile. Can you share any details about how you plan to implement this on the next generation of devices? Maybe we can enhance pciehp indicator control in a way that supports both Sky Lake and future devices. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index b69321c..b5bd7a0 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4427,3 +4427,18 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev) } } DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap); + +/* + * The PCIe slot capabilities for Intel compatible Hot-swap backplane advertise + * attention and power indicators, but will do the wrong thing if used in a + * standard way. Ignore these. + */ +static void quirk_hsbp(struct pci_dev *pdev) +{ + pdev->ignore_aip = 1; + pdev->ignore_pip = 1; +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_hsbp); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_hsbp); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_hsbp); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_hsbp);
This adds a quirk for certain devices that require ignoring attention and power indicators. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/quirks.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)