diff mbox series

PCI: pciehp: Do not turn off slot if presence comes up after link

Message ID 20190205210701.25387-1-mr.nuke.me@gmail.com (mailing list archive)
State Superseded, archived
Headers show
Series PCI: pciehp: Do not turn off slot if presence comes up after link | expand

Commit Message

Alex G. Feb. 5, 2019, 9:06 p.m. UTC
According to PCIe 3.0, the presence detect state is a logical OR of
in-band and out-of-band presence. With this, we'd expect the presence
state to always be asserted when the link comes up.

Not all hardware follows this, and it is possible for the presence to
come up after the link. In this case, the PCIe device would be
erroneously disabled and re-probed. It is possible to distinguish
between a delayed presence and a card swap by looking at the DLL state
changed bit -- The link has to come down if the card is removed.

Thus, for a device that is probed, present and has its link active, a
lack of a link state change event guarantees we have the same device,
and shutdown of is not needed.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---

Following some discussion in
"PCI: hotplug: Erroneous removal of hotplug PCI devices" [1]
It became apparent that we can't fully rely presence detect changed
as an indicator to shutdown a device. And in PCIe 4.0 the "logical OR"
requirement is going away, so we can use a way to slightly  decouple
presence detect and link active.

I think the approach here is the simplest, and least likely to
interfere with other mis-behaving hardware (in the PCIe 3.0 sense).

[1] https://www.spinics.net/lists/linux-pci/msg79564.html

 drivers/pci/hotplug/pciehp_ctrl.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Bjorn Helgaas Feb. 8, 2019, 7:56 p.m. UTC | #1
On Tue, Feb 05, 2019 at 03:06:56PM -0600, Alexandru Gagniuc wrote:
> According to PCIe 3.0, the presence detect state is a logical OR of
> in-band and out-of-band presence. With this, we'd expect the presence
> state to always be asserted when the link comes up.

Do you have a PCIe 4.0 spec?  I think 5.0 is about to come out, so
it'd be nice to have at least a 4.0 citation (including section
number), if you have one.

> Not all hardware follows this, and it is possible for the presence to
> come up after the link. In this case, the PCIe device would be
> erroneously disabled and re-probed. It is possible to distinguish
> between a delayed presence and a card swap by looking at the DLL state
> changed bit -- The link has to come down if the card is removed.
> 
> Thus, for a device that is probed, present and has its link active, a
> lack of a link state change event guarantees we have the same device,
> and shutdown of is not needed.

s/of is/is/, I guess?

I'm hoping Lukas will chime in here; thanks for cc'ing him already.

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
> 
> Following some discussion in
> "PCI: hotplug: Erroneous removal of hotplug PCI devices" [1]
> It became apparent that we can't fully rely presence detect changed
> as an indicator to shutdown a device. And in PCIe 4.0 the "logical OR"
> requirement is going away, so we can use a way to slightly  decouple
> presence detect and link active.
> 
> I think the approach here is the simplest, and least likely to
> interfere with other mis-behaving hardware (in the PCIe 3.0 sense).
> 
> [1] https://www.spinics.net/lists/linux-pci/msg79564.html

Would you mind updating this reference to the
https://lore.kernel.org/linux-pci form that doesn't depend on external
organizations?

https://www.kernel.org/lore.html has some details, but unfortunately
lacks a hint about how to construct the URL.  What I do is look up the
Message-ID and use, e.g.,

  https://lore.kernel.org/linux-pci/<Message-ID>

>  drivers/pci/hotplug/pciehp_ctrl.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 3f3df4c29f6e..ea0dd3e956be 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -220,13 +220,23 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	/*
>  	 * If the slot is on and presence or link has changed, turn it off.
>  	 * Even if it's occupied again, we cannot assume the card is the same.
> +	 * When the card is swapped, we also expect a change in link state,
> +	 * without which, it's likely presence became high after link-active.
>  	 */
>  	mutex_lock(&ctrl->state_lock);
> +	present = pciehp_card_present(ctrl);
> +	link_active = pciehp_check_link_active(ctrl);
>  	switch (ctrl->state) {
>  	case BLINKINGOFF_STATE:
>  		cancel_delayed_work(&ctrl->button_work);
>  		/* fall through */
>  	case ON_STATE:
> +		if (present && link_active &&
> +		   !(events & PCI_EXP_SLTSTA_DLLSC)) {
> +			mutex_unlock(&ctrl->state_lock);
> +			ctrl_warn(ctrl, "Hardware bug: Presence state came up after link");

I'm not 100% sure this is worth KERN_WARN unless the user might be
able to do something about it.

> +			return;
> +		}
>  		ctrl->state = POWEROFF_STATE;
>  		mutex_unlock(&ctrl->state_lock);
>  		if (events & PCI_EXP_SLTSTA_DLLSC)
> -- 
> 2.19.2
>
Lukas Wunner Feb. 9, 2019, 11:58 a.m. UTC | #2
On Tue, Feb 05, 2019 at 03:06:56PM -0600, Alexandru Gagniuc wrote:
> According to PCIe 3.0, the presence detect state is a logical OR of
> in-band and out-of-band presence.

Since Bjorn asked for a spec reference:

PCIe r4.0 sec 7.8.11: "Presence Detect State - This bit indicates the
presence of an adapter in the slot, reflected by the logical OR of the
Physical Layer in-band presence detect mechanism and, if present, any
out-of-band presence detect mechanism defined for the slot's
corresponding form factor."

Table 4-14 in sec 4.2.6 is also important because it shows the difference
between Link Up and in-band presence.


> With this, we'd expect the presence
> state to always be asserted when the link comes up.
> 
> Not all hardware follows this, and it is possible for the presence to
> come up after the link. In this case, the PCIe device would be
> erroneously disabled and re-probed. It is possible to distinguish
> between a delayed presence and a card swap by looking at the DLL state
> changed bit -- The link has to come down if the card is removed.

So in reality, PDC and DLLSC events rarely come in simultaneously and
we do allow some leeway in pcie_wait_for_link(), it's just not enough
in your particular case due to the way presence is detected by the
platform.

I think in this case instead of changing the behavior for everyone,
it's more appropriate to add a quirk for your hardware.  Two ideas:

* Amend pcie_init() to set slot_cap |= PCI_EXP_SLTCAP_ABP.  Insert
  this as a quirk right below the existing Thunderbolt quirk and
  use PCI vendor/device IDs and/or DMI checks to identify your
  particular hardware.  If the ABP bit is set, PDC events are not
  enabled by pcie_enable_notification(), so you will solely rely on
  DLLSC events.  Problem solved.  (Hopefully.)

* Alternatively, add a DECLARE_PCI_FIXUP_FINAL() quirk which sets
  pdev->link_active_reporting = false.  This causes pcie_wait_for_link()
  to wait 1100 msec before the hot-added device's config space is
  accessed for the first time.

Would this work for you?

Thanks,

Lukas
Alex_Gagniuc@Dellteam.com Feb. 11, 2019, 11:48 p.m. UTC | #3
On 2/9/19 5:58 AM, Lukas Wunner wrote:
> 
> [EXTERNAL EMAIL]
> 
> On Tue, Feb 05, 2019 at 03:06:56PM -0600, Alexandru Gagniuc wrote:
>> According to PCIe 3.0, the presence detect state is a logical OR of
>> in-band and out-of-band presence.
> 
> Since Bjorn asked for a spec reference:
> 
> PCIe r4.0 sec 7.8.11: "Presence Detect State - This bit indicates the
> presence of an adapter in the slot, reflected by the logical OR of the
> Physical Layer in-band presence detect mechanism and, if present, any
> out-of-band presence detect mechanism defined for the slot's
> corresponding form factor."

ECN [1] was approved last November, so it's normative spec text. Sorry 
if the Ukranians didn't get ahold of it yet. I'll try to explain the delta.
There's an in-band PD supported/disable set of bits. If 'supported' is 
set, then you can set 'disable', which removes the 'logical OR" and now 
PD is only out-of-band presence.

> Table 4-14 in sec 4.2.6 is also important because it shows the difference
> between Link Up and in-band presence.

So, we'd expect PD to come up at the same time or before DLL?

>> With this, we'd expect the presence
>> state to always be asserted when the link comes up.
>>
>> Not all hardware follows this, and it is possible for the presence to
>> come up after the link. In this case, the PCIe device would be
>> erroneously disabled and re-probed. It is possible to distinguish
>> between a delayed presence and a card swap by looking at the DLL state
>> changed bit -- The link has to come down if the card is removed.
> 
> So in reality, PDC and DLLSC events rarely come in simultaneously and
> we do allow some leeway in pcie_wait_for_link(), it's just not enough
> in your particular case due to the way presence is detected by the
> platform.
> 
> I think in this case instead of changing the behavior for everyone,
> it's more appropriate to add a quirk for your hardware.  Two ideas:
> 
> * Amend pcie_init() to set slot_cap |= PCI_EXP_SLTCAP_ABP.  Insert
>    this as a quirk right below the existing Thunderbolt quirk and
>    use PCI vendor/device IDs and/or DMI checks to identify your
>    particular hardware.  If the ABP bit is set, PDC events are not
>    enabled by pcie_enable_notification(), so you will solely rely on
>    DLLSC events.  Problem solved.  (Hopefully.)
> 
> * Alternatively, add a DECLARE_PCI_FIXUP_FINAL() quirk which sets
>    pdev->link_active_reporting = false.  This causes pcie_wait_for_link()
>    to wait 1100 msec before the hot-added device's config space is
>    accessed for the first time.

These are both hacks right? We're declaring something untrue about the 
hardware because we want to obtain a specific side-effect. BUt the 
side-effects are not guaranteed not to change.

> Would this work for you?

I'm certain it's doable. In theory one could use the PCI ID of the DP, 
and the vendor and machine names to filter the quirk. But what happens 
in situations where the same PCI ID switch is used in different parts of 
the system? You want the quirk here or not there. It quickly becomes a 
shartstorm.

I'd like a generic solution. I suspect supporting 'in-band PD disabled' 
and quirking for that would be a saner way to do things. If we support 
it, then we need to handle PDSC after DLLSC scenarios anyway.

Alex

[1] PCI-SIG ENGINEERING CHANGE NOTICE
	Async Hot-Plug Updates
	Nov 29, 2018
Lukas Wunner Feb. 12, 2019, 8:30 a.m. UTC | #4
On Mon, Feb 11, 2019 at 11:48:12PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 2/9/19 5:58 AM, Lukas Wunner wrote:
> ECN [1] was approved last November, so it's normative spec text. Sorry 
> if the Ukranians didn't get ahold of it yet. I'll try to explain the delta.
> There's an in-band PD supported/disable set of bits. If 'supported' is 
> set, then you can set 'disable', which removes the 'logical OR" and now 
> PD is only out-of-band presence.

I see, thanks a lot for relaying this information.  The PCI SIG
should probably consider granting access to specs to open source
developers to ease adoption of new features in Linux et al.


> > Table 4-14 in sec 4.2.6 is also important because it shows the difference
> > between Link Up and in-band presence.
> 
> So, we'd expect PD to come up at the same time or before DLL?

Per table 4-14 and figure 4-23 (immediately below the table) in r4.0,
PDC ought to be signaled before DLLSC.  As said, Linux can handle PDC
after DLLSC if they're not too far apart (100 ms, see pcie_wait_for_link()).


> I'd like a generic solution. I suspect supporting 'in-band PD disabled' 
> and quirking for that would be a saner way to do things. If we support 
> it, then we need to handle PDSC after DLLSC scenarios anyway.

I agree, having support for this new ECN would be great.

If you look at struct controller in drivers/pci/hotplug/pciehp.h,
there's a section labelled /* capabilities and quirks */.  An idea
would be to add an unsigned int there to indicate whether in-band
presence is disabled.  An alternative would be to add a flag to
struct pci_dev.

Instead of modifying the logic in pciehp_handle_presence_or_link_change(),
you could amend pcie_wait_for_link() to poll PDS until it's set, in
addition to DLLLA.  The rationale would be that although the link is up,
the hot-added device cannot really be considered accessible until PDS
is also set.  Unfortunately we cannot do this for all devices because
(as I've said before), some broken devices hardwire PDS to zero.  But
it should be safe to constrain it to those which can disable in-band
presence.

Be mindful however that pcie_wait_for_link() is also called from the
DPC driver.  Keith should be able to judge whether a change to that
function breaks DPC.

You'll probably also need to add code to disable in-band presence if
that is supported, either in pcie_init() in pciehp_hpc.c or in generic
PCI code.

Thanks,

Lukas
Lukas Wunner Feb. 12, 2019, 10:37 a.m. UTC | #5
On Tue, Feb 12, 2019 at 09:30:31AM +0100, Lukas Wunner wrote:
> Instead of modifying the logic in pciehp_handle_presence_or_link_change(),
> you could amend pcie_wait_for_link() to poll PDS until it's set, in
> addition to DLLLA.  The rationale would be that although the link is up,
> the hot-added device cannot really be considered accessible until PDS
> is also set.  Unfortunately we cannot do this for all devices because
> (as I've said before), some broken devices hardwire PDS to zero.  But
> it should be safe to constrain it to those which can disable in-band
> presence.

An alternative approach would of course be to only enable one of PDC or
DLLSC events if in-band presence is disabled.  In the case of DLLSC,
you'd also need to modify pciehp_reset_slot() in addition to
pcie_enable_notification().

Thanks,

Lukas
Alex_Gagniuc@Dellteam.com Feb. 12, 2019, 11:57 p.m. UTC | #6
On 2/12/19 2:30 AM, Lukas Wunner wrote:
> The PCI SIG
> should probably consider granting access to specs to open source
> developers to ease adoption of new features in Linux et al.

Don't get me started on just how ridiculous I think PCI-SIG's policy is 
with respect to public availability of standards.

>>> Table 4-14 in sec 4.2.6 is also important because it shows the difference
>>> between Link Up and in-band presence.
>>
>> So, we'd expect PD to come up at the same time or before DLL?
> 
> Per table 4-14 and figure 4-23 (immediately below the table) in r4.0,
> PDC ought to be signaled before DLLSC.  As said, Linux can handle PDC
> after DLLSC if they're not too far apart (100 ms, see pcie_wait_for_link()).
> 
> 
>> I'd like a generic solution. I suspect supporting 'in-band PD disabled'
>> and quirking for that would be a saner way to do things. If we support
>> it, then we need to handle PDSC after DLLSC scenarios anyway.
> 
> I agree, having support for this new ECN would be great.
> 
> If you look at struct controller in drivers/pci/hotplug/pciehp.h,
> there's a section labelled /* capabilities and quirks */.  An idea
> would be to add an unsigned int there to indicate whether in-band
> presence is disabled.  An alternative would be to add a flag to
> struct pci_dev.

> Instead of modifying the logic in pciehp_handle_presence_or_link_change(),
> you could amend pcie_wait_for_link() to poll PDS until it's set, in
> addition to DLLLA.  The rationale would be that although the link is up,
> the hot-added device cannot really be considered accessible until PDS
> is also set.  Unfortunately we cannot do this for all devices because
> (as I've said before), some broken devices hardwire PDS to zero.  But
> it should be safe to constrain it to those which can disable in-band
> presence.


The recommendation is to set the "in-band PD disable" bit, and it's 
possible that platform firmware may have set it at boot time (*). I'm 
not sure that there is a spec-justifiable reason to not access a device 
whose DLL is up, but PD isn't.

(*) A bit hypothetical: There is no hardware yet implementing the ECN.

> Be mindful however that pcie_wait_for_link() is also called from the
> DPC driver.  Keith should be able to judge whether a change to that
> function breaks DPC.

That's why I went for ammending pciehp_handle_presence_or_link_change(). 
Smaller bug surface ;). What I'm thinking at this point is, keep the 
patch as is, but, also check for in-band PD disable before aborting the 
shutdown. Old behavior stays the same.

I'll rework this a little bit for in-band PD disable (what's a good 
acronym for that, BTW?), and then quirk those machines that are known to 
'disable' this in hardware.

Alex
Lukas Wunner Feb. 13, 2019, 8:36 a.m. UTC | #7
On Tue, Feb 12, 2019 at 11:57:55PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> The recommendation is to set the "in-band PD disable" bit, and it's
> possible that platform firmware may have set it at boot time

Ok, then I'd suggest to check in pcie_init() whether the feature is
supported and enable it if so.  And store that fact in a flag, either
in struct pci_dev or struct controller.


> (*) A bit hypothetical: There is no hardware yet implementing the ECN.

Hm, this contradicts Austin Bolen's e-mail of Jan 25 that "Yes, this
platform disables in-band presence" (when asked whether your host
controller already adheres to the ECN).


> I'm 
> not sure that there is a spec-justifiable reason to not access a device 
> whose DLL is up, but PD isn't.

Austin asked in an e-mail of Jan 24 whether "the hot-inserted device's
config space [is] accessed immediately after waiting this 20 + 100 ms
delay", which sounded to me like you'd prefer the device not to be
accessed until PDS is 1.  This can be achieved by polling PDS in
pcie_wait_for_link() if in-band presence is disabled, which is why
I suggested it.


> > Be mindful however that pcie_wait_for_link() is also called from the
> > DPC driver.  Keith should be able to judge whether a change to that
> > function breaks DPC.
> 
> That's why I went for ammending pciehp_handle_presence_or_link_change(). 
> Smaller bug surface ;). What I'm thinking at this point is, keep the 
> patch as is, but, also check for in-band PD disable before aborting the 
> shutdown. Old behavior stays the same.

I'm worried that amending pciehp_handle_presence_or_link_change() makes
the event handling logic more difficult to understand.  Polling PDS in
pcie_wait_for_link() or disabling either PDC or DLLSC if in-band presence
is disabled seems simpler to reason about.


> in-band PD disable (what's a good acronym for that, BTW?)

I don't know, maybe inband_presence_disabled?

Thanks,

Lukas
Alex_Gagniuc@Dellteam.com Feb. 13, 2019, 6:55 p.m. UTC | #8
On 2/13/19 2:36 AM, Lukas Wunner wrote:
>> (*) A bit hypothetical: There is no hardware yet implementing the ECN.
> 
> Hm, this contradicts Austin Bolen's e-mail of Jan 25 that "Yes, this
> platform disables in-band presence" (when asked whether your host
> controller already adheres to the ECN).

Both statements are true. The hardware does indeed disable in-band 
presence, in a rudimentary way that is not compliant with the ECN -- it 
doesn't implement the bits required by the ECN.


>> I'm
>> not sure that there is a spec-justifiable reason to not access a device
>> whose DLL is up, but PD isn't.
> 
> Austin asked in an e-mail of Jan 24 whether "the hot-inserted device's
> config space [is] accessed immediately after waiting this 20 + 100 ms
> delay", which sounded to me like you'd prefer the device not to be
> accessed until PDS is 1.

"Unless Readiness Notifications mechanisms are used (see Section 6.23), 
the Root Complex and/or system software must allow at least 1.0 s after 
a Conventional Reset of a device, before it may determine that a device 
which fails to return a Successful Completion status for a valid 
Configuration Request is a broken device. This period is independent of 
how quickly Link training completes."
	(Section 6.6)

The concern was the one second delay, which is addressed by the use of 
pci_bus_wait_crs().

>>> Be mindful however that pcie_wait_for_link() is also called from the
>>> DPC driver.  Keith should be able to judge whether a change to that
>>> function breaks DPC.
>>
>> That's why I went for ammending pciehp_handle_presence_or_link_change().
>> Smaller bug surface ;). What I'm thinking at this point is, keep the
>> patch as is, but, also check for in-band PD disable before aborting the
>> shutdown. Old behavior stays the same.
> 
> I'm worried that amending pciehp_handle_presence_or_link_change() makes
> the event handling logic more difficult to understand.

Don't worry. It's already difficult to understand ;)

>  Polling PDS in
> pcie_wait_for_link() or disabling either PDC or DLLSC if in-band presence
> is disabled seems simpler to reason about.

pcie_wait_for_link() is generic PCIe layer. I don't think mixing hotplug 
concepts is a good layering violation.

Disabling PDC or DLLSC can work. I've sometimes wondered why we even 
care about PDC. I can imagine situations where platform might want to 
signal imminent removal by yanking PD.

>> in-band PD disable (what's a good acronym for that, BTW?)
> 
> I don't know, maybe inband_presence_disabled?

PCI_EXP_SLTCAP2_IBPD ?
Lukas Wunner Feb. 14, 2019, 7:01 a.m. UTC | #9
On Wed, Feb 13, 2019 at 06:55:46PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 2/13/19 2:36 AM, Lukas Wunner wrote:
> > > (*) A bit hypothetical: There is no hardware yet implementing the ECN.
> > 
> > Hm, this contradicts Austin Bolen's e-mail of Jan 25 that "Yes, this
> > platform disables in-band presence" (when asked whether your host
> > controller already adheres to the ECN).
> 
> Both statements are true. The hardware does indeed disable in-band 
> presence, in a rudimentary way that is not compliant with the ECN -- it 
> doesn't implement the bits required by the ECN.

Ugh, can a BIOS update make those machines compliant to the ECN
or do we need a quirk specifically for them?


> >  Polling PDS in
> > pcie_wait_for_link() or disabling either PDC or DLLSC if in-band presence
> > is disabled seems simpler to reason about.
> 
> pcie_wait_for_link() is generic PCIe layer. I don't think mixing hotplug 
> concepts is a good layering violation.

The function used to live in pciehp_hpc.c, but commits 9f5a70f18c58
and f0157160b359 moved it to generic code to allow code sharing with
the DPC driver.  That's the only reason it's in generic code AFAICS.


> >> in-band PD disable (what's a good acronym for that, BTW?)
> > 
> > I don't know, maybe inband_presence_disabled?
> 
> PCI_EXP_SLTCAP2_IBPD ?

Yes, something like that.  It should match the spec, which I have no
access to.

Thanks,

Lukas
Alex_Gagniuc@Dellteam.com Feb. 14, 2019, 7:26 p.m. UTC | #10
On 2/14/19 1:01 AM, Lukas Wunner wrote:
> On Wed, Feb 13, 2019 at 06:55:46PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>> On 2/13/19 2:36 AM, Lukas Wunner wrote:
>>>> (*) A bit hypothetical: There is no hardware yet implementing the ECN.
>>>
>>> Hm, this contradicts Austin Bolen's e-mail of Jan 25 that "Yes, this
>>> platform disables in-band presence" (when asked whether your host
>>> controller already adheres to the ECN).
>>
>> Both statements are true. The hardware does indeed disable in-band
>> presence, in a rudimentary way that is not compliant with the ECN -- it
>> doesn't implement the bits required by the ECN.
> 
> Ugh, can a BIOS update make those machines compliant to the ECN

No, that is not possible, I'm told.

> or do we need a quirk specifically for them?

We might have to.

> It should match the spec, which I have no access to.

I am as irritated as you that the spec is not readily available for 
public viewing.

Cheers,
Alex
diff mbox series

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 3f3df4c29f6e..ea0dd3e956be 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -220,13 +220,23 @@  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 	/*
 	 * If the slot is on and presence or link has changed, turn it off.
 	 * Even if it's occupied again, we cannot assume the card is the same.
+	 * When the card is swapped, we also expect a change in link state,
+	 * without which, it's likely presence became high after link-active.
 	 */
 	mutex_lock(&ctrl->state_lock);
+	present = pciehp_card_present(ctrl);
+	link_active = pciehp_check_link_active(ctrl);
 	switch (ctrl->state) {
 	case BLINKINGOFF_STATE:
 		cancel_delayed_work(&ctrl->button_work);
 		/* fall through */
 	case ON_STATE:
+		if (present && link_active &&
+		   !(events & PCI_EXP_SLTSTA_DLLSC)) {
+			mutex_unlock(&ctrl->state_lock);
+			ctrl_warn(ctrl, "Hardware bug: Presence state came up after link");
+			return;
+		}
 		ctrl->state = POWEROFF_STATE;
 		mutex_unlock(&ctrl->state_lock);
 		if (events & PCI_EXP_SLTSTA_DLLSC)