[RFC,4/4] thunderbolt: Support runtime pm

Message ID 4560ddb26a2bb15eeaf8eab565ac41c35e36d092.1458126755.git.lukas@wunner.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Lukas Wunner March 16, 2016, 2:50 p.m. UTC
Document and implement Apple's ACPI-based (but nonstandard) mechanism
to power the controller up and down as needed.

This fixes (at least partially) a power regression introduced in
Linux 3.17 by 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").

A Thunderbolt controller consists of an NHI (Native Host Interface) and
a set of bridges. Power is cut to the entire chip. The Linux pm model
assumes that runtime pm is governed by the parent device, i.e. the
upstream bridge driver, pcieport. In violation of this model we let a
child govern it, i.e. the NHI driver thunderbolt.ko. The traditional
hierarchical pm model is defeated by setting ignore_children on the
upstream bridge and downstream bridge 0, and by having the NHI update
all the bridges' runtime pm state in unison with itself. It is also the
NHI driver's job to save and restore PCI state of the bridges.

PCIe Port --- Upstream Bridge --+
                                |
                                +-- Downstream Bridge 0 --+
                                |                         |
                                |                         +-- NHI
                                |
                                +-- Downstream Bridge 1 ...
                                |
                                +-- Downstream Bridge 2 ... hotplugged
                                |                           devices
                                +-- Downstream Bridge 3 ...
                                |
                                +-- Downstream Bridge 4 ...

The PCI subsystem pm_ops do not work properly for devices which can be
put into D3cold by some other means than the standard _PSx ACPI platform
methods: We do not want to wake up the chip before system sleep, yet
pci_pm_prepare() does not return 1 as it should since pci_target_state()
returns D3hot. We solve this by overriding pci_pm_prepare() using power
domains. They are assigned to the bridges using a PCI quirk. We also do
not want to wake the chip after system resume as pci_pm_complete() does,
so we override that as well. Note that we can never remove and free the
dev_pm_domain assigned to the bridges as there is no PCI remove fixup
section. We also cannot bail out of the ->probe callback if allocation
of the struct dev_pm_domain fails since the PCI enable fixup does not
allow return values to be passed back.

It might be possible to implement a less kludgy solution which adheres
to the hierarchical pm model and does not need a PCI enable quirk for
the bridges if pcieport had runtime pm support both for itself and
any service drivers registering with it. The runtime pm code could
then be moved from the NHI to a new Thunderbolt service driver that
gets used on the upstream bridge.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92111
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/quirks.c         |  35 ++++++
 drivers/thunderbolt/Kconfig  |   2 +-
 drivers/thunderbolt/nhi.c    |   4 +
 drivers/thunderbolt/nhi.h    |   3 +
 drivers/thunderbolt/power.c  | 247 +++++++++++++++++++++++++++++++++++++++++++
 drivers/thunderbolt/power.h  |   3 +
 drivers/thunderbolt/switch.c |   9 ++
 drivers/thunderbolt/tb.c     |   6 ++
 8 files changed, 308 insertions(+), 1 deletion(-)

Comments

Alan Stern March 16, 2016, 3:26 p.m. UTC | #1
On Wed, 16 Mar 2016, Lukas Wunner wrote:

> Document and implement Apple's ACPI-based (but nonstandard) mechanism
> to power the controller up and down as needed.
> 
> This fixes (at least partially) a power regression introduced in
> Linux 3.17 by 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
> 
> A Thunderbolt controller consists of an NHI (Native Host Interface) and
> a set of bridges. Power is cut to the entire chip. The Linux pm model
> assumes that runtime pm is governed by the parent device, i.e. the
> upstream bridge driver, pcieport. In violation of this model we let a
> child govern it, i.e. the NHI driver thunderbolt.ko. The traditional

The NHI driver is bound to bridge 0?  Your diagram indicates this but 
you don't say so explicitly.

> hierarchical pm model is defeated by setting ignore_children on the
> upstream bridge and downstream bridge 0, and by having the NHI update
> all the bridges' runtime pm state in unison with itself. It is also the
> NHI driver's job to save and restore PCI state of the bridges.
> 
> PCIe Port --- Upstream Bridge --+
>                                 |
>                                 +-- Downstream Bridge 0 --+
>                                 |                         |
>                                 |                         +-- NHI
>                                 |
>                                 +-- Downstream Bridge 1 ...
>                                 |
>                                 +-- Downstream Bridge 2 ... hotplugged
>                                 |                           devices
>                                 +-- Downstream Bridge 3 ...
>                                 |
>                                 +-- Downstream Bridge 4 ...

This may be a naive question: The diagram indicates a single upstream 
bridge attached to a bunch of downstream bridges with nothing in 
between.  Is that really how the kernel treats Thunderbolt controllers?

In all other controllers that I'm familiar with, there's a device to 
represent the controller, another device representing its upward link, 
and a bunch of devices representing the downward links.  The analogous 
approach here would make bridges 1 ... n children of bridge 0 (which 
sounds strange but might make more sense in the end).

The way you're doing it, how does the NHI driver know when to go into 
suspend?  The runtime PM core won't notify it when all the hotplugged 
devices attached to the other bridges have been suspended, since it's 
not their parent.

> The PCI subsystem pm_ops do not work properly for devices which can be
> put into D3cold by some other means than the standard _PSx ACPI platform
> methods: We do not want to wake up the chip before system sleep, yet
> pci_pm_prepare() does not return 1 as it should since pci_target_state()
> returns D3hot. We solve this by overriding pci_pm_prepare() using power
> domains. They are assigned to the bridges using a PCI quirk. We also do
> not want to wake the chip after system resume as pci_pm_complete() does,
> so we override that as well. Note that we can never remove and free the
> dev_pm_domain assigned to the bridges as there is no PCI remove fixup
> section. We also cannot bail out of the ->probe callback if allocation
> of the struct dev_pm_domain fails since the PCI enable fixup does not
> allow return values to be passed back.
> 
> It might be possible to implement a less kludgy solution which adheres
> to the hierarchical pm model and does not need a PCI enable quirk for
> the bridges if pcieport had runtime pm support both for itself and
> any service drivers registering with it. The runtime pm code could
> then be moved from the NHI to a new Thunderbolt service driver that
> gets used on the upstream bridge.

Or you could interpose another device structure between the upstream 
bridge and all the downstream bridges.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner March 16, 2016, 4:20 p.m. UTC | #2
Hi Alan,

On Wed, Mar 16, 2016 at 11:26:54AM -0400, Alan Stern wrote:
> On Wed, 16 Mar 2016, Lukas Wunner wrote:
> 
> > Document and implement Apple's ACPI-based (but nonstandard) mechanism
> > to power the controller up and down as needed.
> > 
> > This fixes (at least partially) a power regression introduced in
> > Linux 3.17 by 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
> > 
> > A Thunderbolt controller consists of an NHI (Native Host Interface) and
> > a set of bridges. Power is cut to the entire chip. The Linux pm model
> > assumes that runtime pm is governed by the parent device, i.e. the
> > upstream bridge driver, pcieport. In violation of this model we let a
> > child govern it, i.e. the NHI driver thunderbolt.ko. The traditional
> 
> The NHI driver is bound to bridge 0?  Your diagram indicates this but 
> you don't say so explicitly.

No, the NHI driver is bound to the NHI, that's a PCI device sitting on
a bus behind Downstream Bridge 0.

E.g. on a MacBookPro11,3 with a Falcon Ridge 4C it looks like this:

Upstream Bridge:
	06:00.0 PCI bridge [0604]: Intel Corporation Device [8086:156d]
	Bus: primary=06, secondary=07, subordinate=6c, sec-latency=0
Downstream Bridges:
	07:00.0 PCI bridge [0604]: Intel Corporation Device [8086:156d]
	Bus: primary=07, secondary=08, subordinate=08, sec-latency=0
	07:03.0 PCI bridge [0604]: Intel Corporation Device [8086:156d]
	Bus: primary=07, secondary=09, subordinate=39, sec-latency=0
	07:04.0 PCI bridge [0604]: Intel Corporation Device [8086:156d]
	Bus: primary=07, secondary=3a, subordinate=3a, sec-latency=0
	07:05.0 PCI bridge [0604]: Intel Corporation Device [8086:156d]
	Bus: primary=07, secondary=3b, subordinate=6b, sec-latency=0
	07:06.0 PCI bridge [0604]: Intel Corporation Device [8086:156d]
	Bus: primary=07, secondary=6c, subordinate=6c, sec-latency=0
NHI:
	08:00.0 System peripheral [0880]: Intel Corporation Device [8086:156c]
	Subsystem: Device [2222:1111]


> > hierarchical pm model is defeated by setting ignore_children on the
> > upstream bridge and downstream bridge 0, and by having the NHI update
> > all the bridges' runtime pm state in unison with itself. It is also the
> > NHI driver's job to save and restore PCI state of the bridges.
> > 
> > PCIe Port --- Upstream Bridge --+
> >                                 |
> >                                 +-- Downstream Bridge 0 --+
> >                                 |                         |
> >                                 |                         +-- NHI
> >                                 |
> >                                 +-- Downstream Bridge 1 ...
> >                                 |
> >                                 +-- Downstream Bridge 2 ... hotplugged
> >                                 |                           devices
> >                                 +-- Downstream Bridge 3 ...
> >                                 |
> >                                 +-- Downstream Bridge 4 ...
> 
> This may be a naive question: The diagram indicates a single upstream 
> bridge attached to a bunch of downstream bridges with nothing in 
> between.  Is that really how the kernel treats Thunderbolt controllers?

There's a bus in-between, bus 07 in the example above.
Buses are signified by a vertical line in this ascii drawing.


> In all other controllers that I'm familiar with, there's a device to 
> represent the controller, another device representing its upward link, 
> and a bunch of devices representing the downward links.  The analogous 
> approach here would make bridges 1 ... n children of bridge 0 (which 
> sounds strange but might make more sense in the end).
> 
> The way you're doing it, how does the NHI driver know when to go into 
> suspend?  The runtime PM core won't notify it when all the hotplugged 
> devices attached to the other bridges have been suspended, since it's 
> not their parent.

The NHI knows when something is plugged in, it talks to the switches
in devices that are hotplugged to the controller. As I've explained
in the lengthy comment in the middle of patch [4/4], we acquire a
runtime pm ref for each switch that is plugged in and release one
whenever a switch is unplugged.
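
The reference counting described above can be modelled in a few lines of
userspace C. This is only a sketch: struct controller, switch_added() and
switch_removed() are hypothetical names made up for illustration, the
counter merely stands in for the runtime pm usage count, and the 10 sec
autosuspend grace period mentioned in the patch comment is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the controller; not a kernel structure. */
struct controller {
	unsigned int usage_count;	/* one ref per plugged-in switch */
	bool powered;
};

/* A switch was hotplugged: take a ref and power up on the first one. */
static void switch_added(struct controller *c)
{
	if (c->usage_count++ == 0)
		c->powered = true;
}

/* A switch was unplugged: drop its ref, power down on the last one. */
static void switch_removed(struct controller *c)
{
	assert(c->usage_count > 0);	/* an unbalanced put would be a bug */
	if (--c->usage_count == 0)
		c->powered = false;
}
```

With two switches plugged in, removing one leaves the model powered; only
removing the second one powers it down.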


> > The PCI subsystem pm_ops do not work properly for devices which can be
> > put into D3cold by some other means than the standard _PSx ACPI platform
> > methods: We do not want to wake up the chip before system sleep, yet
> > pci_pm_prepare() does not return 1 as it should since pci_target_state()
> > returns D3hot. We solve this by overriding pci_pm_prepare() using power
> > domains. They are assigned to the bridges using a PCI quirk. We also do
> > not want to wake the chip after system resume as pci_pm_complete() does,
> > so we override that as well. Note that we can never remove and free the
> > dev_pm_domain assigned to the bridges as there is no PCI remove fixup
> > section. We also cannot bail out of the ->probe callback if allocation
> > of the struct dev_pm_domain fails since the PCI enable fixup does not
> > allow return values to be passed back.
> > 
> > It might be possible to implement a less kludgy solution which adheres
> > to the hierarchical pm model and does not need a PCI enable quirk for
> > the bridges if pcieport had runtime pm support both for itself and
> > any service drivers registering with it. The runtime pm code could
> > then be moved from the NHI to a new Thunderbolt service driver that
> > gets used on the upstream bridge.
> 
> Or you could interpose another device structure between the upstream 
> bridge and all the downstream bridges.

How? The structure is predetermined by the way the PCI devices and
bridges are connected to each other. That was Intel's idea.

Best regards,

Lukas
Alan Stern March 17, 2016, 2:54 p.m. UTC | #3
On Wed, 16 Mar 2016, Lukas Wunner wrote:

> > In all other controllers that I'm familiar with, there's a device to 
> > represent the controller, another device representing its upward link, 
> > and a bunch of devices representing the downward links.  The analogous 
> > approach here would make bridges 1 ... n children of bridge 0 (which 
> > sounds strange but might make more sense in the end).
> > 
> > The way you're doing it, how does the NHI driver know when to go into 
> > suspend?  The runtime PM core won't notify it when all the hotplugged 
> > devices attached to the other bridges have been suspended, since it's 
> > not their parent.
> 
> The NHI knows when something is plugged in, it talks to the switches
> in devices that are hotplugged to the controller. As I've explained
> in the lengthy comment in the middle of patch [4/4], we acquire a
> runtime pm ref for each switch that is plugged in and release one
> whenever a switch is unplugged.

If I understand correctly, that means you allow the Thunderbolt
controller to go into runtime suspend only when nothing is plugged into
any of the ports.  Is that right?  It's quite inefficient.

> > > The PCI subsystem pm_ops do not work properly for devices which can be
> > > put into D3cold by some other means than the standard _PSx ACPI platform
> > > methods: We do not want to wake up the chip before system sleep, yet
> > > pci_pm_prepare() does not return 1 as it should since pci_target_state()
> > > returns D3hot. We solve this by overriding pci_pm_prepare() using power
> > > domains. They are assigned to the bridges using a PCI quirk. We also do
> > > not want to wake the chip after system resume as pci_pm_complete() does,
> > > so we override that as well. Note that we can never remove and free the
> > > dev_pm_domain assigned to the bridges as there is no PCI remove fixup
> > > section. We also cannot bail out of the ->probe callback if allocation
> > > of the struct dev_pm_domain fails since the PCI enable fixup does not
> > > allow return values to be passed back.
> > > 
> > > It might be possible to implement a less kludgy solution which adheres
> > > to the hierarchical pm model and does not need a PCI enable quirk for
> > > the bridges if pcieport had runtime pm support both for itself and
> > > any service drivers registering with it. The runtime pm code could
> > > then be moved from the NHI to a new Thunderbolt service driver that
> > > gets used on the upstream bridge.
> > 
> > Or you could interpose another device structure between the upstream 
> > bridge and all the downstream bridges.
> 
> How? The structure is predetermined by the way the PCI devices and
> bridges are connected to each other. That was Intel's idea.

What I'm getting at is that we should have proper runtime-PM support
for bridges, i.e., I agree with what you wrote above.  A bridge can
safely go into runtime suspend when there are no unsuspended devices
attached to any of its downstream ports.  (That's how the USB hub
driver works, for instance.)  Doing things that way would make
everything simpler in the long run.
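
As a rough illustration of that rule, here is a userspace sketch in C.
The struct bridge layout, MAX_PORTS and bridge_may_suspend() are
hypothetical and exist only for this example; the real decision would be
made by the runtime PM core from the children's state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8	/* arbitrary limit for this sketch */

/* Hypothetical model of a bridge or hub; not a kernel structure. */
struct bridge {
	size_t nr_ports;
	bool child_present[MAX_PORTS];
	bool child_suspended[MAX_PORTS];
};

/* True if no downstream port has an attached, unsuspended device. */
static bool bridge_may_suspend(const struct bridge *b)
{
	assert(b->nr_ports <= MAX_PORTS);
	for (size_t i = 0; i < b->nr_ports; i++)
		if (b->child_present[i] && !b->child_suspended[i])
			return false;
	return true;
}
```

Unlike a per-switch reference, this lets the bridge suspend while a
device is still plugged in, as long as that device is itself suspended.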

So my suggestion is that you change over to the "less kludgy solution"  
and work on that instead.

Alan Stern

Andreas Noever March 20, 2016, 1:53 p.m. UTC | #4
On Wed, Mar 16, 2016 at 3:50 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Document and implement Apple's ACPI-based (but nonstandard) mechanism
> to power the controller up and down as needed.
>
> This fixes (at least partially) a power regression introduced in
> Linux 3.17 by 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
>
> A Thunderbolt controller consists of an NHI (Native Host Interface) and
> a set of bridges. Power is cut to the entire chip. The Linux pm model
> assumes that runtime pm is governed by the parent device, i.e. the
> upstream bridge driver, pcieport. In violation of this model we let a
> child govern it, i.e. the NHI driver thunderbolt.ko. The traditional
> hierarchical pm model is defeated by setting ignore_children on the
> upstream bridge and downstream bridge 0, and by having the NHI update
> all the bridges' runtime pm state in unison with itself. It is also the
> NHI driver's job to save and restore PCI state of the bridges.
>
> PCIe Port --- Upstream Bridge --+
>                                 |
>                                 +-- Downstream Bridge 0 --+
>                                 |                         |
>                                 |                         +-- NHI
>                                 |
>                                 +-- Downstream Bridge 1 ...
>                                 |
>                                 +-- Downstream Bridge 2 ... hotplugged
>                                 |                           devices
>                                 +-- Downstream Bridge 3 ...
>                                 |
>                                 +-- Downstream Bridge 4 ...
>
> The PCI subsystem pm_ops do not work properly for devices which can be
> put into D3cold by some other means than the standard _PSx ACPI platform
> methods: We do not want to wake up the chip before system sleep, yet
> pci_pm_prepare() does not return 1 as it should since pci_target_state()
> returns D3hot. We solve this by overriding pci_pm_prepare() using power
> domains. They are assigned to the bridges using a PCI quirk. We also do
> not want to wake the chip after system resume as pci_pm_complete() does,
> so we override that as well. Note that we can never remove and free the
> dev_pm_domain assigned to the bridges as there is no PCI remove fixup
> section. We also cannot bail out of the ->probe callback if allocation
> of the struct dev_pm_domain fails since the PCI enable fixup does not
> allow return values to be passed back.
>
> It might be possible to implement a less kludgy solution which adheres
> to the hierarchical pm model and does not need a PCI enable quirk for
> the bridges if pcieport had runtime pm support both for itself and
> any service drivers registering with it. The runtime pm code could
> then be moved from the NHI to a new Thunderbolt service driver that
> gets used on the upstream bridge.

Hi Lukas,

thanks for implementing this. I have tested it on my MacBook Pro
with Cactus Ridge and got it to work with a few modifications. Saves
about 4 watts of power for me!

- My firmware does not provide the TRPE ACPI method, only XRPE. So
either TRPE is only post Cactus Ridge or it is only present in newer
MBPs. In any case the OS X driver looks for TRPE first and uses XRPE
only if TRPE does not exist. I suggest we do the same (but see below
for TRPE).
- The XRIN GPE fired immediately after the power was cut. The problem
seems to be that the controller takes a moment to shut down. The solution
is to poll until XRIL returns 1 before activating the GPE. On "Type 2"
devices the OS X driver polls up to 300 times with a 1ms sleep in
between (for me 1 or 2 iterations were always enough). Afaik no
polling is done on "Type 1" devices. (Fun fact: Compiling with the
kernel address sanitizer makes the kernel slow enough that
this is not necessary :)). Also the OS X interrupt handler checks XRIL
and only wakes up the device if it returns 0. This was not necessary
on my model, but maybe spurious interrupts can happen with
newer controllers?
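
The polling could look roughly like the following userspace sketch. The
xril_fn callback, wait_for_xril() and fake_xril() are hypothetical names;
in the driver the callback would evaluate the XRIL ACPI method and the
delay would come from a kernel sleep helper rather than nanosleep().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define XRIL_POLL_MAX	300	/* iterations, as in the OS X driver */

/* Hypothetical callback that evaluates XRIL and returns its value. */
typedef int (*xril_fn)(void *ctx);

/*
 * Poll until XRIL returns 1 or the retry budget is used up.
 * Returns true if it is safe to enable the wake GPE.
 */
static bool wait_for_xril(xril_fn read_xril, void *ctx)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };

	assert(read_xril);
	for (int i = 0; i < XRIL_POLL_MAX; i++) {
		if (read_xril(ctx) == 1)
			return true;
		nanosleep(&one_ms, NULL);	/* 1ms between attempts */
	}
	return false;
}

/* Stand-in for the ACPI evaluation: returns 1 after *ctx more calls. */
static int fake_xril(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return 1;
	(*remaining)--;
	return 0;
}
```

If wait_for_xril() returns false, the caller would presumably power the
controller back up instead of arming a GPE that fires right away.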


Concerning TRPE style hardware: It seems that pm is more complicated
here. I see a bunch of references to SX* ACPI methods (SXFP, SXLV,
SXIO) and have not yet figured out what they do. Maybe we should not
enable PM if XRPE is not present until we find someone to test it.
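
One possible way to combine the lookup order with this caution, as a
userspace sketch only: struct fw_methods, pick_power_method() and the
allow_untested_trpe switch are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what the firmware offers. */
struct fw_methods {
	bool has_trpe;
	bool has_xrpe;
};

/*
 * Returns the name of the power method to use, or NULL if runtime pm
 * should stay disabled. TRPE is tried first, as the OS X driver does,
 * but only if the caller opts in to the untested TRPE path.
 */
static const char *pick_power_method(const struct fw_methods *fw,
				     bool allow_untested_trpe)
{
	assert(fw);
	if (fw->has_trpe && allow_untested_trpe)
		return "TRPE";
	if (fw->has_xrpe)
		return "XRPE";
	return NULL;
}
```

With allow_untested_trpe set to false, a machine that only provides TRPE
gets NULL and runtime pm stays off until someone can test it.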

I don't have any experience with the runtime pm core. But the
thunderbolt side looks good.

As you have noted the "correct" place to put this logic would be at
the upstream bridge. Ideally the downstream bridges should go into
D3hot by themselves if no devices are attached. The NHI as well (did
you by chance check whether the NHI can be put into D3hot without
killing the thunderbolt tunnels?). And then the upstream bridge would
go to D3cold (and thus power down the whole subtree). If I recall
correctly there were two problems:
1. PCI bridges currently do not suspend themselves at all
2. How to teach the upstream bridge about D3cold.

(1) should be possible to fix? For (2): D3cold always requires a
platform specific mechanism and the pci subsystem only supports ACPI.
Would it be possible to add an API to tell the pci subsystem that we
know how to put a specific device (tree) into D3cold from a platform
driver [+CC Bjorn]? Then this whole thing would become a normal pci
suspend operation.

Regards,
Andreas



> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92111
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Cc: Andreas Noever <andreas.noever@gmail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/quirks.c         |  35 ++++++
>  drivers/thunderbolt/Kconfig  |   2 +-
>  drivers/thunderbolt/nhi.c    |   4 +
>  drivers/thunderbolt/nhi.h    |   3 +
>  drivers/thunderbolt/power.c  | 247 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/thunderbolt/power.h  |   3 +
>  drivers/thunderbolt/switch.c |   9 ++
>  drivers/thunderbolt/tb.c     |   6 ++
>  8 files changed, 308 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d1e3956..a007485 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -25,6 +25,7 @@
>  #include <linux/sched.h>
>  #include <linux/ktime.h>
>  #include <linux/mm.h>
> +#include <linux/pm_domain.h>
>  #include <asm/dma.h>   /* isa_dma_bridge_buggy */
>  #include "pci.h"
>
> @@ -3255,6 +3256,40 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL,
>  DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL,
>                                PCI_DEVICE_ID_INTEL_FALCON_RIDGE_4C_BRIDGE,
>                                quirk_apple_wait_for_thunderbolt);
> +
> +static int bridge_prepare(struct device *dev)
> +{
> +       return 1; /* stay asleep if already runtime suspended */
> +}
> +
> +static void quirk_apple_thunderbolt_runpm(struct pci_dev *dev)
> +{
> +       struct dev_pm_domain *bridge_pm_domain;
> +
> +       if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
> +               return;
> +       if ((dev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
> +               return;
> +       if (dev->dev.pm_domain)
> +               return;

Bridges in Hotplugged TB devices might have the same PCI ids as the
"root" bridges (if they use the same TB chip). You probably should
check that dev is a bridge of the builtin controller (for example by
checking for the presence of ACPI methods, see the comment in the
other tb quirks).

> +
> +       bridge_pm_domain = kzalloc(sizeof(*bridge_pm_domain), GFP_KERNEL);
> +       if (!bridge_pm_domain) {
> +               dev_err(&dev->dev, "cannot allocate pm_domain\n");
> +               return;
> +       }
> +
> +       bridge_pm_domain->ops          = *pci_bus_type.pm;
> +       bridge_pm_domain->ops.prepare  = bridge_prepare;
> +       bridge_pm_domain->ops.complete = NULL;
> +       dev_pm_domain_set(&dev->dev, bridge_pm_domain);
> +}
> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL,
> +                        PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
> +                        quirk_apple_thunderbolt_runpm);
> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL,
> +                        PCI_DEVICE_ID_INTEL_FALCON_RIDGE_4C_BRIDGE,
> +                        quirk_apple_thunderbolt_runpm);
>  #endif
>
>  static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
> diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
> index c121acc..40335f7 100644
> --- a/drivers/thunderbolt/Kconfig
> +++ b/drivers/thunderbolt/Kconfig
> @@ -1,6 +1,6 @@
>  menuconfig THUNDERBOLT
>         tristate "Thunderbolt support for Apple devices"
> -       depends on PCI
> +       depends on PCI && ACPI
>         select CRC32
>         help
>           Cactus Ridge Thunderbolt Controller driver
> diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
> index fa89160..964b006 100644
> --- a/drivers/thunderbolt/nhi.c
> +++ b/drivers/thunderbolt/nhi.c
> @@ -588,6 +588,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>         }
>         pci_set_drvdata(pdev, tb);
>
> +       nhi_runtime_pm_init(nhi);
> +
>         return 0;
>  }
>
> @@ -595,6 +597,8 @@ static void nhi_remove(struct pci_dev *pdev)
>  {
>         struct tb *tb = pci_get_drvdata(pdev);
>         struct tb_nhi *nhi = tb->nhi;
> +
> +       nhi_runtime_pm_fini(nhi);
>         thunderbolt_shutdown_and_free(tb);
>         nhi_shutdown(nhi);
>  }
> diff --git a/drivers/thunderbolt/nhi.h b/drivers/thunderbolt/nhi.h
> index 3172429..dd725f7 100644
> --- a/drivers/thunderbolt/nhi.h
> +++ b/drivers/thunderbolt/nhi.h
> @@ -7,6 +7,7 @@
>  #ifndef DSL3510_H_
>  #define DSL3510_H_
>
> +#include <linux/acpi.h>
>  #include <linux/mutex.h>
>  #include <linux/workqueue.h>
>
> @@ -25,6 +26,8 @@ struct tb_nhi {
>         struct tb_ring **rx_rings;
>         struct work_struct interrupt_work;
>         u32 hop_count; /* Number of rings (end point hops) supported by NHI. */
> +       unsigned long long wake_gpe; /* Hotplug interrupt during powerdown. */
> +       acpi_handle set_power; /* Method to power controller up/down. */
>  };
>
>  /**
> diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
> index 1095ad0..cc83940 100644
> --- a/drivers/thunderbolt/power.c
> +++ b/drivers/thunderbolt/power.c
> @@ -2,11 +2,15 @@
>   * Thunderbolt Cactus Ridge driver - power management
>   *
>   * Copyright (c) 2014 Andreas Noever <andreas.noever@gmail.com>
> + * Copyright (c) 2016 Lukas Wunner <lukas@wunner.de>
>   */
>
> +#include <linux/delay.h>
>  #include <linux/pci.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_runtime.h>
>
> +#include "nhi.h"
>  #include "tb.h"
>
>  static int nhi_suspend_noirq(struct device *dev)
> @@ -39,3 +43,246 @@ const struct dev_pm_ops nhi_pm_ops = {
>                                              */
>         .restore_noirq = nhi_resume_noirq,
>  };
> +
> +/*
> + * Runtime Power Management
> + *
> + * Apple provides the following means for runtime pm in ACPI:
> + *
> + * * XRPE method (TRPE on Cactus Ridge and newer), takes argument 1 or 0,
> + *   toggles a GPIO pin to switch the controller on or off.
> + * * XRIN named object (alternatively _GPE), contains number of a GPE which
> + *   fires as long as something is plugged in (regardless of power state).
> + * * XRIL method returns 0 as long as something is plugged in, 1 otherwise.
> + * * XRIP + XRIO methods, unused by OS X driver. (Flip interrupt polarity?)
> + *
> + * If there are multiple Thunderbolt controllers (e.g. MacPro6,1), each NHI
> + * device has a separate XRIN GPE and separate instances of these methods.
> + *
> + * We acquire a runtime pm ref for each newly allocated switch (except for
> + * the root switch) and drop one when a switch is freed. The controller is
> + * thus powered up as long as something is plugged in. This behaviour is
> + * identical to the OS X driver.
> + *
> + * Powering the controller down is almost instantaneous, but powering up takes
> + * about 2 sec. To handle situations gracefully where a device is unplugged
> + * and immediately replaced by another one, we afford a grace period of 10 sec
> + * before powering down. This autosuspend_delay_ms may be reduced to 0 via
> + * sysfs and to handle that properly we need to wait during runtime_resume
> + * since it takes about 0.7 sec after resuming until a hotplug event appears.
> + *
> + * When the system wakes from suspend-to-RAM, the controller's power state is
> + * as it was before. However if it was powered down, calling XRPE once to power
> + * it up is not sufficient: An additional call to XRPE is necessary to reset
> + * the power switch first.
> + */
> +
> +static int nhi_prepare(struct device *dev)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +       struct tb *tb = pci_get_drvdata(pdev);
> +       acpi_status res;
> +
> +       if (pm_runtime_active(dev))
> +               return 0;
> +
> +       res = acpi_disable_gpe(NULL, tb->nhi->wake_gpe);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot disable wake GPE, resuming\n");
> +               return 0;
> +       } else
> +               return 1; /* stay asleep if already runtime suspended */
> +}
> +
> +static void nhi_complete(struct device *dev)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +       struct tb *tb = pci_get_drvdata(pdev);
> +       acpi_status res;
> +
> +       if (pm_runtime_active(dev))
> +               return;
> +
> +       tb_info(tb, "resetting power switch\n");
> +       res = acpi_execute_simple_method(tb->nhi->set_power, NULL, 0);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot call set_power method\n");
> +               dev->power.runtime_error = -ENODEV;
> +       }
> +
> +       res = acpi_enable_gpe(NULL, tb->nhi->wake_gpe);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot enable wake GPE, resuming\n");
> +               pm_request_resume(dev);
> +       }
> +}
> +
> +static int pci_save_state_cb(struct pci_dev *pdev, void *ptr)
> +{
> +       pci_save_state(pdev);
> +       if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> +               pm_runtime_disable(&pdev->dev);
> +               pm_runtime_set_suspended(&pdev->dev);
> +               pm_runtime_enable(&pdev->dev);
> +       }
> +       pdev->current_state = PCI_D3cold;
> +       return 0;
> +}
> +
> +static int pci_restore_state_cb(struct pci_dev *pdev, void *ptr)
> +{
> +       pdev->current_state = PCI_D0;
> +       if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> +               pm_runtime_disable(&pdev->dev);
> +               pm_runtime_set_active(&pdev->dev);
> +               pm_runtime_enable(&pdev->dev);
> +       }
> +       pci_restore_state(pdev);
> +       return 0;
> +}
> +
> +static int nhi_runtime_suspend(struct device *dev)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +       struct pci_bus *upstream_bridge = pdev->bus->parent->parent;
> +       struct tb *tb = pci_get_drvdata(pdev);
> +       acpi_status res;
> +
> +       if (!pdev->d3cold_allowed)
> +               return -EAGAIN;
> +
> +       thunderbolt_suspend(tb);
> +       pci_walk_bus(upstream_bridge, pci_save_state_cb, NULL);
> +
> +       tb_info(tb, "powering down\n");
> +       res = acpi_execute_simple_method(tb->nhi->set_power, NULL, 0);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot call set_power method, resuming\n");
> +               goto err;
> +       }
> +
> +       res = acpi_enable_gpe(NULL, tb->nhi->wake_gpe);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot enable wake GPE, resuming\n");
> +               goto err;
> +       }
> +
> +       return 0;
> +
> +err:
> +       acpi_execute_simple_method(tb->nhi->set_power, NULL, 1);
> +       pci_walk_bus(upstream_bridge, pci_restore_state_cb, NULL);
> +       thunderbolt_resume(tb);
> +       return -EAGAIN;
> +}
> +
> +static int nhi_runtime_resume(struct device *dev)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +       struct pci_bus *upstream_bridge = pdev->bus->parent->parent;
> +       struct tb *tb = pci_get_drvdata(pdev);
> +       acpi_status res;
> +
> +       if (system_state >= SYSTEM_HALT)
> +               return -ESHUTDOWN;
> +
> +       res = acpi_disable_gpe(NULL, tb->nhi->wake_gpe);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot disable wake GPE, disabling runtime pm\n");
> +               pm_runtime_disable(dev);
> +       }
> +
> +       tb_info(tb, "powering up\n");
> +       res = acpi_execute_simple_method(tb->nhi->set_power, NULL, 1);
> +       if (ACPI_FAILURE(res)) {
> +               dev_err(dev, "cannot call set_power method\n");
> +               return -ENODEV;
> +       }
> +
> +       pci_walk_bus(upstream_bridge, pci_restore_state_cb, NULL);
> +       thunderbolt_resume(tb);
> +       msleep(1500); /* allow 1.5 sec for hotplug event to arrive */
> +       pm_runtime_mark_last_busy(dev);
> +
> +       return 0;
> +}
> +
> +static u32 nhi_runtime_wake(acpi_handle gpe_device, u32 gpe_number, void *ctx)
> +{
> +       struct device *dev = ctx;
> +       WARN_ON(pm_request_resume(dev) < 0);
> +       return ACPI_INTERRUPT_HANDLED;
> +}
> +
> +static struct dev_pm_domain nhi_pm_domain;
> +
> +void nhi_runtime_pm_init(struct tb_nhi *nhi)
> +{
> +       struct device *dev = &nhi->pdev->dev;
> +       struct acpi_handle *nhi_handle = ACPI_HANDLE(dev);
> +       acpi_status res;
> +
> +       /* gen 1 controllers use XRPE, gen 2+ controllers use TRPE */
> +       if (nhi->pdev->device <= PCI_DEVICE_ID_INTEL_EAGLE_RIDGE)
> +               res = acpi_get_handle(nhi_handle, "XRPE", &nhi->set_power);
> +       else
> +               res = acpi_get_handle(nhi_handle, "TRPE", &nhi->set_power);
> +       if (ACPI_FAILURE(res)) {
> +               dev_warn(dev, "cannot find set_power method, disabling runtime pm\n");
> +               goto err;
> +       }
> +
> +       res = acpi_evaluate_integer(nhi_handle, "XRIN", NULL, &nhi->wake_gpe);
> +       if (ACPI_FAILURE(res)) {
> +               dev_warn(dev, "cannot find wake GPE, disabling runtime pm\n");
> +               goto err;
> +       }
> +
> +       res = acpi_install_gpe_handler(NULL, nhi->wake_gpe,
> +                                      ACPI_GPE_LEVEL_TRIGGERED,
> +                                      nhi_runtime_wake, dev);
> +       if (ACPI_FAILURE(res)) {
> +               dev_warn(dev, "cannot install GPE handler, disabling runtime pm\n");
> +               goto err;
> +       }
> +
> +       nhi_pm_domain.ops                 = *pci_bus_type.pm;
> +       nhi_pm_domain.ops.prepare         = nhi_prepare;
> +       nhi_pm_domain.ops.complete        = nhi_complete;
> +       nhi_pm_domain.ops.runtime_suspend = nhi_runtime_suspend;
> +       nhi_pm_domain.ops.runtime_resume  = nhi_runtime_resume;
> +       dev_pm_domain_set(dev, &nhi_pm_domain);
> +
> +       /* apply to upstream bridge and downstream bridge 0 */
> +       pm_suspend_ignore_children(dev->parent->parent, true);
> +       pm_suspend_ignore_children(dev->parent, true);
> +
> +       pm_runtime_allow(dev);
> +       pm_runtime_set_autosuspend_delay(dev, 10000);
> +       pm_runtime_use_autosuspend(dev);
> +       pm_runtime_mark_last_busy(dev);
> +       pm_runtime_put(dev);
> +       return;
> +
> +err:
> +       nhi->wake_gpe = -1;
> +       if (pm_runtime_enabled(dev))
> +               pm_runtime_disable(dev);
> +}
> +
> +void nhi_runtime_pm_fini(struct tb_nhi *nhi)
> +{
> +       struct device *dev = &nhi->pdev->dev;
> +       acpi_status res;
> +
> +       if (nhi->wake_gpe == -1)
> +               return;
> +
> +       res = acpi_remove_gpe_handler(NULL, nhi->wake_gpe, nhi_runtime_wake);
> +       if (ACPI_FAILURE(res))
> +               dev_warn(dev, "cannot remove GPE handler\n");
> +
> +       pm_runtime_get(dev);
> +       pm_runtime_forbid(dev);
> +       dev_pm_domain_set(dev, NULL);
> +}
> diff --git a/drivers/thunderbolt/power.h b/drivers/thunderbolt/power.h
> index 99cb900..4fc836d 100644
> --- a/drivers/thunderbolt/power.h
> +++ b/drivers/thunderbolt/power.h
> @@ -11,4 +11,7 @@
>
>  extern const struct dev_pm_ops nhi_pm_ops;
>
> +void nhi_runtime_pm_fini(struct tb_nhi *nhi);
> +void nhi_runtime_pm_init(struct tb_nhi *nhi);
> +
>  #endif
> diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
> index c6270f0..e9be3d5 100644
> --- a/drivers/thunderbolt/switch.c
> +++ b/drivers/thunderbolt/switch.c
> @@ -5,6 +5,7 @@
>   */
>
>  #include <linux/delay.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/slab.h>
>
>  #include "tb.h"
> @@ -326,6 +327,11 @@ void tb_switch_free(struct tb_switch *sw)
>         if (!sw->is_unplugged)
>                 tb_plug_events_active(sw, false);
>
> +       if (sw != sw->tb->root_switch) {
> +               pm_runtime_mark_last_busy(&sw->tb->nhi->pdev->dev);
> +               pm_runtime_put(&sw->tb->nhi->pdev->dev);
> +       }
> +
>         kfree(sw->ports);
>         kfree(sw->drom);
>         kfree(sw);
> @@ -417,6 +423,9 @@ struct tb_switch *tb_switch_alloc(struct tb *tb, u64 route)
>         if (tb_plug_events_active(sw, true))
>                 goto err;
>
> +       if (tb->root_switch)
> +               pm_runtime_get(&tb->nhi->pdev->dev);
> +
>         return sw;
>  err:
>         kfree(sw->ports);
> diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
> index 24b6d30..c33d3f1 100644
> --- a/drivers/thunderbolt/tb.c
> +++ b/drivers/thunderbolt/tb.c
> @@ -7,6 +7,7 @@
>  #include <linux/slab.h>
>  #include <linux/errno.h>
>  #include <linux/delay.h>
> +#include <linux/pm_runtime.h>
>
>  #include "tb.h"
>  #include "tb_regs.h"
> @@ -217,8 +218,11 @@ static void tb_handle_hotplug(struct work_struct *work)
>  {
>         struct tb_hotplug_event *ev = container_of(work, typeof(*ev), work);
>         struct tb *tb = ev->tb;
> +       struct device *dev = &tb->nhi->pdev->dev;
>         struct tb_switch *sw;
>         struct tb_port *port;
> +
> +       pm_runtime_get(dev);
>         mutex_lock(&tb->lock);
>         if (!tb->hotplug_active)
>                 goto out; /* during init, suspend or shutdown */
> @@ -274,6 +278,8 @@ static void tb_handle_hotplug(struct work_struct *work)
>  out:
>         mutex_unlock(&tb->lock);
>         kfree(ev);
> +       pm_runtime_mark_last_busy(dev);
> +       pm_runtime_put(dev);
>  }
>
>  /**
> --
> 2.7.0
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner April 24, 2016, 3:23 p.m. UTC | #5
Hi Andreas,

thank you for your valuable feedback.

On Sun, Mar 20, 2016 at 02:53:10PM +0100, Andreas Noever wrote:
> - My firmware does not provide the TRPE ACPI method, only XRPE. So
> either TRPE is only post CactusRidge or it is only present in newer
> MBPs. In any case the OS X driver looks for TRPE first and uses XRPE
> only if TRPE does not exist. I suggest we do the same (but see below
> for TRPE).

I only had the acpidump of an MBA6 (2013) available when I implemented
this and it uses TRPE. I have since been able to obtain the acpidump of
an MBP10 (2012) and you're right, it uses XRPE. Both have the same
controller, Cactus Ridge 4C. It looks like they changed this on machines
introduced in 2013. It's just a rename of the method; there are no machines
which have both methods.
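
A minimal sketch of the probing order suggested above (TRPE first, XRPE
as fallback), written as a userspace model rather than kernel code. The
struct fw_methods table and the helpers fw_has_method() and
pick_set_power() are hypothetical stand-ins for an ACPI namespace lookup
such as acpi_get_handle(); they are not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the ACPI methods a given firmware provides. */
struct fw_methods {
	const char *const *names;
	size_t count;
};

/* Return 1 if the firmware provides the named method, 0 otherwise. */
int fw_has_method(const struct fw_methods *fw, const char *name)
{
	size_t i;

	for (i = 0; i < fw->count; i++)
		if (strcmp(fw->names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Pick the power method by probing instead of by PCI device ID:
 * TRPE first, then XRPE. Returns NULL if neither exists, in which
 * case runtime pm would have to stay disabled.
 */
const char *pick_set_power(const struct fw_methods *fw)
{
	static const char *const candidates[] = { "TRPE", "XRPE" };
	size_t i;

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		if (fw_has_method(fw, candidates[i]))
			return candidates[i];
	return NULL;
}
```

Probing by name avoids tying the choice to the controller generation,
which matters here because both machines carry the same Cactus Ridge 4C
chip yet expose differently named methods.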


> - The XRIN GPE fired immediately after the power was cut. The problem
> seems to be that the controller takes a bit to shut down. The solution
> is to poll until XRIL returns 1 before activating the GPE. On "Type 2"
> devices the OS X driver polls up to 300 times with a 1ms sleep in
> between (for me 1 or 2 iterations were always enough). Afaik no
> polling is done on "Type 1" devices.

Hm, this means that the semantics of XRIN and XRIL changed on Cactus
Ridge. I have changed the behaviour to be exactly as you've specified
above, this works fine on Light Ridge and should hopefully also work
on Cactus Ridge, no distinction between Type 1 and Type 2 necessary.
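
The polling described above can be sketched roughly as follows. This is
a userspace model under stated assumptions, not the v2 code: the
read_xril and sleep_1ms callbacks are hypothetical placeholders for
evaluating XRIL via ACPI and for msleep(1), and the limit of 300 polls
is the figure reported for the OS X driver.

```c
#include <stddef.h>

#define XRIL_MAX_POLLS 300	/* upper bound reported for the OS X driver */

/*
 * Poll until XRIL reads 1, i.e. the controller has finished shutting
 * down, so that the wake GPE can be enabled without firing at once.
 * Returns the number of polls that were needed, or -1 on timeout.
 */
int wait_for_xril_high(int (*read_xril)(void *ctx),
		       void (*sleep_1ms)(void), void *ctx)
{
	int i;

	for (i = 1; i <= XRIL_MAX_POLLS; i++) {
		if (read_xril(ctx) == 1)
			return i;
		if (sleep_1ms)
			sleep_1ms();
	}
	return -1;
}

/* Hypothetical fake: reads 0 for the first *ctx calls, then 1. */
int fake_xril(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0;
	}
	return 1;
}
```

On timeout the caller would presumably treat the suspend as failed and
power the controller back up rather than enable the GPE.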


> Also the OS X interrupt handler checks XRIL
> and only wakes up the device if it returns 0. This was not necessary
> to do on my model - but maybe spurious interrupts can happen with
> newer controllers?

They're doing lots of stuff which seems superfluous or needlessly
complicated, e.g. they also reset the controller upon driver load using
the XRST method (which exists only on some models). I don't think we
have to do everything exactly as they do as long as it works.
FWIW I haven't seen any spurious XRIN interrupts on Light Ridge.


> Concerning TRPE style hardware: It seems that pm is more complicated
> here. I see a bunch of references to SX* ACPI methods (SXFP, SXLV,
> SXIO) and have not yet figured out what they do. Maybe we should not
> enable PM if XRPE is not present until we find someone to test it.

But you do have the SX* methods on your machine even though it uses
XRPE, right? I've mostly figured out now what these methods are there
for and have documented them extensively in upstream.c. However I cannot
verify if my documentation is accurate as they are not present on my
machine, but perhaps you can if your machine has them.

SXLV, SXIO and SXIL exist only on Cactus Ridge machines and utilize
the Go2Sx and Ok2Go2Sx pins. Judging by the PCI quirk you've added,
it seems that a Go2Sx dance is necessary on this controller before
power is cut (either by going to S3 / S4 / S5 or by using the Force
Power pin, which is what XRPE / TRPE / SXFP do).


> As you have noted the "correct" place to put this logic would be at
> the upstream bridge. Ideally the downstream bridges should go into
> D3hot by themselves if no devices are attached. The NHI as well

In v2 it works exactly like this now:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v2

The trick is to allocate a Thunderbolt port service for the upstream
bridge which we can bind to. In fact I'm allocating such a port service
for *any* PCIe port on Thunderbolt devices, this could be useful for all
sorts of other stuff.

Binding to the upstream bridge also allows us to replace the PCI quirk
which delays resume_noirq on the downstream bridges, as demonstrated by
this experimental commit (works fine on Light Ridge but YMMV):
https://github.com/l1k/linux/commit/79e0b8b8fb5da50b63836939f75212f824d8cba7


> (did you by chance check whether the NHI can be put into D3hot without
> killing the thunderbolt tunnels?).

Amazingly this works. However the NHI does not act on hotplug events
after thunderbolt_suspend() has been called. Even without calling
thunderbolt_suspend(), it seems that the control channel is down
when the NHI is in D3hot, I'm getting RX timeouts. Also, I cannot
see any reduction in power consumption when putting the NHI in D3,
same for the downstream bridges.

You can test this for yourself by commenting out the two calls to
pm_runtime_get() and pm_runtime_put_autosuspend() in switch.c.
Plug in a Thunderbolt device, wait 10 sec for the NHI to autosuspend,
try accessing the Thunderbolt device. Works for me.

If the NHI suspended before you had a chance to plug in the device,
invoke "echo on > /sys/bus/pci/devices/0000:06:00.0/power/control".
Plug in the device and use "echo auto" to let the NHI autosuspend.


> (1) should be possible to fix? For (2): D3Cold always requires a
> platform specific mechanism and the pci subsystem only supports ACPI.
> Would it be possible to add an API to tell the pci subsystem that we
> know how to put a specific device(tree) into D3Cold from a platform
> driver [+CC Bjorn]? Then this whole thing would become a normal pci
> suspend operation.

I simply go to D3cold in the driver's ->runtime_suspend callback.
There's just one small fix necessary in pci_raw_set_power_state()
for this to work. Plus some changes in portdrv to call down to the
port service drivers on each pm transition. (It already does this
for ->suspend and ->resume, we just need the same functionality for
additional pm callbacks).


> Bridges in Hotplugged TB devices might have the same PCI ids as the
> "root" bridges (if they use the same TB chip). You probably should
> check that dev is a bridge of the builtin controller (for example by
> checking for the presence of ACPI methods, see the comment in the
> other tb quirks).

For the upstream bridge I'm checking if its parent is a root port now
to determine if it's a host controller built into the machine.

I think the only chance for a false positive is if two machines are
connected with Thunderbolt and one of them has multiple Thunderbolt
controllers built in. Might look like this:

RP - UPSB - DSB - UPSB - DSB - RP - RP - UPSB - DSB
^^^^^^^^^^^^^^^   ^^^^                   ^^^^^^^^^^
local machine     remote machine         secondary controller on remote

If the topology indeed looks like this (which I'm not sure of, I lack
the hardware to test it), a thunderbolt_upstream driver will try to
attach to UPSB on the secondary controller of the remote machine but
should bail out because it can't find an ACPI handle for its NHI.
So we should even have this corner case covered.
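
The two checks described above (parent is a root port, and an ACPI
handle exists for the NHI) can be modelled like this. It is only a
sketch: struct port, enum port_type and is_builtin_host_controller()
are illustrative names, not kernel types; in the kernel the port type
would come from the PCIe capability of the device.

```c
#include <stddef.h>

/* Simplified PCIe port types; illustrative only. */
enum port_type { ROOT_PORT, UPSTREAM_PORT, DOWNSTREAM_PORT, ENDPOINT };

struct port {
	enum port_type type;
	const struct port *parent;	/* NULL at the top of the hierarchy */
	int has_acpi_nhi;		/* 1 if an ACPI handle exists for its NHI */
};

/*
 * Treat an upstream bridge as a built-in host controller only if its
 * parent is a root port and an ACPI handle for its NHI can be found.
 * A hotplugged device's upstream bridge hangs off a downstream bridge
 * and is rejected by the first check; the remote-machine corner case
 * is rejected by the second.
 */
int is_builtin_host_controller(const struct port *upsb)
{
	if (upsb->type != UPSTREAM_PORT)
		return 0;
	if (!upsb->parent || upsb->parent->type != ROOT_PORT)
		return 0;
	return upsb->has_acpi_nhi;
}
```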

Best regards,

Lukas
Andreas Noever May 1, 2016, 11:18 a.m. UTC | #6
On Sun, Apr 24, 2016 at 5:23 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Hi Andreas,
>
> thank you for your valuable feedback.
>
> On Sun, Mar 20, 2016 at 02:53:10PM +0100, Andreas Noever wrote:
>> - My firmware does not provide the TRPE ACPI method, only XRPE. So
>> either TRPE is only post CactusRidge or it is only present in newer
>> MBPs. In any case the OS X driver looks for TRPE first and uses XRPE
>> only if TRPE does not exist. I suggest we do the same (but see below
>> for TRPE).
>
> I only had the acpidump of an MBA6 (2013) available when I implemented
> this and it uses TRPE. I have since been able to obtain the acpidump of
> an MBP10 (2012) and you're right, it uses XRPE. Both have the same
> controller, Cactus Ridge 4C. It looks like they changed this on machines
> introduced in 2013. It's just a rename of the method; there are no machines
> which have both methods.
>
>
>> - The XRIN GPE fired immediately after the power was cut. The problem
>> seems to be that the controller takes a bit to shut down. The solution
>> is to poll until XRIL returns 1 before activating the GPE. On "Type 2"
>> devices the OS X driver polls up to 300 times with a 1ms sleep in
>> between (for me 1 or 2 iterations were always enough). Afaik no
>> polling is done on "Type 1" devices.
>
> Hm, this means that the semantics of XRIN and XRIL changed on Cactus
> Ridge. I have changed the behaviour to be exactly as you've specified
> above, this works fine on Light Ridge and should hopefully also work
> on Cactus Ridge, no distinction between Type 1 and Type 2 necessary.

ok

>
>> Also the OS X interrupt handler checks XRIL
>> and only wakes up the device if it returns 0. This was not necessary
>> to do on my model - but maybe spurious interrupts can happen with
>> newer controllers?
>
> They're doing lots of stuff which seems superfluous or needlessly
> complicated, e.g. they also reset the controller upon driver load using
> the XRST method (which exists only on some models). I don't think we
> have to do everything exactly as they do as long as it works.
> FWIW I haven't seen any spurious XRIN interrupts on Light Ridge.
>
>
>> Concerning TRPE style hardware: It seems that pm is more complicated
>> here. I see a bunch of references to SX* ACPI methods (SXFP, SXLV,
>> SXIO) and have not yet figured out what they do. Maybe we should not
>> enable PM if XRPE is not present until we find someone to test it.
>
> But you do have the SX* methods on your machine even though it uses
> XRPE, right? I've mostly figured out now what these methods are there
> for and have documented them extensively in upstream.c. However I cannot
> verify if my documentation is accurate as they are not present on my
> machine, but perhaps you can if your machine has them.
Yes, I have these methods, but I have no idea what they do, only that
they have to be called before suspend:
http://lxr.free-electrons.com/source/drivers/pci/quirks.c#L3175


> SXLV, SXIO and SXIL exist only on Cactus Ridge machines and utilize
> the Go2Sx and Ok2Go2Sx pins. Judging by the PCI quirk you've added,
> it seems that a Go2Sx dance is necessary on this controller before
> power is cut (either by going to S3 / S4 / S5 or by using the Force
> Power pin, which is what XRPE / TRPE / SXFP do).
>
>
>> As you have noted the "correct" place to put this logic would be at
>> the upstream bridge. Ideally the downstream bridges should go into
>> D3hot by themselves if no devices are attached. The NHI as well
>
> In v2 it works exactly like this now:
> https://github.com/l1k/linux/commits/thunderbolt_runpm_v2
>
> The trick is to allocate a Thunderbolt port service for the upstream
> bridge which we can bind to. In fact I'm allocating such a port service
> for *any* PCIe port on Thunderbolt devices, this could be useful for all
> sorts of other stuff.
Just tested your branch - works nicely (runtime pm, suspend and hibernate)!

> Binding to the upstream bridge also allows us to replace the PCI quirk
> which delays resume_noirq on the downstream bridges, as demonstrated by
> this experimental commit (works fine on Light Ridge but YMMV):
> https://github.com/l1k/linux/commit/79e0b8b8fb5da50b63836939f75212f824d8cba7
>
>
>> (did you by chance check whether the NHI can be put into D3hot without
>> killing the thunderbolt tunnels?).
>
> Amazingly this works. However the NHI does not act on hotplug events
> after thunderbolt_suspend() has been called. Even without calling
> thunderbolt_suspend(), it seems that the control channel is down
> when the NHI is in D3hot, I'm getting RX timeouts. Also, I cannot
> see any reduction in power consumption when putting the NHI in D3,
> same for the downstream bridges.
Interesting. Looks like the NHI is really just a device on the tb
switch. But then it is understandable that turning it off does not
decrease power consumption.


> You can test this for yourself by commenting out the two calls to
> pm_runtime_get() and pm_runtime_put_autosuspend() in switch.c.
> Plug in a Thunderbolt device, wait 10 sec for the NHI to autosuspend,
> try accessing the Thunderbolt device. Works for me.
>
> If the NHI suspended before you had a chance to plug in the device,
> invoke "echo on > /sys/bus/pci/devices/0000:06:00.0/power/control".
> Plug in the device and use "echo auto" to let the NHI autosuspend.
>
>
>> (1) should be possible to fix? For (2): D3Cold always requires a
>> platform specific mechanism and the pci subsystem only supports ACPI.
>> Would it be possible to add an API to tell the pci subsystem that we
>> know how to put a specific device(tree) into D3Cold from a platform
>> driver [+CC Bjorn]? Then this whole thing would become a normal pci
>> suspend operation.
>
> I simply go to D3cold in the driver's ->runtime_suspend callback.
> There's just one small fix necessary in pci_raw_set_power_state()
> for this to work. Plus some changes in portdrv to call down to the
> port service drivers on each pm transition. (It already does this
> for ->suspend and ->resume, we just need the same functionality for
> additional pm callbacks).
>
>
>> Bridges in Hotplugged TB devices might have the same PCI ids as the
>> "root" bridges (if they use the same TB chip). You probably should
>> check that dev is a bridge of the builtin controller (for example by
>> checking for the presence of ACPI methods, see the comment in the
>> other tb quirks).
>
> For the upstream bridge I'm checking if its parent is a root port now
> to determine if it's a host controller built into the machine.
>
> I think the only chance for a false positive is if two machines are
> connected with Thunderbolt and one of them has multiple Thunderbolt
> controllers built in. Might look like this:
>
> RP - UPSB - DSB - UPSB - DSB - RP - RP - UPSB - DSB
> ^^^^^^^^^^^^^^^   ^^^^                   ^^^^^^^^^^
> local machine     remote machine         secondary controller on remote
I don't think that it is possible to tunnel into a different machine
like that :) The root port check should be sufficient.

Best Regards
Andreas


> If the topology indeed looks like this (which I'm not sure of, I lack
> the hardware to test it), a thunderbolt_upstream driver will try to
> attach to UPSB on the secondary controller of the remote machine but
> should bail out because it can't find an ACPI handle for its NHI.
> So we should even have this corner case covered.
>
> Best regards,
>
> Lukas
Lukas Wunner May 13, 2016, 12:10 p.m. UTC | #7
Hi Alan,

On Thu, Mar 17, 2016 at 10:54:55AM -0400, Alan Stern wrote:
> On Wed, 16 Mar 2016, Lukas Wunner wrote:
> > > The way you're doing it, how does the NHI driver know when to go into 
> > > suspend?  The runtime PM core won't notify it when all the hotplugged 
> > > devices attached to the other bridges have been suspended, since it's 
> > > not their parent.
> > 
> > The NHI knows when something is plugged in, it talks to the switches
> > in devices that are hotplugged to the controller. As I've explained
> > in the lengthy comment in the middle of patch [4/4], we acquire a
> > runtime pm ref for each switch that is plugged in and release one
> > whenever a switch is unplugged.
> 
> If I understand correctly, that means you allow the Thunderbolt
> controller to go into runtime suspend only when nothing is plugged into
> any of the ports.  Is that right?  It's quite inefficient.

In the case of Thunderbolt on the Mac, runtime suspend means that the
controller is powered down. A plug event is side-band signaled using a GPE
so that we're able to power the controller up once something is plugged in.

It's not possible to power the controller down while devices are attached
because downstream devices have no way to side-band signal an interrupt
when they need to send data to the controller.
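
The resulting policy (one runtime pm reference per plugged-in switch,
power-down only at zero, power-up on the wake GPE) can be illustrated
with a small userspace model. The names struct controller,
switch_alloc(), switch_free(), try_autosuspend() and wake_gpe() are
hypothetical and only mirror the roles of the corresponding driver
paths; the autosuspend delay is left out.

```c
/* Hypothetical model of the controller's runtime pm state. */
struct controller {
	int usage;	/* one reference per plugged-in non-root switch */
	int powered;	/* 1 while the controller has power */
};

/* A switch was plugged in: hold a reference for as long as it exists. */
void switch_alloc(struct controller *c)
{
	c->usage++;
}

/* A switch was unplugged: drop its reference. */
void switch_free(struct controller *c)
{
	c->usage--;
}

/*
 * Power down only if nothing is plugged in, since attached devices
 * cannot signal the controller while it is without power.
 * Returns 1 if the controller was powered down, 0 if it stays up.
 */
int try_autosuspend(struct controller *c)
{
	if (c->usage > 0)
		return 0;
	c->powered = 0;
	return 1;
}

/* The side-band GPE fired on a plug event: power the controller up. */
void wake_gpe(struct controller *c)
{
	c->powered = 1;
}
```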


> What I'm getting at is that we should have proper runtime-PM support
> for bridges, i.e., I agree with what you wrote above.  A bridge can
> safely go into runtime suspend when there are no unsuspended devices
> attached to any of its downstream ports.  (That's how the USB hub
> driver works, for instance.)  Doing things that way would make
> everything simpler in the long run.
> 
> So my suggestion is that you change over to the "less kludgy solution"  
> and work on that instead.

Alright, posted as v2 today. :-)

Thanks,

Lukas

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d1e3956..a007485 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,7 @@ 
 #include <linux/sched.h>
 #include <linux/ktime.h>
 #include <linux/mm.h>
+#include <linux/pm_domain.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include "pci.h"
 
@@ -3255,6 +3256,40 @@  DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL,
 DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL,
 			       PCI_DEVICE_ID_INTEL_FALCON_RIDGE_4C_BRIDGE,
 			       quirk_apple_wait_for_thunderbolt);
+
+static int bridge_prepare(struct device *dev)
+{
+	return 1; /* stay asleep if already runtime suspended */
+}
+
+static void quirk_apple_thunderbolt_runpm(struct pci_dev *dev)
+{
+	struct dev_pm_domain *bridge_pm_domain;
+
+	if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
+		return;
+	if ((dev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
+		return;
+	if (dev->dev.pm_domain)
+		return;
+
+	bridge_pm_domain = kzalloc(sizeof(*bridge_pm_domain), GFP_KERNEL);
+	if (!bridge_pm_domain) {
+		dev_err(&dev->dev, "cannot allocate pm_domain\n");
+		return;
+	}
+
+	bridge_pm_domain->ops	       = *pci_bus_type.pm;
+	bridge_pm_domain->ops.prepare  = bridge_prepare;
+	bridge_pm_domain->ops.complete = NULL;
+	dev_pm_domain_set(&dev->dev, bridge_pm_domain);
+}
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL,
+			 PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
+			 quirk_apple_thunderbolt_runpm);
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL,
+			 PCI_DEVICE_ID_INTEL_FALCON_RIDGE_4C_BRIDGE,
+			 quirk_apple_thunderbolt_runpm);
 #endif
 
 static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index c121acc..40335f7 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,6 +1,6 @@ 
 menuconfig THUNDERBOLT
 	tristate "Thunderbolt support for Apple devices"
-	depends on PCI
+	depends on PCI && ACPI
 	select CRC32
 	help
 	  Cactus Ridge Thunderbolt Controller driver
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index fa89160..964b006 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -588,6 +588,8 @@  static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 	pci_set_drvdata(pdev, tb);
 
+	nhi_runtime_pm_init(nhi);
+
 	return 0;
 }
 
@@ -595,6 +597,8 @@  static void nhi_remove(struct pci_dev *pdev)
 {
 	struct tb *tb = pci_get_drvdata(pdev);
 	struct tb_nhi *nhi = tb->nhi;
+
+	nhi_runtime_pm_fini(nhi);
 	thunderbolt_shutdown_and_free(tb);
 	nhi_shutdown(nhi);
 }
diff --git a/drivers/thunderbolt/nhi.h b/drivers/thunderbolt/nhi.h
index 3172429..dd725f7 100644
--- a/drivers/thunderbolt/nhi.h
+++ b/drivers/thunderbolt/nhi.h
@@ -7,6 +7,7 @@ 
 #ifndef DSL3510_H_
 #define DSL3510_H_
 
+#include <linux/acpi.h>
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
 
@@ -25,6 +26,8 @@  struct tb_nhi {
 	struct tb_ring **rx_rings;
 	struct work_struct interrupt_work;
 	u32 hop_count; /* Number of rings (end point hops) supported by NHI. */
+	unsigned long long wake_gpe; /* Hotplug interrupt during powerdown. */
+	acpi_handle set_power; /* Method to power controller up/down. */
 };
 
 /**
diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
index 1095ad0..cc83940 100644
--- a/drivers/thunderbolt/power.c
+++ b/drivers/thunderbolt/power.c
@@ -2,11 +2,15 @@ 
  * Thunderbolt Cactus Ridge driver - power management
  *
  * Copyright (c) 2014 Andreas Noever <andreas.noever@gmail.com>
+ * Copyright (c) 2016 Lukas Wunner <lukas@wunner.de>
  */
 
+#include <linux/delay.h>
 #include <linux/pci.h>
+#include <linux/pm_domain.h>
 #include <linux/pm_runtime.h>
 
+#include "nhi.h"
 #include "tb.h"
 
 static int nhi_suspend_noirq(struct device *dev)
@@ -39,3 +43,246 @@  const struct dev_pm_ops nhi_pm_ops = {
 					     */
 	.restore_noirq = nhi_resume_noirq,
 };
+
+/*
+ * Runtime Power Management
+ *
+ * Apple provides the following means for runtime pm in ACPI:
+ *
+ * * XRPE method (TRPE on Cactus Ridge and newer), takes argument 1 or 0,
+ *   toggles a GPIO pin to switch the controller on or off.
+ * * XRIN named object (alternatively _GPE), contains number of a GPE which
+ *   fires as long as something is plugged in (regardless of power state).
+ * * XRIL method returns 0 as long as something is plugged in, 1 otherwise.
+ * * XRIP + XRIO methods, unused by OS X driver. (Flip interrupt polarity?)
+ *
+ * If there are multiple Thunderbolt controllers (e.g. MacPro6,1), each NHI
+ * device has a separate XRIN GPE and separate instances of these methods.
+ *
+ * We acquire a runtime pm ref for each newly allocated switch (except for
+ * the root switch) and drop one when a switch is freed. The controller is
+ * thus powered up as long as something is plugged in. This behaviour is
+ * identical to the OS X driver.
+ *
+ * Powering the controller down is almost instantaneous, but powering up takes
+ * about 2 sec. To handle situations gracefully where a device is unplugged
+ * and immediately replaced by another one, we afford a grace period of 10 sec
+ * before powering down. This autosuspend_delay_ms may be reduced to 0 via
+ * sysfs and to handle that properly we need to wait during runtime_resume
+ * since it takes about 0.7 sec after resuming until a hotplug event appears.
+ *
+ * When the system wakes from suspend-to-RAM, the controller's power state is
+ * as it was before. However if it was powered down, calling XRPE once to power
+ * it up is not sufficient: An additional call to XRPE is necessary to reset
+ * the power switch first.
+ */
+
+static int nhi_prepare(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct tb *tb = pci_get_drvdata(pdev);
+	acpi_status res;
+
+	if (pm_runtime_active(dev))
+		return 0;
+
+	res = acpi_disable_gpe(NULL, tb->nhi->wake_gpe);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot disable wake GPE, resuming\n");
+		return 0;
+	} else
+		return 1; /* stay asleep if already runtime suspended */
+}
+
+static void nhi_complete(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct tb *tb = pci_get_drvdata(pdev);
+	acpi_status res;
+
+	if (pm_runtime_active(dev))
+		return;
+
+	tb_info(tb, "resetting power switch\n");
+	res = acpi_execute_simple_method(tb->nhi->set_power, NULL, 0);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot call set_power method\n");
+		dev->power.runtime_error = -ENODEV;
+	}
+
+	res = acpi_enable_gpe(NULL, tb->nhi->wake_gpe);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot enable wake GPE, resuming\n");
+		pm_request_resume(dev);
+	}
+}
+
+static int pci_save_state_cb(struct pci_dev *pdev, void *ptr)
+{
+	pci_save_state(pdev);
+	if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
+		pm_runtime_disable(&pdev->dev);
+		pm_runtime_set_suspended(&pdev->dev);
+		pm_runtime_enable(&pdev->dev);
+	}
+	pdev->current_state = PCI_D3cold;
+	return 0;
+}
+
+static int pci_restore_state_cb(struct pci_dev *pdev, void *ptr)
+{
+	pdev->current_state = PCI_D0;
+	if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
+		pm_runtime_disable(&pdev->dev);
+		pm_runtime_set_active(&pdev->dev);
+		pm_runtime_enable(&pdev->dev);
+	}
+	pci_restore_state(pdev);
+	return 0;
+}
+
+static int nhi_runtime_suspend(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_bus *upstream_bridge = pdev->bus->parent->parent;
+	struct tb *tb = pci_get_drvdata(pdev);
+	acpi_status res;
+
+	if (!pdev->d3cold_allowed)
+		return -EAGAIN;
+
+	thunderbolt_suspend(tb);
+	pci_walk_bus(upstream_bridge, pci_save_state_cb, NULL);
+
+	tb_info(tb, "powering down\n");
+	res = acpi_execute_simple_method(tb->nhi->set_power, NULL, 0);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot call set_power method, resuming\n");
+		goto err;
+	}
+
+	res = acpi_enable_gpe(NULL, tb->nhi->wake_gpe);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot enable wake GPE, resuming\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	acpi_execute_simple_method(tb->nhi->set_power, NULL, 1);
+	pci_walk_bus(upstream_bridge, pci_restore_state_cb, NULL);
+	thunderbolt_resume(tb);
+	return -EAGAIN;
+}
+
+static int nhi_runtime_resume(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_bus *upstream_bridge = pdev->bus->parent->parent;
+	struct tb *tb = pci_get_drvdata(pdev);
+	acpi_status res;
+
+	if (system_state >= SYSTEM_HALT)
+		return -ESHUTDOWN;
+
+	res = acpi_disable_gpe(NULL, tb->nhi->wake_gpe);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot disable wake GPE, disabling runtime pm\n");
+		pm_runtime_disable(dev);
+	}
+
+	tb_info(tb, "powering up\n");
+	res = acpi_execute_simple_method(tb->nhi->set_power, NULL, 1);
+	if (ACPI_FAILURE(res)) {
+		dev_err(dev, "cannot call set_power method\n");
+		return -ENODEV;
+	}
+
+	pci_walk_bus(upstream_bridge, pci_restore_state_cb, NULL);
+	thunderbolt_resume(tb);
+	msleep(1500); /* allow 1.5 sec for hotplug event to arrive */
+	pm_runtime_mark_last_busy(dev);
+
+	return 0;
+}
+
+static u32 nhi_runtime_wake(acpi_handle gpe_device, u32 gpe_number, void *ctx)
+{
+	struct device *dev = ctx;
+	WARN_ON(pm_request_resume(dev) < 0);
+	return ACPI_INTERRUPT_HANDLED;
+}
+
+static struct dev_pm_domain nhi_pm_domain;
+
+void nhi_runtime_pm_init(struct tb_nhi *nhi)
+{
+	struct device *dev = &nhi->pdev->dev;
+	acpi_handle nhi_handle = ACPI_HANDLE(dev);
+	acpi_status res;
+
+	/* gen 1 controllers use XRPE, gen 2+ controllers use TRPE */
+	if (nhi->pdev->device <= PCI_DEVICE_ID_INTEL_EAGLE_RIDGE)
+		res = acpi_get_handle(nhi_handle, "XRPE", &nhi->set_power);
+	else
+		res = acpi_get_handle(nhi_handle, "TRPE", &nhi->set_power);
+	if (ACPI_FAILURE(res)) {
+		dev_warn(dev, "cannot find set_power method, disabling runtime pm\n");
+		goto err;
+	}
+
+	res = acpi_evaluate_integer(nhi_handle, "XRIN", NULL, &nhi->wake_gpe);
+	if (ACPI_FAILURE(res)) {
+		dev_warn(dev, "cannot find wake GPE, disabling runtime pm\n");
+		goto err;
+	}
+
+	res = acpi_install_gpe_handler(NULL, nhi->wake_gpe,
+				       ACPI_GPE_LEVEL_TRIGGERED,
+				       nhi_runtime_wake, dev);
+	if (ACPI_FAILURE(res)) {
+		dev_warn(dev, "cannot install GPE handler, disabling runtime pm\n");
+		goto err;
+	}
+
+	nhi_pm_domain.ops		  = *pci_bus_type.pm;
+	nhi_pm_domain.ops.prepare	  = nhi_prepare;
+	nhi_pm_domain.ops.complete	  = nhi_complete;
+	nhi_pm_domain.ops.runtime_suspend = nhi_runtime_suspend;
+	nhi_pm_domain.ops.runtime_resume  = nhi_runtime_resume;
+	dev_pm_domain_set(dev, &nhi_pm_domain);
+
+	/* apply to upstream bridge and downstream bridge 0 */
+	pm_suspend_ignore_children(dev->parent->parent, true);
+	pm_suspend_ignore_children(dev->parent, true);
+
+	pm_runtime_allow(dev);
+	pm_runtime_set_autosuspend_delay(dev, 10000);
+	pm_runtime_use_autosuspend(dev);
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_put(dev);
+	return;
+
+err:
+	nhi->wake_gpe = -1;
+	if (pm_runtime_enabled(dev))
+		pm_runtime_disable(dev);
+}
+
+void nhi_runtime_pm_fini(struct tb_nhi *nhi)
+{
+	struct device *dev = &nhi->pdev->dev;
+	acpi_status res;
+
+	if (nhi->wake_gpe == -1)
+		return;
+
+	res = acpi_remove_gpe_handler(NULL, nhi->wake_gpe, nhi_runtime_wake);
+	if (ACPI_FAILURE(res))
+		dev_warn(dev, "cannot remove GPE handler\n");
+
+	pm_runtime_get(dev);
+	pm_runtime_forbid(dev);
+	dev_pm_domain_set(dev, NULL);
+}
diff --git a/drivers/thunderbolt/power.h b/drivers/thunderbolt/power.h
index 99cb900..4fc836d 100644
--- a/drivers/thunderbolt/power.h
+++ b/drivers/thunderbolt/power.h
@@ -11,4 +11,7 @@ 
 
 extern const struct dev_pm_ops nhi_pm_ops;
 
+void nhi_runtime_pm_fini(struct tb_nhi *nhi);
+void nhi_runtime_pm_init(struct tb_nhi *nhi);
+
 #endif
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index c6270f0..e9be3d5 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -5,6 +5,7 @@ 
  */
 
 #include <linux/delay.h>
+#include <linux/pm_runtime.h>
 #include <linux/slab.h>
 
 #include "tb.h"
@@ -326,6 +327,11 @@  void tb_switch_free(struct tb_switch *sw)
 	if (!sw->is_unplugged)
 		tb_plug_events_active(sw, false);
 
+	if (sw != sw->tb->root_switch) {
+		pm_runtime_mark_last_busy(&sw->tb->nhi->pdev->dev);
+		pm_runtime_put(&sw->tb->nhi->pdev->dev);
+	}
+
 	kfree(sw->ports);
 	kfree(sw->drom);
 	kfree(sw);
@@ -417,6 +423,9 @@  struct tb_switch *tb_switch_alloc(struct tb *tb, u64 route)
 	if (tb_plug_events_active(sw, true))
 		goto err;
 
+	if (tb->root_switch)
+		pm_runtime_get(&tb->nhi->pdev->dev);
+
 	return sw;
 err:
 	kfree(sw->ports);
diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 24b6d30..c33d3f1 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -7,6 +7,7 @@ 
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/delay.h>
+#include <linux/pm_runtime.h>
 
 #include "tb.h"
 #include "tb_regs.h"
@@ -217,8 +218,11 @@  static void tb_handle_hotplug(struct work_struct *work)
 {
 	struct tb_hotplug_event *ev = container_of(work, typeof(*ev), work);
 	struct tb *tb = ev->tb;
+	struct device *dev = &tb->nhi->pdev->dev;
 	struct tb_switch *sw;
 	struct tb_port *port;
+
+	pm_runtime_get(dev);
 	mutex_lock(&tb->lock);
 	if (!tb->hotplug_active)
 		goto out; /* during init, suspend or shutdown */
@@ -274,6 +278,8 @@  static void tb_handle_hotplug(struct work_struct *work)
 out:
 	mutex_unlock(&tb->lock);
 	kfree(ev);
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_put(dev);
 }
 
 /**