PCI: Wait for 50ms after bridge is powered up

Message ID 1464188696-25782-1-git-send-email-mika.westerberg@linux.intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Mika Westerberg May 25, 2016, 3:04 p.m. UTC
The PCI PM 1.2 specification requires a minimum of 50ms before a function
on a bus is accessed after the bus is transitioned from B2 to B0. Now that
we actually power down bridges, we should make sure the specification is
followed.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
Hi Bjorn,

Since this is only needed when bridges are powered down, I think it makes
sense to put this into your pci/pm branch.

 drivers/pci/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Rafael J. Wysocki May 25, 2016, 8:44 p.m. UTC | #1
On Wed, May 25, 2016 at 5:04 PM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> The PCI PM 1.2 specification requires a minimum of 50ms before a function
> on a bus is accessed after the bus is transitioned from B2 to B0. Now that
> we actually power down bridges, we should make sure the specification is
> followed.
>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

ACK

> ---
> Hi Bjorn,
>
> Since this is only needed when bridges are powered down, I think it makes
> sense to put this into your pci/pm branch.
>
>  drivers/pci/pci.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e785dc260e72..e645a3d4f2e0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2361,7 +2361,12 @@ void pci_pm_init(struct pci_dev *dev)
>         }
>
>         dev->pm_cap = pm;
> -       dev->d3_delay = PCI_PM_D3_WAIT;
> +       /*
> +        * PCI PM 1.2 specification requires minimum of 50ms before any
> +        * function on the bus is accessed after the bus is transitioned
> +        * from B2 to B0.
> +        */
> +       dev->d3_delay = pci_is_bridge(dev) ? PCI_PM_BUS_WAIT : PCI_PM_D3_WAIT;
>         dev->d3cold_delay = PCI_PM_D3COLD_WAIT;
>         dev->bridge_d3 = pci_bridge_d3_possible(dev);
>         dev->d3cold_allowed = true;
> --
> 2.8.1
>
Lukas Wunner May 26, 2016, 10:10 a.m. UTC | #2
Hi,

On Wed, May 25, 2016 at 06:04:56PM +0300, Mika Westerberg wrote:
> The PCI PM 1.2 specification requires a minimum of 50ms before a function
> on a bus is accessed after the bus is transitioned from B2 to B0. Now that
> we actually power down bridges, we should make sure the specification is
> followed.

This patch has the unfortunate side effect of increasing boot time on
Macs with Thunderbolt by 320 ms. Granted, it's not much, but still
noticeable:

[    2.358041] pcieport 0000:06:03.0: enabling device (0000 -> 0002)
[    2.358195] pcieport 0000:06:03.0: rpm_idle
[    2.358222] pcieport 0000:06:03.0: rpm_suspend
[    2.411664] pcieport 0000:06:04.0: enabling device (0000 -> 0003)
[    2.411821] pcieport 0000:06:04.0: rpm_idle
[    2.411848] pcieport 0000:06:04.0: rpm_suspend
[    2.467664] pcieport 0000:06:05.0: enabling device (0000 -> 0003)
[    2.467817] pcieport 0000:06:05.0: rpm_idle
[    2.467843] pcieport 0000:06:05.0: rpm_suspend
[    2.523664] pcieport 0000:06:06.0: enabling device (0000 -> 0002)
[    2.523822] pcieport 0000:06:06.0: rpm_idle
[    2.523848] pcieport 0000:06:06.0: rpm_suspend
[    2.579685] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    2.579722] pcieport 0000:06:03.0: rpm_resume
[    2.635750] pciehp 0000:06:03.0:pcie204: Slot #3 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
[    2.635853] pciehp 0000:06:03.0:pcie204: service driver pciehp loaded
[    2.635858] pcieport 0000:06:03.0: rpm_idle
[    2.635886] pcieport 0000:06:04.0: rpm_resume
[    2.647645] pcieport 0000:06:03.0: rpm_suspend
[    2.691747] pciehp 0000:06:04.0:pcie204: Slot #4 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
[    2.691856] pciehp 0000:06:04.0:pcie204: service driver pciehp loaded
[    2.691859] pcieport 0000:06:04.0: rpm_idle
[    2.691888] pcieport 0000:06:05.0: rpm_resume
[    2.703649] pcieport 0000:06:04.0: rpm_suspend
[    2.747748] pciehp 0000:06:05.0:pcie204: Slot #5 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
[    2.747842] pciehp 0000:06:05.0:pcie204: service driver pciehp loaded
[    2.747845] pcieport 0000:06:06.0: rpm_resume
[    2.749213] pcieport 0000:06:05.0: rpm_idle
[    2.759650] pcieport 0000:06:05.0: rpm_suspend
[    2.805049] pciehp 0000:06:06.0:pcie204: Slot #6 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
[    2.806464] pciehp 0000:06:06.0:pcie204: service driver pciehp loaded
[    2.806471] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    2.807876] intel_idle: MWAIT substates: 0x21120
[    2.807878] intel_idle: v0.4.1 model 0x3A
[    2.808170] intel_idle: lapic_timer_reliable_states 0xffffffff
[    2.808201] GHES: HEST is not enabled!
[    2.809613] pcieport 0000:06:06.0: rpm_idle
[    2.809644] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    2.810096] Linux agpgart interface v0.103
[    2.810158] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    2.810158] AMD IOMMUv2 functionality not available on this system
[    2.816468] pcieport 0000:06:06.0: rpm_suspend

I've added debug messages whenever ->runtime_idle / _suspend / _resume
is called for a device.

As can be seen, the PCI core now waits 50 ms after ->runtime_suspend.
Also, the ports are resumed when the pciehp service driver is loaded,
adding another 50 ms delay. For four hotplug ports, that's a total of
400 ms (versus 80 ms without the patch).

I'm wondering if the delay after ->runtime_suspend is really needed. Is
this mandated by the spec? We could perhaps increase the autosuspend_delay
in pcie_portdrv_probe() slightly to prevent the port from going to sleep
between pci_enable_device() and loading the pciehp service driver.
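
Something along these lines, perhaps (untested sketch; the 100 ms value is
just a placeholder for whatever default we settle on, and
pm_runtime_set_autosuspend_delay() is in <linux/pm_runtime.h>):

	/*
	 * Hypothetical tweak in pcie_portdrv_probe(): keep the port awake
	 * long enough for the pciehp service driver to bind before the
	 * first autosuspend kicks in.
	 */
	pm_runtime_set_autosuspend_delay(&dev->dev, 100);	/* was 10 ms */
	pm_runtime_use_autosuspend(&dev->dev);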

Thanks,

Lukas

> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
> Hi Bjorn,
> 
> Since this is only needed when bridges are powered down, I think it makes
> sense to put this into your pci/pm branch.
> 
>  drivers/pci/pci.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e785dc260e72..e645a3d4f2e0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2361,7 +2361,12 @@ void pci_pm_init(struct pci_dev *dev)
>  	}
>  
>  	dev->pm_cap = pm;
> -	dev->d3_delay = PCI_PM_D3_WAIT;
> +	/*
> +	 * PCI PM 1.2 specification requires minimum of 50ms before any
> +	 * function on the bus is accessed after the bus is transitioned
> +	 * from B2 to B0.
> +	 */
> +	dev->d3_delay = pci_is_bridge(dev) ? PCI_PM_BUS_WAIT : PCI_PM_D3_WAIT;
>  	dev->d3cold_delay = PCI_PM_D3COLD_WAIT;
>  	dev->bridge_d3 = pci_bridge_d3_possible(dev);
>  	dev->d3cold_allowed = true;
> -- 
> 2.8.1
> 
Mika Westerberg May 26, 2016, 10:25 a.m. UTC | #3
On Thu, May 26, 2016 at 12:10:06PM +0200, Lukas Wunner wrote:
> Hi,
> 
> On Wed, May 25, 2016 at 06:04:56PM +0300, Mika Westerberg wrote:
> > The PCI PM 1.2 specification requires a minimum of 50ms before a function
> > on a bus is accessed after the bus is transitioned from B2 to B0. Now that
> > we actually power down bridges, we should make sure the specification is
> > followed.
> 
> This patch has the unfortunate side effect of increasing boot time on
> Macs with Thunderbolt by 320 ms. Granted, it's not much, but still
> noticeable:

Yes, that's the drawback.

> [    2.358041] pcieport 0000:06:03.0: enabling device (0000 -> 0002)
> [    2.358195] pcieport 0000:06:03.0: rpm_idle
> [    2.358222] pcieport 0000:06:03.0: rpm_suspend
> [    2.411664] pcieport 0000:06:04.0: enabling device (0000 -> 0003)
> [    2.411821] pcieport 0000:06:04.0: rpm_idle
> [    2.411848] pcieport 0000:06:04.0: rpm_suspend
> [    2.467664] pcieport 0000:06:05.0: enabling device (0000 -> 0003)
> [    2.467817] pcieport 0000:06:05.0: rpm_idle
> [    2.467843] pcieport 0000:06:05.0: rpm_suspend
> [    2.523664] pcieport 0000:06:06.0: enabling device (0000 -> 0002)
> [    2.523822] pcieport 0000:06:06.0: rpm_idle
> [    2.523848] pcieport 0000:06:06.0: rpm_suspend
> [    2.579685] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> [    2.579722] pcieport 0000:06:03.0: rpm_resume
> [    2.635750] pciehp 0000:06:03.0:pcie204: Slot #3 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> [    2.635853] pciehp 0000:06:03.0:pcie204: service driver pciehp loaded
> [    2.635858] pcieport 0000:06:03.0: rpm_idle
> [    2.635886] pcieport 0000:06:04.0: rpm_resume
> [    2.647645] pcieport 0000:06:03.0: rpm_suspend
> [    2.691747] pciehp 0000:06:04.0:pcie204: Slot #4 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> [    2.691856] pciehp 0000:06:04.0:pcie204: service driver pciehp loaded
> [    2.691859] pcieport 0000:06:04.0: rpm_idle
> [    2.691888] pcieport 0000:06:05.0: rpm_resume
> [    2.703649] pcieport 0000:06:04.0: rpm_suspend
> [    2.747748] pciehp 0000:06:05.0:pcie204: Slot #5 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> [    2.747842] pciehp 0000:06:05.0:pcie204: service driver pciehp loaded
> [    2.747845] pcieport 0000:06:06.0: rpm_resume
> [    2.749213] pcieport 0000:06:05.0: rpm_idle
> [    2.759650] pcieport 0000:06:05.0: rpm_suspend
> [    2.805049] pciehp 0000:06:06.0:pcie204: Slot #6 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> [    2.806464] pciehp 0000:06:06.0:pcie204: service driver pciehp loaded
> [    2.806471] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
> [    2.807876] intel_idle: MWAIT substates: 0x21120
> [    2.807878] intel_idle: v0.4.1 model 0x3A
> [    2.808170] intel_idle: lapic_timer_reliable_states 0xffffffff
> [    2.808201] GHES: HEST is not enabled!
> [    2.809613] pcieport 0000:06:06.0: rpm_idle
> [    2.809644] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.810096] Linux agpgart interface v0.103
> [    2.810158] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
> [    2.810158] AMD IOMMUv2 functionality not available on this system
> [    2.816468] pcieport 0000:06:06.0: rpm_suspend
> 
> I've added debug messages whenever ->runtime_idle / _suspend / _resume
> is called for a device.
> 
> As can be seen, the PCI core now waits 50 ms after ->runtime_suspend.
> Also, the ports are resumed when the pciehp service driver is loaded,
> adding another 50 ms delay. For four hotplug ports, that's a total of
> 400 ms (versus 80 ms without the patch).
> 
> I'm wondering if the delay after ->runtime_suspend is really needed. Is
> this mandated by the spec? We could perhaps increase the autosuspend_delay
> in pcie_portdrv_probe() slightly to prevent the port from going to sleep
> between pci_enable_device() and loading the pciehp service driver.

The PCI PM specification version 1.2 says this in chapter 4.2 (page 37):

  There is a minimum time requirement of 50 ms which must be provided by
  system software between when the bus is switched from B2 to B0 and
  when a function on the bus is accessed to allow time for the clock to
  start up and the bus to settle.

Not sure how much of that still applies to modern hardware.

I don't see why we cannot increase the autosuspend delay time, though.
Lukas Wunner May 26, 2016, 10:45 a.m. UTC | #4
On Thu, May 26, 2016 at 01:25:20PM +0300, Mika Westerberg wrote:
> On Thu, May 26, 2016 at 12:10:06PM +0200, Lukas Wunner wrote:
> > On Wed, May 25, 2016 at 06:04:56PM +0300, Mika Westerberg wrote:
> > > The PCI PM 1.2 specification requires a minimum of 50ms before a function
> > > on a bus is accessed after the bus is transitioned from B2 to B0. Now that
> > > we actually power down bridges, we should make sure the specification is
> > > followed.
> > 
> > This patch has the unfortunate side effect of increasing boot time on
> > Macs with Thunderbolt by 320 ms. Granted, it's not much, but still
> > noticeable:
> 
> Yes, that's the drawback.
> 
> > [    2.358041] pcieport 0000:06:03.0: enabling device (0000 -> 0002)
> > [    2.358195] pcieport 0000:06:03.0: rpm_idle
> > [    2.358222] pcieport 0000:06:03.0: rpm_suspend
> > [    2.411664] pcieport 0000:06:04.0: enabling device (0000 -> 0003)
> > [    2.411821] pcieport 0000:06:04.0: rpm_idle
> > [    2.411848] pcieport 0000:06:04.0: rpm_suspend
> > [    2.467664] pcieport 0000:06:05.0: enabling device (0000 -> 0003)
> > [    2.467817] pcieport 0000:06:05.0: rpm_idle
> > [    2.467843] pcieport 0000:06:05.0: rpm_suspend
> > [    2.523664] pcieport 0000:06:06.0: enabling device (0000 -> 0002)
> > [    2.523822] pcieport 0000:06:06.0: rpm_idle
> > [    2.523848] pcieport 0000:06:06.0: rpm_suspend
> > [    2.579685] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> > [    2.579722] pcieport 0000:06:03.0: rpm_resume
> > [    2.635750] pciehp 0000:06:03.0:pcie204: Slot #3 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > [    2.635853] pciehp 0000:06:03.0:pcie204: service driver pciehp loaded
> > [    2.635858] pcieport 0000:06:03.0: rpm_idle
> > [    2.635886] pcieport 0000:06:04.0: rpm_resume
> > [    2.647645] pcieport 0000:06:03.0: rpm_suspend
> > [    2.691747] pciehp 0000:06:04.0:pcie204: Slot #4 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > [    2.691856] pciehp 0000:06:04.0:pcie204: service driver pciehp loaded
> > [    2.691859] pcieport 0000:06:04.0: rpm_idle
> > [    2.691888] pcieport 0000:06:05.0: rpm_resume
> > [    2.703649] pcieport 0000:06:04.0: rpm_suspend
> > [    2.747748] pciehp 0000:06:05.0:pcie204: Slot #5 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > [    2.747842] pciehp 0000:06:05.0:pcie204: service driver pciehp loaded
> > [    2.747845] pcieport 0000:06:06.0: rpm_resume
> > [    2.749213] pcieport 0000:06:05.0: rpm_idle
> > [    2.759650] pcieport 0000:06:05.0: rpm_suspend
> > [    2.805049] pciehp 0000:06:06.0:pcie204: Slot #6 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > [    2.806464] pciehp 0000:06:06.0:pcie204: service driver pciehp loaded
> > [    2.806471] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
> > [    2.807876] intel_idle: MWAIT substates: 0x21120
> > [    2.807878] intel_idle: v0.4.1 model 0x3A
> > [    2.808170] intel_idle: lapic_timer_reliable_states 0xffffffff
> > [    2.808201] GHES: HEST is not enabled!
> > [    2.809613] pcieport 0000:06:06.0: rpm_idle
> > [    2.809644] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > [    2.810096] Linux agpgart interface v0.103
> > [    2.810158] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
> > [    2.810158] AMD IOMMUv2 functionality not available on this system
> > [    2.816468] pcieport 0000:06:06.0: rpm_suspend
> > 
> > I've added debug messages whenever ->runtime_idle / _suspend / _resume
> > is called for a device.
> > 
> > As can be seen, the PCI core now waits 50 ms after ->runtime_suspend.
> > Also, the ports are resumed when the pciehp service driver is loaded,
> > adding another 50 ms delay. For four hotplug ports, that's a total of
> > 400 ms (versus 80 ms without the patch).
> > 
> > I'm wondering if the delay after ->runtime_suspend is really needed. Is
> > this mandated by the spec? We could perhaps increase the autosuspend_delay
> > in pcie_portdrv_probe() slightly to prevent the port from going to sleep
> > between pci_enable_device() and loading the pciehp service driver.
> 
> The PCI PM specification version 1.2 says this in chapter 4.2 (page 37):
> 
>   There is a minimum time requirement of 50 ms which must be provided by
>   system software between when the bus is switched from B2 to B0 and
>   when a function on the bus is accessed to allow time for the clock to
>   start up and the bus to settle.

But why do we wait 50 ms when *suspending*, i.e. going from B0 to B2?

(Assuming B2 is the state when the bridge goes to D3hot, which I'm not
sure of. The spec says that the bus state may optionally be B3 if the
bridge is in D3hot.)


> Not sure how much of that still applies to modern hardware.

Could you ask hardware engineers at Intel what the bus state is on
modern chipsets (say, ILK or newer) and Thunderbolt ports to clarify
this?

Thanks,

Lukas

>
> I don't see why we cannot increase the autosuspend delay time, though.
Mika Westerberg May 26, 2016, 11:03 a.m. UTC | #5
On Thu, May 26, 2016 at 12:45:57PM +0200, Lukas Wunner wrote:
> > The PCI PM specification version 1.2 says this in chapter 4.2 (page 37):
> > 
> >   There is a minimum time requirement of 50 ms which must be provided by
> >   system software between when the bus is switched from B2 to B0 and
> >   when a function on the bus is accessed to allow time for the clock to
> >   start up and the bus to settle.
> 
> But why do we wait 50 ms when *suspending*, i.e. going from B0 to B2?

I guess because PCI requires delays of 10ms for both directions D0 <->
D3hot (see pci_raw_set_power_state()).

> (Assuming B2 is the state when the bridge goes to D3hot, which I'm not
> sure of. The spec says that the bus state may optionally be B3 if the
> bridge is in D3hot.)

B3 is the state the bus goes to when its power is removed, so I would
expect that to also require the 50ms delay even though the spec does not
explicitly say so.

> > Not sure how much of that still applies to modern hardware.
> 
> Could you ask hardware engineers at Intel what the bus state is on
> modern chipsets (say, ILK or newer) and Thunderbolt ports to clarify
> this?

I can try, but it is not always easy to find the right person in a
company as big as Intel.
Rafael J. Wysocki May 28, 2016, 12:29 p.m. UTC | #6
On Thursday, May 26, 2016 02:03:08 PM Mika Westerberg wrote:
> On Thu, May 26, 2016 at 12:45:57PM +0200, Lukas Wunner wrote:
> > > The PCI PM specification version 1.2 says this in chapter 4.2 (page 37):
> > > 
> > >   There is a minimum time requirement of 50 ms which must be provided by
> > >   system software between when the bus is switched from B2 to B0 and
> > >   when a function on the bus is accessed to allow time for the clock to
> > >   start up and the bus to settle.
> > 
> > But why do we wait 50 ms when *suspending*, i.e. going from B0 to B2?
> 
> I guess because PCI requires delays of 10ms for both directions D0 <->
> D3hot (see pci_raw_set_power_state()).
> 
> > (Assuming B2 is the state when the bridge goes to D3hot, which I'm not
> > sure of. The spec says that the bus state may optionally be B3 if the
> > bridge is in D3hot.)
> 
> B3 is the state the bus goes to when its power is removed, so I would
> expect that to also require the 50ms delay even though the spec does not
> explicitly say so.
> 
> > > Not sure how much of that still applies to modern hardware.
> > 
> > Could you ask hardware engineers at Intel what the bus state is on
> > modern chipsets (say, ILK or newer) and Thunderbolt ports to clarify
> > this?
> 
> I can try, but it is not always easy to find the right person in a
> company as big as Intel.

Well, even if you find someone to tell you that, what about non-Intel?

Are we ever going to know a value that works for everybody unless that
value is clearly stated in the spec?

Thanks,
Rafael

Mika Westerberg May 30, 2016, 9:33 a.m. UTC | #7
On Sat, May 28, 2016 at 02:29:06PM +0200, Rafael J. Wysocki wrote:
> On Thursday, May 26, 2016 02:03:08 PM Mika Westerberg wrote:
> > On Thu, May 26, 2016 at 12:45:57PM +0200, Lukas Wunner wrote:
> > > > The PCI PM specification version 1.2 says this in chapter 4.2 (page 37):
> > > > 
> > > >   There is a minimum time requirement of 50 ms which must be provided by
> > > >   system software between when the bus is switched from B2 to B0 and
> > > >   when a function on the bus is accessed to allow time for the clock to
> > > >   start up and the bus to settle.
> > > 
> > > But why do we wait 50 ms when *suspending*, i.e. going from B0 to B2?
> > 
> > I guess because PCI requires delays of 10ms for both directions D0 <->
> > D3hot (see pci_raw_set_power_state()).
> > 
> > > (Assuming B2 is the state when the bridge goes to D3hot, which I'm not
> > > sure of. The spec says that the bus state may optionally be B3 if the
> > > bridge is in D3hot.)
> > 
> > B3 is the state the bus goes to when its power is removed, so I would
> > expect that to also require the 50ms delay even though the spec does not
> > explicitly say so.
> > 
> > > > Not sure how much of that still applies to modern hardware.
> > > 
> > > Could you ask hardware engineers at Intel what the bus state is on
> > > modern chipsets (say, ILK or newer) and Thunderbolt ports to clarify
> > > this?
> > 
> > I can try, but it is not always easy to find the right person in a
> > company as big as Intel.
> 
> Well, even if you find someone to tell you that, what about non-Intel?

Indeed, I was hoping to find someone who knows more about PCIe in
general and it turns out I found one :-)

> Are we ever going to know a value that works for everybody unless that
> value is clearly stated in the spec?

That is a good question and the person I managed to find tells me that
the values are already in the spec.

So there are really no B-states in PCIe. The closest thing is the L-states,
which are defined in the PCIe spec. The spec has two timing limitations
(mapped onto the kernel delay constants sketched after the list):

 - After bringing the device to D0/L0 from D3hot/L1 there is a
   required 10ms delay after the write to PMCSR before any other config
   space access (5.3.1.4 in PCIe spec 3.1a).

 - After bringing the device out of reset, which is the path to D0/L0
   from D3cold, there is a required 100ms delay before accessing the
   device's config space (6.6.1 in PCIe spec 3.1a).
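
For reference, these map onto the delay constants we already have in
drivers/pci/pci.h roughly like this (quoting from memory, so double-check;
PCI_PM_BUS_WAIT is the one coming from this series):

#define PCI_PM_D3_WAIT		10	/* ms, D0 <-> D3hot (PCIe 3.1a, 5.3.1.4) */
#define PCI_PM_D3COLD_WAIT	100	/* ms, out of reset/D3cold (PCIe 3.1a, 6.6.1) */
#define PCI_PM_BUS_WAIT		50	/* ms, B2 -> B0 (PCI PM 1.2, ch. 4.2) */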

I also learned that both of these can be shortened with the following
mechanisms:

 - PCIe readiness notifications (6.23 in PCIe spec 3.1a)
 - ACPI _DSM method returning readiness durations (4.6.9 in PCI
   firmware spec 3.2).

I think it is actually worth looking into these two mechanisms and trying
to get Linux to support them.
Mika Westerberg May 30, 2016, 2:44 p.m. UTC | #8
On Mon, May 30, 2016 at 12:33:26PM +0300, Mika Westerberg wrote:
> I also learned that both of these can be shortened with the following
> mechanisms:
> 
>  - PCIe readiness notifications (6.23 in PCIe spec 3.1a)
>  - ACPI _DSM method returning readiness durations (4.6.9 in PCI
>    firmware spec 3.2).

BTW, looks like the latter is already implemented in
pci_acpi_optimize_delay().
Andreas Noever May 30, 2016, 3:19 p.m. UTC | #9
On Mon, May 30, 2016 at 4:44 PM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> On Mon, May 30, 2016 at 12:33:26PM +0300, Mika Westerberg wrote:
>> I also learned that both of these can be shortened with the following
>> mechanisms:
>>
>>  - PCIe readiness notifications (6.23 in PCIe spec 3.1a)
>>  - ACPI _DSM method returning readiness durations (4.6.9 in PCI
>>    firmware spec 3.2).
>
> BTW, looks like the latter is already implemented in
> pci_acpi_optimize_delay().


Hmm, Lukas's trace suggests a few independent problems (or optimisations):
1. Those devices are siblings, we should wake all of them in parallel
and then wait once instead of one by one (rough sketch below).
2. Why are we waiting after _suspend at all? It seems we should only
wait before the next access. Sleeping after _suspend looks like a stop-gap
measure to guarantee that no accesses take place?
3. The idle timeout is too low, there should be no suspend between
discovery and probing.
4. Even if the spec says 50ms, we might still want to have shorter
sleep times for known good devices. Thunderbolt can produce PCI
hierarchies which are 7 levels deep with 4 hp bridges on each level.
Waking all of that would take 50ms * 7 * 4 = 1400ms (not counting
upstream bridges which have the normal d3 delay).
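
For point 1, very roughly something like this (hypothetical helper, not
tested; refcounting and error handling omitted):

static void wake_sibling_ports(struct pci_bus *bus)
{
	struct pci_dev *dev;

	/* Queue an asynchronous runtime resume for every port first... */
	list_for_each_entry(dev, &bus->devices, bus_list)
		pm_request_resume(&dev->dev);

	/*
	 * ...then wait for all of them, so the resume delays overlap
	 * instead of adding up port by port.
	 */
	list_for_each_entry(dev, &bus->devices, bus_list)
		pm_runtime_barrier(&dev->dev);
}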

Does TB support PCIe readiness notifications?

Andreas
Mika Westerberg May 31, 2016, 8:33 a.m. UTC | #10
On Mon, May 30, 2016 at 05:19:53PM +0200, Andreas Noever wrote:
> On Mon, May 30, 2016 at 4:44 PM, Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> > On Mon, May 30, 2016 at 12:33:26PM +0300, Mika Westerberg wrote:
> >> I also learned that both of these can be shortened with the following
> >> mechanisms:
> >>
> >>  - PCIe readiness notifications (6.23 in PCIe spec 3.1a)
> >>  - ACPI _DSM method returning readiness durations (4.6.9 in PCI
> >>    firmware spec 3.2).
> >
> > BTW, looks like the latter is already implemented in
> > pci_acpi_optimize_delay().
> 
> 
> Hmm, Lukas's trace suggests a few independent problems (or optimisations):
> 1. Those devices are siblings, we should wake all of them in parallel
> and then wait once instead of one by one.
> 2. Why are we waiting after _suspend at all? It seems we should only
> wait before the next access. Sleeping after _suspend looks like a stop
> gap measure to guarantee that no accesses take place?

I guess it is because the PCI PM spec says that the delay needs to be
applied in both directions. See table 5-6 in PCI PM spec v1.2.

The code in question is in pci_raw_set_power_state():

        /* Mandatory power management transition delays */
        /* see PCI PM 1.1 5.6.1 table 18 */
        if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
                pci_dev_d3_sleep(dev);
        else if (state == PCI_D2 || dev->current_state == PCI_D2)
                udelay(PCI_PM_D2_DELAY);

> 3. The idle timeout is too low, there should be no suspend between
> discovery and probing.

I agree. We took the 10ms from the original patch but I don't see why we
could not increase it to 100ms or even 500ms.

> 4. Even if the spec says 50ms, we might still want to have shorter
> sleep times for known good devices. Thunderbolt can produce PCI
> hierarchies which are 7 levels deep with 4 hp bridges on each level.
> Waking all of that would take 50ms * 7 * 4 = 1400ms (not counting
> upstream bridges which have the normal d3 delay).

The PCIe spec wants a 100ms delay when transitioning from D3cold to D0,
and we already do that in __pci_start_power_transition().
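
The relevant bit looks roughly like this (paraphrased from memory, not the
exact source):

        if (state == PCI_D0 && dev->runtime_d3cold) {
                /*
                 * Coming out of D3cold means the device went through a
                 * reset, so give it the 100 ms required by PCIe 3.1a
                 * sec 6.6.1 (dev->d3cold_delay) before its config space
                 * is accessed again.
                 */
                msleep(dev->d3cold_delay);
        }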

In other words we should have all necessary delays for PCIe in place
already. This patch should not be needed.

For decreasing the delays we already support the ACPI _DSM method. What is
missing is PCIe readiness notification support, which can be added
separately.

> Does TB support PCIe readiness notifications?

It seems to be a pretty recent feature, so it might not be supported by the
existing TBT hardware.
Mika Westerberg May 31, 2016, 8:58 a.m. UTC | #11
On Tue, May 31, 2016 at 11:33:49AM +0300, Mika Westerberg wrote:
> On Mon, May 30, 2016 at 05:19:53PM +0200, Andreas Noever wrote:
> > On Mon, May 30, 2016 at 4:44 PM, Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > > On Mon, May 30, 2016 at 12:33:26PM +0300, Mika Westerberg wrote:
> > >> I also learned that both of these can be shortened with the following
> > >> mechanisms:
> > >>
> > >>  - PCIe readiness notifications (6.23 in PCIe spec 3.1a)
> > >>  - ACPI _DSM method returning readiness durations (4.6.9 in PCI
> > >>    firmware spec 3.2).
> > >
> > > BTW, looks like the latter is already implemented in
> > > pci_acpi_optimize_delay().
> > 
> > 
> > Hmm, Lukas's trace suggests a few independent problems (or optimisations):
> > 1. Those devices are siblings, we should wake all of them in parallel
> > and then wait once instead of one by one.
> > 2. Why are we waiting after _suspend at all? It seems we should only
> > wait before the next access. Sleeping after _suspend looks like a stop
> > gap measure to guarantee that no accesses take place?
> 
> I guess it is because the PCI PM spec says that the delay needs to be
> applied in both directions. See table 5-6 in PCI PM spec v1.2.
> 
> The code in question is in pci_raw_set_power_state():
> 
>         /* Mandatory power management transition delays */
>         /* see PCI PM 1.1 5.6.1 table 18 */
>         if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
>                 pci_dev_d3_sleep(dev);
>         else if (state == PCI_D2 || dev->current_state == PCI_D2)
>                 udelay(PCI_PM_D2_DELAY);
> 
> > 3. The idle timeout is too low, there should be no suspend between
> > discovery and probing.
> 
> I agree. We took the 10ms from the original patch but I don't see why we
> could not increase it to 100ms or even 500ms.
> 
> > 4. Even if the spec says 50ms, we might still want to have shorter
> > sleep times for known good devices. Thunderbolt can produce PCI
> > hierarchies which are 7 levels deep with 4 hp bridges on each level.
> > Waking all of that would take 50ms * 7 * 4 = 1400ms (not counting
> > upstream bridges which have the normal d3 delay).
> 
> The PCIe spec wants a 100ms delay when transitioning from D3cold to D0,
> and we already do that in __pci_start_power_transition().
> 
> In other words we should have all necessary delays for PCIe in place
> already. This patch should not be needed.

To summarize the next steps, I will send a new version of the
PCI PM patches with the following changes.

  - Drop this 50ms patch; we should have the PCIe 100ms delay already
    covered.

  - Increase runtime PM autosuspend time from 10ms to 500ms (or whatever
    the preferred default is).

  - Add a new version of the ACPI hotplug patch where pm_runtime_get/put()
    is moved into acpiphp_check_bridge().

Let me know if I'm forgetting something.

Bjorn, is this ok for you? It would be nice to get the updated series into
linux-next for wider testing :)

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e785dc260e72..e645a3d4f2e0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2361,7 +2361,12 @@  void pci_pm_init(struct pci_dev *dev)
 	}
 
 	dev->pm_cap = pm;
-	dev->d3_delay = PCI_PM_D3_WAIT;
+	/*
+	 * PCI PM 1.2 specification requires minimum of 50ms before any
+	 * function on the bus is accessed after the bus is transitioned
+	 * from B2 to B0.
+	 */
+	dev->d3_delay = pci_is_bridge(dev) ? PCI_PM_BUS_WAIT : PCI_PM_D3_WAIT;
 	dev->d3cold_delay = PCI_PM_D3COLD_WAIT;
 	dev->bridge_d3 = pci_bridge_d3_possible(dev);
 	dev->d3cold_allowed = true;