[RFC,v2,1/3] PCI: rockchip: Add support for pcie wake irq

Message ID	20170817120431.12398-2-jeffy.chen@rock-chips.com (mailing list archive)
State	New, archived
Delegated to:	Bjorn Helgaas
Headers	show Return-Path: <linux-pci-owner@kernel.org> From: Jeffy Chen <jeffy.chen@rock-chips.com> To: linux-kernel@vger.kernel.org, bhelgaas@google.com Cc: shawn.lin@rock-chips.com, briannorris@chromium.org, dianders@chromium.org, Jeffy Chen <jeffy.chen@rock-chips.com>, Heiko Stuebner <heiko@sntech.de>, linux-pci@vger.kernel.org, linux-rockchip@lists.infradead.org, linux-arm-kernel@lists.infradead.org Subject: [RFC PATCH v2 1/3] PCI: rockchip: Add support for pcie wake irq Date: Thu, 17 Aug 2017 20:04:29 +0800 Message-Id: <20170817120431.12398-2-jeffy.chen@rock-chips.com> In-Reply-To: <20170817120431.12398-1-jeffy.chen@rock-chips.com> References: <20170817120431.12398-1-jeffy.chen@rock-chips.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk

Jeffy Chen Aug. 17, 2017, 12:04 p.m. UTC

Add support for PCIE_WAKE pin in rockchip pcie driver.

Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
---

Changes in v2:
Use dev_pm_set_dedicated_wake_irq
        -- Suggested by Brian Norris <briannorris@chromium.com>

 drivers/pci/host/pcie-rockchip.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Shawn Lin Aug. 18, 2017, 7:23 a.m. UTC | #1

Hi Jeffy

On 2017/8/17 20:04, Jeffy Chen wrote:
> Add support for PCIE_WAKE pin in rockchip pcie driver.
> 
> Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> ---
> 
> Changes in v2:
> Use dev_pm_set_dedicated_wake_irq
>          -- Suggested by Brian Norris <briannorris@chromium.com>
> 
>   drivers/pci/host/pcie-rockchip.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c
> index 7bb9870f6d8c..c2b973c738fe 100644
> --- a/drivers/pci/host/pcie-rockchip.c
> +++ b/drivers/pci/host/pcie-rockchip.c
> @@ -36,6 +36,7 @@
>   #include <linux/pci_ids.h>
>   #include <linux/phy/phy.h>
>   #include <linux/platform_device.h>
> +#include <linux/pm_wakeirq.h>
>   #include <linux/reset.h>
>   #include <linux/regmap.h>
>   
> @@ -853,7 +854,6 @@ static void rockchip_pcie_legacy_int_handler(struct irq_desc *desc)
>   	chained_irq_exit(chip, desc);
>   }
>   
> -
>   /**
>    * rockchip_pcie_parse_dt - Parse Device Tree
>    * @rockchip: PCIe port information
> @@ -1018,6 +1018,14 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
>   		return err;
>   	}
>   
> +	device_init_wakeup(dev, true);
> +	irq = platform_get_irq_byname(pdev, "wake");
> +	if (irq >= 0) {
> +		err = dev_pm_set_dedicated_wake_irq(dev, irq);
> +		if (err)
> +			dev_err(dev, "failed to setup PCIe wake IRQ\n");
> +	}
> +
>   	rockchip->vpcie3v3 = devm_regulator_get_optional(dev, "vpcie3v3");
>   	if (IS_ERR(rockchip->vpcie3v3)) {
>   		if (PTR_ERR(rockchip->vpcie3v3) == -EPROBE_DEFER)
> @@ -1524,6 +1532,9 @@ static int rockchip_pcie_remove(struct platform_device *pdev)
>   	struct device *dev = &pdev->dev;
>   	struct rockchip_pcie *rockchip = dev_get_drvdata(dev);
>   
> +	dev_pm_clear_wake_irq(dev);
> +	device_init_wakeup(dev, false);
> +

Looks good overall but I think we need this on the
error handling path of rockchip_pcie_probe as well?

>   	pci_stop_root_bus(rockchip->root_bus);
>   	pci_remove_root_bus(rockchip->root_bus);
>   	pci_unmap_iospace(rockchip->io);
>

Jeffy Chen Aug. 18, 2017, 8:32 a.m. UTC | #2

Hi Shawn,

On 08/18/2017 03:23 PM, Shawn Lin wrote:
>>
>> @@ -1524,6 +1532,9 @@ static int rockchip_pcie_remove(struct
>> platform_device *pdev)
>>       struct device *dev = &pdev->dev;
>>       struct rockchip_pcie *rockchip = dev_get_drvdata(dev);
>> +    dev_pm_clear_wake_irq(dev);
>> +    device_init_wakeup(dev, false);
>> +
>
> Looks good overall but I think we need this on the
> error handling path of rockchip_pcie_probe as well?

hmm, right, will fix it, thanks.

and i also notice some other missing error handling, will fix them too:)

Brian Norris Aug. 18, 2017, 5:01 p.m. UTC | #3

+ Tony

On Thu, Aug 17, 2017 at 08:04:29PM +0800, Jeffy Chen wrote:
> Add support for PCIE_WAKE pin in rockchip pcie driver.
> 
> Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> ---
> 
> Changes in v2:
> Use dev_pm_set_dedicated_wake_irq
>         -- Suggested by Brian Norris <briannorris@chromium.com>
> 
>  drivers/pci/host/pcie-rockchip.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c
> index 7bb9870f6d8c..c2b973c738fe 100644
> --- a/drivers/pci/host/pcie-rockchip.c
> +++ b/drivers/pci/host/pcie-rockchip.c
> @@ -36,6 +36,7 @@
>  #include <linux/pci_ids.h>
>  #include <linux/phy/phy.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_wakeirq.h>
>  #include <linux/reset.h>
>  #include <linux/regmap.h>
>  
> @@ -853,7 +854,6 @@ static void rockchip_pcie_legacy_int_handler(struct irq_desc *desc)
>  	chained_irq_exit(chip, desc);
>  }
>  
> -
>  /**
>   * rockchip_pcie_parse_dt - Parse Device Tree
>   * @rockchip: PCIe port information
> @@ -1018,6 +1018,14 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
>  		return err;
>  	}
>  
> +	device_init_wakeup(dev, true);
> +	irq = platform_get_irq_byname(pdev, "wake");
> +	if (irq >= 0) {
> +		err = dev_pm_set_dedicated_wake_irq(dev, irq);

Did you test that this works out correctly as a level-triggered
interrupt? IIUC, the dummy handler won't mask the interrupt, so it might
keep firing. See:

static irqreturn_t handle_threaded_wake_irq(int irq, void *_wirq)
{
        struct wake_irq *wirq = _wirq;
        int res;

        /* Maybe abort suspend? */
        if (irqd_is_wakeup_set(irq_get_irq_data(irq))) {
                pm_wakeup_event(wirq->dev, 0);
        
                return IRQ_HANDLED; <--- We can return here, with the trigger still asserted
        }
...

This could cause some kind of an IRQ storm, including a lockup or
significant slowdown, I think.

BTW, in another context, Tony suggested we might need to fix up the IRQ flags
like this:

int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq)
{
...
        err = request_threaded_irq(irq, NULL, handle_threaded_wake_irq,
-                                  IRQF_ONESHOT, dev_name(dev), wirq);
+                                  IRQF_ONESHOT | irq_get_trigger_type(irq), dev_name(dev), wirq);

But IIUC, that's not actually necessary, because __setup_irq()
automatically configures the trigger type if the driver didn't request
one explicitly.

Brian

> +		if (err)
> +			dev_err(dev, "failed to setup PCIe wake IRQ\n");
> +	}
> +
>  	rockchip->vpcie3v3 = devm_regulator_get_optional(dev, "vpcie3v3");
>  	if (IS_ERR(rockchip->vpcie3v3)) {
>  		if (PTR_ERR(rockchip->vpcie3v3) == -EPROBE_DEFER)
> @@ -1524,6 +1532,9 @@ static int rockchip_pcie_remove(struct platform_device *pdev)
>  	struct device *dev = &pdev->dev;
>  	struct rockchip_pcie *rockchip = dev_get_drvdata(dev);
>  
> +	dev_pm_clear_wake_irq(dev);
> +	device_init_wakeup(dev, false);
> +
>  	pci_stop_root_bus(rockchip->root_bus);
>  	pci_remove_root_bus(rockchip->root_bus);
>  	pci_unmap_iospace(rockchip->io);
> -- 
> 2.11.0
> 
>

Brian Norris Aug. 18, 2017, 5:07 p.m. UTC | #4

On Fri, Aug 18, 2017 at 10:01:07AM -0700, Brian Norris wrote:
> + Tony
> 
> On Thu, Aug 17, 2017 at 08:04:29PM +0800, Jeffy Chen wrote:
> > Add support for PCIE_WAKE pin in rockchip pcie driver.
> > 
> > Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> > ---
> > 
> > Changes in v2:
> > Use dev_pm_set_dedicated_wake_irq
> >         -- Suggested by Brian Norris <briannorris@chromium.com>

Oops, I forgot you sent a v3. Same questions/comments apply there
though.

Jeffy Chen Aug. 18, 2017, 5:47 p.m. UTC | #5

Hi Brian,

On 08/19/2017 01:01 AM, Brian Norris wrote:
> Did you test that this works out correctly as a level-triggered
> interrupt? IIUC, the dummy handler won't mask the interrupt, so it might
> keep firing. See:
>
> static irqreturn_t handle_threaded_wake_irq(int irq, void *_wirq)
> {
>          struct wake_irq *wirq = _wirq;
>          int res;
>
>          /* Maybe abort suspend? */
>          if (irqd_is_wakeup_set(irq_get_irq_data(irq))) {
>                  pm_wakeup_event(wirq->dev, 0);
>
>                  return IRQ_HANDLED; <--- We can return here, with the trigger still asserted
>          }
> ...
>
> This could cause some kind of an IRQ storm, including a lockup or
> significant slowdown, I think.

hmmm, right, but as i replied at cros partner issue, this irq handle 
might not be called actually...

in my test on cros 4.4 kernel, it would break by irq_may_run(returning 
false):
static bool irq_may_run(struct irq_desc *desc)
{
...
         /*
          * If the interrupt is an armed wakeup source, mark it pending
          * and suspended, disable it and notify the pm core about the
          * event.
          */
         if (irq_pm_check_wakeup(desc))
                 return false;


bool irq_pm_check_wakeup(struct irq_desc *desc)
{
         if (irqd_is_wakeup_armed(&desc->irq_data)) {
                 irqd_clear(&desc->irq_data, IRQD_WAKEUP_ARMED);
                 desc->istate |= IRQS_SUSPENDED | IRQS_PENDING;
                 desc->depth++;
                 irq_disable(desc); <--- disabled here
                 pm_system_irq_wakeup(irq_desc_get_irq(desc));
                 return true;



and for irqd_is_wakeup_armed:

static bool suspend_device_irq(struct irq_desc *desc)
{
...
         if (irqd_is_wakeup_set(&desc->irq_data)) {
                 irqd_set(&desc->irq_data, IRQD_WAKEUP_ARMED); <-- set 
irqd_is_wakeup_armed here


void dpm_noirq_begin(void)
{
         cpuidle_pause();
         device_wakeup_arm_wake_irqs();
         suspend_device_irqs();


so unless we get an irq between device_wakeup_arm_wake_irqs and 
suspend_device_irq, the irq_pm_check_wakeup would not let us get to 
handle_threaded_wake_irq...


>
> BTW, in another context, Tony suggested we might need to fix up the IRQ flags
> like this:
>
> int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq)
> {
> ...
>          err = request_threaded_irq(irq, NULL, handle_threaded_wake_irq,
> -                                  IRQF_ONESHOT, dev_name(dev), wirq);
> +                                  IRQF_ONESHOT | irq_get_trigger_type(irq), dev_name(dev), wirq);
>
> But IIUC, that's not actually necessary, because __setup_irq()
> automatically configures the trigger type if the driver didn't request
> one explicitly.

actually this would not work...irq_get_trigger_type would return zero 
due to a bug which we have a patch for it already:

9908207 New          [tip:irq/urgent] genirq: Restore trigger settings 
in irq_modify_status()


BTW, using dev_name for the name of this wake irq seems not very 
convenient...maybe add a ":wake" suffix?
>
> Brian

Tony Lindgren Aug. 18, 2017, 6:14 p.m. UTC | #6

* Brian Norris <briannorris@chromium.org> [170818 10:01]:
> + Tony
> 
> On Thu, Aug 17, 2017 at 08:04:29PM +0800, Jeffy Chen wrote:
> > Add support for PCIE_WAKE pin in rockchip pcie driver.
> > 
> > Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> > ---
> > 
> > Changes in v2:
> > Use dev_pm_set_dedicated_wake_irq
> >         -- Suggested by Brian Norris <briannorris@chromium.com>
> > 
> >  drivers/pci/host/pcie-rockchip.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c
> > index 7bb9870f6d8c..c2b973c738fe 100644
> > --- a/drivers/pci/host/pcie-rockchip.c
> > +++ b/drivers/pci/host/pcie-rockchip.c
> > @@ -36,6 +36,7 @@
> >  #include <linux/pci_ids.h>
> >  #include <linux/phy/phy.h>
> >  #include <linux/platform_device.h>
> > +#include <linux/pm_wakeirq.h>
> >  #include <linux/reset.h>
> >  #include <linux/regmap.h>
> >  
> > @@ -853,7 +854,6 @@ static void rockchip_pcie_legacy_int_handler(struct irq_desc *desc)
> >  	chained_irq_exit(chip, desc);
> >  }
> >  
> > -
> >  /**
> >   * rockchip_pcie_parse_dt - Parse Device Tree
> >   * @rockchip: PCIe port information
> > @@ -1018,6 +1018,14 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
> >  		return err;
> >  	}
> >  
> > +	device_init_wakeup(dev, true);
> > +	irq = platform_get_irq_byname(pdev, "wake");
> > +	if (irq >= 0) {
> > +		err = dev_pm_set_dedicated_wake_irq(dev, irq);
> 
> Did you test that this works out correctly as a level-triggered
> interrupt? IIUC, the dummy handler won't mask the interrupt, so it might
> keep firing. See:
> 
> static irqreturn_t handle_threaded_wake_irq(int irq, void *_wirq)
> {
>         struct wake_irq *wirq = _wirq;
>         int res;
> 
>         /* Maybe abort suspend? */
>         if (irqd_is_wakeup_set(irq_get_irq_data(irq))) {
>                 pm_wakeup_event(wirq->dev, 0);
>         
>                 return IRQ_HANDLED; <--- We can return here, with the trigger still asserted
>         }
> ...
> 
> This could cause some kind of an IRQ storm, including a lockup or
> significant slowdown, I think.

Hmm yeah that should be checked. The test cases I have are all
edge interrupts where there is no status whatsoever after the
wake-up event except which irq fired.

To me it seems that the wakeirq consumer driver should be able
to clear the level wakeirq in it's runtime_resume, or even better,
maybe all it takes is just let the consumer driver's irq handler
clear the level wakeirq when it runs after runtime_resume.

> BTW, in another context, Tony suggested we might need to fix up the IRQ flags
> like this:
> 
> int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq)
> {
> ...
>         err = request_threaded_irq(irq, NULL, handle_threaded_wake_irq,
> -                                  IRQF_ONESHOT, dev_name(dev), wirq);
> +                                  IRQF_ONESHOT | irq_get_trigger_type(irq), dev_name(dev), wirq);
> 
> But IIUC, that's not actually necessary, because __setup_irq()
> automatically configures the trigger type if the driver didn't request
> one explicitly.

OK so we should make sure that the triggering is parsed
properly when passed in from device tree.

Regards,

Tony

Tony Lindgren Aug. 18, 2017, 6:28 p.m. UTC | #7

* jeffy <jeffy.chen@rock-chips.com> [170818 11:05]:
> On 08/19/2017 01:01 AM, Brian Norris wrote:
> > BTW, in another context, Tony suggested we might need to fix up the IRQ flags
> > like this:
> > 
> > int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq)
> > {
> > ...
> >          err = request_threaded_irq(irq, NULL, handle_threaded_wake_irq,
> > -                                  IRQF_ONESHOT, dev_name(dev), wirq);
> > +                                  IRQF_ONESHOT | irq_get_trigger_type(irq), dev_name(dev), wirq);
> > 
> > But IIUC, that's not actually necessary, because __setup_irq()
> > automatically configures the trigger type if the driver didn't request
> > one explicitly.
> 
> actually this would not work...irq_get_trigger_type would return zero due to
> a bug which we have a patch for it already:
> 
> 9908207 New          [tip:irq/urgent] genirq: Restore trigger settings in
> irq_modify_status()

Thanks for that information. So it seems we can leave out the
irq_get_trigger_type() here then? Might be worth checking
that it really does get populated though :)

> BTW, using dev_name for the name of this wake irq seems not very
> convenient...maybe add a ":wake" suffix?

Good idea, will take a look.

Regards,

Tony

Jeffy Chen Aug. 18, 2017, 8:05 p.m. UTC | #8

Hi guys,

On 08/19/2017 02:14 AM, Tony Lindgren wrote:
>> static irqreturn_t handle_threaded_wake_irq(int irq, void *_wirq)
>> >{
>> >         struct wake_irq *wirq = _wirq;
>> >         int res;
>> >
>> >         /* Maybe abort suspend? */
>> >         if (irqd_is_wakeup_set(irq_get_irq_data(irq))) {
>> >                 pm_wakeup_event(wirq->dev, 0);
>> >
>> >                 return IRQ_HANDLED; <--- We can return here, with the trigger still asserted
>> >         }
>> >...
>> >
>> >This could cause some kind of an IRQ storm, including a lockup or
>> >significant slowdown, I think.
> Hmm yeah that should be checked. The test cases I have are all
> edge interrupts where there is no status whatsoever after the
> wake-up event except which irq fired.
>
> To me it seems that the wakeirq consumer driver should be able
> to clear the level wakeirq in it's runtime_resume, or even better,
> maybe all it takes is just let the consumer driver's irq handler
> clear the level wakeirq when it runs after runtime_resume.
>

i did some tests about it:
[   70.335883] device_wakeup_arm_wake_irqs <-- enable wake irq
[   70.335932] handle_threaded_wake_irq
...<--- a lot of wake irq handler log
[   70.335965] suspend_device_irq
[   70.335987] irq_pm_check_wakeup <--- wake and disable wake irq
...<--- no wake irq handler log
[   70.336173] resume_irqs <-- enable wake irq
[   70.336480] handle_threaded_wake_irq
...<--- a lot of wake irq handler log
[   70.336600] device_wakeup_disarm_wake_irqs < disable wake irq
...<--- no wake irq handler log


so it would indeed possible to get irq storm in 
device_wakeup_arm_wake_irqs to suspend_device_irq
and resume_irqs to device_wakeup_disarm_wake_irqs.


a simple workaround would be:
enable_irq_wake
suspend_device_irq
enable_irq
...irq fired, irq_pm_check_wakeup disabled irq
disable_irq
resume_irqs
disable_irq_wake




and i have a hacky patch for that, which works well:

+++ b/drivers/pci/host/pcie-rockchip.c
@@ -1308,6 +1308,8 @@ static int __maybe_unused 
rockchip_pcie_suspend_noirq(struct
device *dev)
         if (!IS_ERR(rockchip->vpcie0v9))
                 regulator_disable(rockchip->vpcie0v9);

+       dev_pm_enable_wake_irq(dev);
+
         return ret;
  }

@@ -1316,6 +1318,8 @@ static int __maybe_unused 
rockchip_pcie_resume_noirq(struct d
evice *dev)
         struct rockchip_pcie *rockchip = dev_get_drvdata(dev);
         int err;

+       dev_pm_disable_wake_irq(dev);
+


@ -323,7 +324,7 @@ void dev_pm_arm_wake_irq(struct wake_irq *wirq)
                 return;

         if (device_may_wakeup(wirq->dev)) {
-               if (wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
+               if (0 && wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
                         enable_irq(wirq->irq);

                 enable_irq_wake(wirq->irq);
@@ -345,7 +346,7 @@ void dev_pm_disarm_wake_irq(struct wake_irq *wirq)
         if (device_may_wakeup(wirq->dev)) {
                 disable_irq_wake(wirq->irq);

-               if (wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
+               if (0 && wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
                         disable_irq_nosync(wirq->irq);
         }



which is basically move enable_irq and disable_irq to driver noirq 
stages, to avoid:
1/ not disabled by irq_pm_check_wakeup when it first fired
2/ re-enabled by resume_irq when it disabled by irq_pm_check_wakeup


with that hack, i no longer saw the irq storm, and the irq handler would 
never be called:

[    9.693385] device_wakeup_arm_wake_irqs
[    9.697130] suspend_device_irq
<--- suspend noirq, enable wake irq
[    9.716569] irq_pm_check_wakeup disable the wake irq
<--- resume noirq, disable wake irq
[    9.968115] resume_irqs <-- enable wake irq(not actually enable, 
since disabled twice)
[   10.193072] device_wakeup_disarm_wake_irqs

Tony Lindgren Aug. 22, 2017, 5:26 p.m. UTC | #9

* jeffy <jeffy.chen@rock-chips.com> [170818 13:05]:
> Hi guys,
> 
> On 08/19/2017 02:14 AM, Tony Lindgren wrote:
> > > static irqreturn_t handle_threaded_wake_irq(int irq, void *_wirq)
> > > >{
> > > >         struct wake_irq *wirq = _wirq;
> > > >         int res;
> > > >
> > > >         /* Maybe abort suspend? */
> > > >         if (irqd_is_wakeup_set(irq_get_irq_data(irq))) {
> > > >                 pm_wakeup_event(wirq->dev, 0);
> > > >
> > > >                 return IRQ_HANDLED; <--- We can return here, with the trigger still asserted
> > > >         }
> > > >...
> > > >
> > > >This could cause some kind of an IRQ storm, including a lockup or
> > > >significant slowdown, I think.
> > Hmm yeah that should be checked. The test cases I have are all
> > edge interrupts where there is no status whatsoever after the
> > wake-up event except which irq fired.
> > 
> > To me it seems that the wakeirq consumer driver should be able
> > to clear the level wakeirq in it's runtime_resume, or even better,
> > maybe all it takes is just let the consumer driver's irq handler
> > clear the level wakeirq when it runs after runtime_resume.
> > 
> 
> i did some tests about it:
> [   70.335883] device_wakeup_arm_wake_irqs <-- enable wake irq
> [   70.335932] handle_threaded_wake_irq
> ...<--- a lot of wake irq handler log
> [   70.335965] suspend_device_irq
> [   70.335987] irq_pm_check_wakeup <--- wake and disable wake irq
> ...<--- no wake irq handler log
> [   70.336173] resume_irqs <-- enable wake irq
> [   70.336480] handle_threaded_wake_irq
> ...<--- a lot of wake irq handler log
> [   70.336600] device_wakeup_disarm_wake_irqs < disable wake irq
> ...<--- no wake irq handler log
> 
> 
> so it would indeed possible to get irq storm in device_wakeup_arm_wake_irqs
> to suspend_device_irq
> and resume_irqs to device_wakeup_disarm_wake_irqs.
> 
> 
> a simple workaround would be:
> enable_irq_wake
> suspend_device_irq
> enable_irq
> ...irq fired, irq_pm_check_wakeup disabled irq
> disable_irq
> resume_irqs
> disable_irq_wake
> 
> 
> 
> 
> and i have a hacky patch for that, which works well:
> 
> +++ b/drivers/pci/host/pcie-rockchip.c
> @@ -1308,6 +1308,8 @@ static int __maybe_unused
> rockchip_pcie_suspend_noirq(struct
> device *dev)
>         if (!IS_ERR(rockchip->vpcie0v9))
>                 regulator_disable(rockchip->vpcie0v9);
> 
> +       dev_pm_enable_wake_irq(dev);
> +
>         return ret;
>  }
> 
> @@ -1316,6 +1318,8 @@ static int __maybe_unused
> rockchip_pcie_resume_noirq(struct d
> evice *dev)
>         struct rockchip_pcie *rockchip = dev_get_drvdata(dev);
>         int err;
> 
> +       dev_pm_disable_wake_irq(dev);
> +
> 
> 
> @ -323,7 +324,7 @@ void dev_pm_arm_wake_irq(struct wake_irq *wirq)
>                 return;
> 
>         if (device_may_wakeup(wirq->dev)) {
> -               if (wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
> +               if (0 && wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
>                         enable_irq(wirq->irq);
> 
>                 enable_irq_wake(wirq->irq);
> @@ -345,7 +346,7 @@ void dev_pm_disarm_wake_irq(struct wake_irq *wirq)
>         if (device_may_wakeup(wirq->dev)) {
>                 disable_irq_wake(wirq->irq);
> 
> -               if (wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
> +               if (0 && wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED)
>                         disable_irq_nosync(wirq->irq);
>         }
> 
> 
> 
> which is basically move enable_irq and disable_irq to driver noirq stages,
> to avoid:
> 1/ not disabled by irq_pm_check_wakeup when it first fired
> 2/ re-enabled by resume_irq when it disabled by irq_pm_check_wakeup
> 
> 
> with that hack, i no longer saw the irq storm, and the irq handler would
> never be called:
> 
> [    9.693385] device_wakeup_arm_wake_irqs
> [    9.697130] suspend_device_irq
> <--- suspend noirq, enable wake irq
> [    9.716569] irq_pm_check_wakeup disable the wake irq
> <--- resume noirq, disable wake irq
> [    9.968115] resume_irqs <-- enable wake irq(not actually enable, since
> disabled twice)
> [   10.193072] device_wakeup_disarm_wake_irqs

OK, let's fix any wakeriq ordering issues to make it more
usable. Sounds like in your case the wakeirq needs to be enabled
late and disabled early, while in my test cases I can keep it
enabled basically any time.

If this is for suspend/resume, You could just register the
wakeirq on suspend and then remove it on resume. We do have at
least network drivers doing device_init_wakeup(dev, true) and
device_init_wakeup(dev, false) as needed for WOL, see for example
bfin_mac_ethtool_setwol().

Regards,

Tony

Jeffy Chen Aug. 23, 2017, 1:32 a.m. UTC | #10

Hi Tony,

On 08/23/2017 01:26 AM, Tony Lindgren wrote:
> OK, let's fix any wakeriq ordering issues to make it more
> usable. Sounds like in your case the wakeirq needs to be enabled
> late and disabled early, while in my test cases I can keep it
> enabled basically any time.

yes, in my case it's a level triggered irq, which needs to be disabled 
when receive it(by irq_pm_check_wakeup(my hack) or inside of the custom 
irq handler)


and for eage irq, maybe we should enable it right after(or before) the 
driver activate wake function(for example activate WOWLAN or WOLAN), 
otherwise would it be possible to miss some irqs(triggered before we 
actually enable the wake irq)?

>
> If this is for suspend/resume, You could just register the
> wakeirq on suspend and then remove it on resume. We do have at
> least network drivers doing device_init_wakeup(dev, true) and
> device_init_wakeup(dev, false) as needed for WOL, see for example
> bfin_mac_ethtool_setwol().
>
> Regards,
>
> Tony
>
>
>
>

Brian Norris Aug. 23, 2017, 1:57 a.m. UTC | #11

Hi Jeffy,

On Wed, Aug 23, 2017 at 09:32:39AM +0800, Jeffy Chen wrote:
> and for eage irq, maybe we should enable it right after(or before)
> the driver activate wake function(for example activate WOWLAN or
> WOLAN), otherwise would it be possible to miss some irqs(triggered
> before we actually enable the wake irq)?

I already mentioned this: for the PCI case, the specification explicitly
says that the WAKE# pin must remain asserted until the system wakes and
resets the link. So we don't have this problem.

But it is probably still useful to make sure there's a well-defined
point at which these interrupts are armed, so that if a device driver
does care, it can account for that. Just before suspend_noirq (as it is
today) is probably fine, so if there's some device-level handling that
needs to happen before we get to suspend (but after the wakeirq is
armed), it can go in the device or bus {suspend,resume}_noirq callbacks.

Brian

Jeffy Chen Aug. 23, 2017, 2:16 a.m. UTC | #12

Hi Brian,

On 08/23/2017 09:57 AM, Brian Norris wrote:
> Hi Jeffy,
>
> On Wed, Aug 23, 2017 at 09:32:39AM +0800, Jeffy Chen wrote:
>> and for eage irq, maybe we should enable it right after(or before)
>> the driver activate wake function(for example activate WOWLAN or
>> WOLAN), otherwise would it be possible to miss some irqs(triggered
>> before we actually enable the wake irq)?
>
> I already mentioned this: for the PCI case, the specification explicitly
> says that the WAKE# pin must remain asserted until the system wakes and
> resets the link. So we don't have this problem.
Sorry, i means for other use cases of wakeirq, for example sdio wifi

>
> But it is probably still useful to make sure there's a well-defined
> point at which these interrupts are armed, so that if a device driver
> does care, it can account for that. Just before suspend_noirq (as it is
> today) is probably fine, so if there's some device-level handling that
> needs to happen before we get to suspend (but after the wakeirq is
> armed), it can go in the device or bus {suspend,resume}_noirq callbacks.

Yes, then we may need to handle "disable level irq" job in the irq 
handler(or runtime resume callback as current wakeirq API suggested) for 
irqs received before suspend devices irqs.
>
> Brian
>
>
>

Brian Norris Dec. 19, 2017, 12:48 a.m. UTC | #13

Hi Jeffy, Tony,

On Wed, Aug 23, 2017 at 09:32:39AM +0800, Jeffy Chen wrote:
> Hi Tony,
> 
> On 08/23/2017 01:26 AM, Tony Lindgren wrote:
> >OK, let's fix any wakeriq ordering issues to make it more
> >usable. Sounds like in your case the wakeirq needs to be enabled
> >late and disabled early, while in my test cases I can keep it
> >enabled basically any time.
> 
> yes, in my case it's a level triggered irq, which needs to be
> disabled when receive it(by irq_pm_check_wakeup(my hack) or inside
> of the custom irq handler)
> 
> 
> and for eage irq, maybe we should enable it right after(or before)
> the driver activate wake function(for example activate WOWLAN or
> WOLAN), otherwise would it be possible to miss some irqs(triggered
> before we actually enable the wake irq)?

Did this problem ever get resolved? To be clear, I believe the problem
at hand is:

(a) in suspend/resume (not runtime PM; we may not even have runtime PM
support for most PCI devices)
(b) using a level-triggered signal (PCI WAKE# is active low, and it's
nice to avoid certain races by treating it as level-triggered)

And with the current wakeirq code (and the latest version of Jeffy's
patch series, IIUC), I believe the above case can still trigger an
interrupt storm of sorts (it's not usually unrecoverably-stormy, since
it's a threaded IRQ, and we make "enough" progress).

I don't see how "ordering" can really resolve this problem, unless the
ordering is configured such that the interrupt handler never runs (e.g.,
we disable the IRQ before we get out of any "noirq" phase).

Options I can think of:
(1) implement runtime PM callbacks for all PCI devices, where we clear
any PME status and ensure WAKE# stops asserting [1]
(2) synchronize a device's resume() with the dedicated wake IRQ
(3) skip using the dedicated wake IRQ infrastructure and write our own
interrupt handler for this PCI/PM function

Option (1) seems pretty strange; we don't actually want to manage these
devices with runtime PM.

Option (2) could work, but it would probably require sharing more of the
core suspend/resume internals between
drivers/base/power/{wakeirq,main}.c, which may not be desirable. Among
other problems, that seems more fragile.

Option (3) is easy enough, and we already did that once for the first
pass at poorly implementing this WAKE# logic within the mwifiex driver
:)

> >If this is for suspend/resume, You could just register the
> >wakeirq on suspend and then remove it on resume. We do have at
> >least network drivers doing device_init_wakeup(dev, true) and
> >device_init_wakeup(dev, false) as needed for WOL, see for example
> >bfin_mac_ethtool_setwol().

I don't see how that would be good enough. You still have a window of
time while the driver hasn't finished resuming, in which the interrupt
handler might trigger many times.

Brian

[1] Then we also need to fixup handle_threaded_wake_irq(). Currently it
will not even try to resume the device:

	/* Maybe abort suspend? */
	if (irqd_is_wakeup_set(irq_get_irq_data(irq))) {
		pm_wakeup_event(wirq->dev, 0);

		return IRQ_HANDLED; <--- we exit here
	}

	/* We don't want RPM_ASYNC or RPM_NOWAIT here */
	res = pm_runtime_resume(wirq->dev);
	...

Tony Lindgren Dec. 20, 2017, 7:19 p.m. UTC | #14

Hi,

* Brian Norris <briannorris@chromium.org> [171219 00:50]:
> On Wed, Aug 23, 2017 at 09:32:39AM +0800, Jeffy Chen wrote:
> 
> Did this problem ever get resolved? To be clear, I believe the problem
> at hand is:
> 
> (a) in suspend/resume (not runtime PM; we may not even have runtime PM
> support for most PCI devices)

It seems it should be enough to implement runtime PM in the PCI
controller. Isn't each PCI WAKE# line is wired from each PCI device
to the PCI controller?

Then the PCI controller can figure out from which PCI device the
WAKE# came from.

> Options I can think of:
> (1) implement runtime PM callbacks for all PCI devices, where we clear
> any PME status and ensure WAKE# stops asserting [1]

I don't think this is needed, it should be enough to have just
the PCI controller implement runtime PM :)

Regards,

Tony

Brian Norris Dec. 22, 2017, 11:20 p.m. UTC | #15

+ Rafael to this thread

On Wed, Dec 20, 2017 at 11:19:12AM -0800, Tony Lindgren wrote:
> * Brian Norris <briannorris@chromium.org> [171219 00:50]:
> > On Wed, Aug 23, 2017 at 09:32:39AM +0800, Jeffy Chen wrote:
> > 
> > Did this problem ever get resolved? To be clear, I believe the problem
> > at hand is:
> > 
> > (a) in suspend/resume (not runtime PM; we may not even have runtime PM
> > support for most PCI devices)
> 
> It seems it should be enough to implement runtime PM in the PCI
> controller. Isn't each PCI WAKE# line is wired from each PCI device
> to the PCI controller?

No, not really. As discussed in later versions of this thread already,
the WAKE# hierarchy is orthogonal to the PCI hierarchy, and I think we
settled that it's reasonable to just consider this as a 1-per-device
thing, with each device "directly" attached to the PM controller. While
sharing could happen, that's something we decided to punt on...didn't
we?

> Then the PCI controller can figure out from which PCI device the
> WAKE# came from.

I'm not completely sure of the details, but I believe this *can* be
determined by PME. But I'm not sure it's guaranteed to be supported,
especially in cases where we already have 1:1 WAKE#. So we should be
*trying* to report this wakeirq info from the device, if possible.

> > Options I can think of:
> > (1) implement runtime PM callbacks for all PCI devices, where we clear
> > any PME status and ensure WAKE# stops asserting [1]
> 
> I don't think this is needed, it should be enough to have just
> the PCI controller implement runtime PM :)

Brian

Tony Lindgren Dec. 23, 2017, 4:36 p.m. UTC | #16

* Brian Norris <briannorris@chromium.org> [171222 23:23]:
> + Rafael to this thread
> 
> On Wed, Dec 20, 2017 at 11:19:12AM -0800, Tony Lindgren wrote:
> > * Brian Norris <briannorris@chromium.org> [171219 00:50]:
> > > On Wed, Aug 23, 2017 at 09:32:39AM +0800, Jeffy Chen wrote:
> > > 
> > > Did this problem ever get resolved? To be clear, I believe the problem
> > > at hand is:
> > > 
> > > (a) in suspend/resume (not runtime PM; we may not even have runtime PM
> > > support for most PCI devices)
> > 
> > It seems it should be enough to implement runtime PM in the PCI
> > controller. Isn't each PCI WAKE# line is wired from each PCI device
> > to the PCI controller?
> 
> No, not really. As discussed in later versions of this thread already,
> the WAKE# hierarchy is orthogonal to the PCI hierarchy, and I think we
> settled that it's reasonable to just consider this as a 1-per-device
> thing, with each device "directly" attached to the PM controller. While
> sharing could happen, that's something we decided to punt on...didn't
> we?

Sounds like we need a confirmation from some hardware people on
this if the PCI device WAKE# can be wired to PCI controller or
to a separate PM controller directly :)

> > Then the PCI controller can figure out from which PCI device the
> > WAKE# came from.
> 
> I'm not completely sure of the details, but I believe this *can* be
> determined by PME. But I'm not sure it's guaranteed to be supported,
> especially in cases where we already have 1:1 WAKE#. So we should be
> *trying* to report this wakeirq info from the device, if possible.

If a PCI device has it's WAKE# wired directly to a PM controller
instead of the PCI controller, then it seems that the PCI controller
should be woken up and then presumably the regular PCI device
interrupts will take care of the rest.

Or else I'd say if the driver for the PCI device in some custom
case needs to do something specific, then having that driver
implement PM runtime makes sense. Naturally we want to avoid a
dependency where all the possible drivers would need to implement
PM runtime, I doubt that's needed though.

Regards,

Tony

[RFC,v2,1/3] PCI: rockchip: Add support for pcie wake irq

Commit Message

Comments

Patch