diff mbox series

[v2] mmc: dw_mmc: Disable SDIO interrupts while suspended to fix suspend/resume

Message ID 20190429204040.18725-1-dianders@chromium.org (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Headers show
Series [v2] mmc: dw_mmc: Disable SDIO interrupts while suspended to fix suspend/resume | expand

Commit Message

Doug Anderson April 29, 2019, 8:40 p.m. UTC
Processing SDIO interrupts while dw_mmc is suspended (or partly
suspended) seems like a bad idea.  We really don't want to be
processing them until we've gotten ourselves fully powered up.

You might be wondering how it's even possible to become suspended when
an SDIO interrupt is active.  As can be seen in
dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
suspend when the SDIO interrupt is enabled.  ...but even though we
stop normal runtime suspend transitions when SDIO interrupts are
enabled, the dw_mci_runtime_suspend() can still get called for a full
system suspend.

Let's handle all this by explicitly masking SDIO interrupts in the
suspend call and unmasking them later in the resume call.  To do this
cleanly I'll keep track of whether the client requested that SDIO
interrupts be enabled so that we can reliably restore them regardless
of whether we're masking them for one reason or another.

It should be noted that if dw_mci_enable_sdio_irq() is never called
(for instance, if we don't have an SDIO card plugged in) that
"client_sdio_enb" will always be false.  In those cases this patch
adds a tiny bit of overhead to suspend/resume (a spinlock and a
read/write of INTMASK) but other than that is a no-op.  The
SDMMC_INT_SDIO bit should always be clear and clearing it again won't
hurt.

Without this fix it can be seen that rk3288-veyron Chromebooks with
Marvell WiFi would sometimes fail to resume WiFi even after picking my
recent mwifiex patch [1].  Specifically you'd see messages like this:
  mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
  mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state

...and tracing through the resume code in the failing cases showed
that we were processing a SDIO interrupt really early in the resume
call.

NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
Defer SDIO interrupt handling until after resume") [2].  Presumably
this is the same problem that was solved by that patch.

[1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
[2] https://crrev.com/c/230765

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I didn't put any "Fixes" tag here, but presumably this could be
backported to whichever kernels folks found it useful for.  I have at
least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
show the problem.  It is very easy to pick this to v4.19 and it
definitely fixes the problem there.

I haven't spent the time to pick this to 4.14 myself, but presumably
it wouldn't be too hard to backport this as far as v4.13 since that
contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
make sense for anyone experiencing this problem to just pick the old
CHROMIUM patch to fix them.

Changes in v2:
- Suggested 4.14+ in the stable tag (Sasha-bot)
- Extra note that this is a noop on non-SDIO (Shawn / Emil)
- Make boolean logic cleaner as per https://crrev.com/c/1586207/1
- Hopefully clear comments as per https://crrev.com/c/1586207/1

 drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
 drivers/mmc/host/dw_mmc.h |  3 +++
 2 files changed, 26 insertions(+), 4 deletions(-)

Comments

Guenter Roeck April 29, 2019, 8:46 p.m. UTC | #1
On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> Processing SDIO interrupts while dw_mmc is suspended (or partly
> suspended) seems like a bad idea.  We really don't want to be
> processing them until we've gotten ourselves fully powered up.
>
> You might be wondering how it's even possible to become suspended when
> an SDIO interrupt is active.  As can be seen in
> dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> suspend when the SDIO interrupt is enabled.  ...but even though we
> stop normal runtime suspend transitions when SDIO interrupts are
> enabled, the dw_mci_runtime_suspend() can still get called for a full
> system suspend.
>
> Let's handle all this by explicitly masking SDIO interrupts in the
> suspend call and unmasking them later in the resume call.  To do this
> cleanly I'll keep track of whether the client requested that SDIO
> interrupts be enabled so that we can reliably restore them regardless
> of whether we're masking them for one reason or another.
>
> It should be noted that if dw_mci_enable_sdio_irq() is never called
> (for instance, if we don't have an SDIO card plugged in) that
> "client_sdio_enb" will always be false.  In those cases this patch
> adds a tiny bit of overhead to suspend/resume (a spinlock and a
> read/write of INTMASK) but other than that is a no-op.  The
> SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> hurt.
>
> Without this fix it can be seen that rk3288-veyron Chromebooks with
> Marvell WiFi would sometimes fail to resume WiFi even after picking my
> recent mwifiex patch [1].  Specifically you'd see messages like this:
>   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
>   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
>
> ...and tracing through the resume code in the failing cases showed
> that we were processing a SDIO interrupt really early in the resume
> call.
>
> NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> Defer SDIO interrupt handling until after resume") [2].  Presumably
> this is the same problem that was solved by that patch.
>
> [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
> [2] https://crrev.com/c/230765
>
> Cc: <stable@vger.kernel.org> # 4.14.x
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Guenter Roeck <groeck@chromium.org>

> ---
> I didn't put any "Fixes" tag here, but presumably this could be
> backported to whichever kernels folks found it useful for.  I have at
> least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> show the problem.  It is very easy to pick this to v4.19 and it
> definitely fixes the problem there.
>
> I haven't spent the time to pick this to 4.14 myself, but presumably
> it wouldn't be too hard to backport this as far as v4.13 since that
> contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
> make sense for anyone experiencing this problem to just pick the old
> CHROMIUM patch to fix them.
>
> Changes in v2:
> - Suggested 4.14+ in the stable tag (Sasha-bot)
> - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> - Hopefully clear comments as per https://crrev.com/c/1586207/1
>
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
>  drivers/mmc/host/dw_mmc.h |  3 +++
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 80dc2fd6576c..480067b87a94 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card)
>         }
>  }
>
> -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
> +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb,
> +                                    bool client_requested)
>  {
>         struct dw_mci *host = slot->host;
>         unsigned long irqflags;
> @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
>
>         spin_lock_irqsave(&host->irq_lock, irqflags);
>
> +       /*
> +        * If we're being called directly from dw_mci_enable_sdio_irq()
> +        * (which means that the client driver actually wants to enable or
> +        * disable interrupts) then save the request.  Otherwise this
> +        * wasn't directly requested by the client and we should logically
> +        * AND it with the client request since we want to disable if
> +        * _either_ the client disabled OR we have some other reason to
> +        * disable temporarily.
> +        */
> +       if (client_requested)
> +               host->client_sdio_enb = enb;
> +       else
> +               enb &= host->client_sdio_enb;
> +
>         /* Enable/disable Slot Specific SDIO interrupt */
>         int_mask = mci_readl(host, INTMASK);
>         if (enb)
> @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb)
>         struct dw_mci_slot *slot = mmc_priv(mmc);
>         struct dw_mci *host = slot->host;
>
> -       __dw_mci_enable_sdio_irq(slot, enb);
> +       __dw_mci_enable_sdio_irq(slot, enb, true);
>
>         /* Avoid runtime suspending the device when SDIO IRQ is enabled */
>         if (enb)
> @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc)
>  {
>         struct dw_mci_slot *slot = mmc_priv(mmc);
>
> -       __dw_mci_enable_sdio_irq(slot, 1);
> +       __dw_mci_enable_sdio_irq(slot, true, false);
>  }
>
>  static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode)
> @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>                 if (pending & SDMMC_INT_SDIO(slot->sdio_id)) {
>                         mci_writel(host, RINTSTS,
>                                    SDMMC_INT_SDIO(slot->sdio_id));
> -                       __dw_mci_enable_sdio_irq(slot, 0);
> +                       __dw_mci_enable_sdio_irq(slot, false, false);
>                         sdio_signal_irq(slot->mmc);
>                 }
>
> @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev)
>  {
>         struct dw_mci *host = dev_get_drvdata(dev);
>
> +       __dw_mci_enable_sdio_irq(host->slot, false, false);
> +
>         if (host->use_dma && host->dma_ops->exit)
>                 host->dma_ops->exit(host);
>
> @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev)
>         /* Now that slots are all setup, we can enable card detect */
>         dw_mci_enable_cd(host);
>
> +       __dw_mci_enable_sdio_irq(host->slot, true, false);
> +
>         return 0;
>
>  err:
> diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h
> index 46e9f8ec5398..dfbace0f5043 100644
> --- a/drivers/mmc/host/dw_mmc.h
> +++ b/drivers/mmc/host/dw_mmc.h
> @@ -127,6 +127,7 @@ struct dw_mci_dma_slave {
>   * @cmd11_timer: Timer for SD3.0 voltage switch over scheme.
>   * @cto_timer: Timer for broken command transfer over scheme.
>   * @dto_timer: Timer for broken data transfer over scheme.
> + * @client_sdio_enb: The value last passed to enable_sdio_irq.
>   *
>   * Locking
>   * =======
> @@ -234,6 +235,8 @@ struct dw_mci {
>         struct timer_list       cmd11_timer;
>         struct timer_list       cto_timer;
>         struct timer_list       dto_timer;
> +
> +       bool                    client_sdio_enb;
>  };
>
>  /* DMA ops for Internal/External DMAC interface */
> --
> 2.21.0.593.g511ec345e18-goog
>
Doug Anderson May 20, 2019, 6:41 p.m. UTC | #2
Hi,

On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> Processing SDIO interrupts while dw_mmc is suspended (or partly
> suspended) seems like a bad idea.  We really don't want to be
> processing them until we've gotten ourselves fully powered up.
>
> You might be wondering how it's even possible to become suspended when
> an SDIO interrupt is active.  As can be seen in
> dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> suspend when the SDIO interrupt is enabled.  ...but even though we
> stop normal runtime suspend transitions when SDIO interrupts are
> enabled, the dw_mci_runtime_suspend() can still get called for a full
> system suspend.
>
> Let's handle all this by explicitly masking SDIO interrupts in the
> suspend call and unmasking them later in the resume call.  To do this
> cleanly I'll keep track of whether the client requested that SDIO
> interrupts be enabled so that we can reliably restore them regardless
> of whether we're masking them for one reason or another.
>
> It should be noted that if dw_mci_enable_sdio_irq() is never called
> (for instance, if we don't have an SDIO card plugged in) that
> "client_sdio_enb" will always be false.  In those cases this patch
> adds a tiny bit of overhead to suspend/resume (a spinlock and a
> read/write of INTMASK) but other than that is a no-op.  The
> SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> hurt.
>
> Without this fix it can be seen that rk3288-veyron Chromebooks with
> Marvell WiFi would sometimes fail to resume WiFi even after picking my
> recent mwifiex patch [1].  Specifically you'd see messages like this:
>   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
>   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
>
> ...and tracing through the resume code in the failing cases showed
> that we were processing a SDIO interrupt really early in the resume
> call.
>
> NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> Defer SDIO interrupt handling until after resume") [2].  Presumably
> this is the same problem that was solved by that patch.
>
> [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
> [2] https://crrev.com/c/230765
>
> Cc: <stable@vger.kernel.org> # 4.14.x
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> I didn't put any "Fixes" tag here, but presumably this could be
> backported to whichever kernels folks found it useful for.  I have at
> least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> show the problem.  It is very easy to pick this to v4.19 and it
> definitely fixes the problem there.
>
> I haven't spent the time to pick this to 4.14 myself, but presumably
> it wouldn't be too hard to backport this as far as v4.13 since that
> contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
> make sense for anyone experiencing this problem to just pick the old
> CHROMIUM patch to fix them.
>
> Changes in v2:
> - Suggested 4.14+ in the stable tag (Sasha-bot)
> - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> - Hopefully clear comments as per https://crrev.com/c/1586207/1
>
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
>  drivers/mmc/host/dw_mmc.h |  3 +++
>  2 files changed, 26 insertions(+), 4 deletions(-)

Ulf: are you the right person to land this?  With 5.2-rc1 out it might
be a good time for it?  To refresh your memory about this patch:

* Patch v1 was posted back on April 10th [1] so we're at about 1.5
months of time for people to comment about it now.  Should be more
than enough.

* Shawn Lin saw it and didn't hate it.  He had some confusion about
how it worked and I've hopefully alleviated via extra comments / text.

* Emil Renner Berthing thought it caused a regression for him but then
tested further and was convinced that it didn't.  This is extra
confirmation that someone other than me did try the patch and found it
to not break things.  ;-)

* It has been reviewed by Guenter Roeck (in v2)

[1] https://lkml.kernel.org/r/20190410221237.160856-1-dianders@chromium.org


-Doug
Shawn Lin May 28, 2019, 7:47 a.m. UTC | #3
On 2019/4/30 4:40, Douglas Anderson wrote:
> Processing SDIO interrupts while dw_mmc is suspended (or partly
> suspended) seems like a bad idea.  We really don't want to be
> processing them until we've gotten ourselves fully powered up.
> 
> You might be wondering how it's even possible to become suspended when
> an SDIO interrupt is active.  As can be seen in
> dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> suspend when the SDIO interrupt is enabled.  ...but even though we
> stop normal runtime suspend transitions when SDIO interrupts are
> enabled, the dw_mci_runtime_suspend() can still get called for a full
> system suspend.
> 
> Let's handle all this by explicitly masking SDIO interrupts in the
> suspend call and unmasking them later in the resume call.  To do this
> cleanly I'll keep track of whether the client requested that SDIO
> interrupts be enabled so that we can reliably restore them regardless
> of whether we're masking them for one reason or another.
> 
> It should be noted that if dw_mci_enable_sdio_irq() is never called
> (for instance, if we don't have an SDIO card plugged in) that
> "client_sdio_enb" will always be false.  In those cases this patch
> adds a tiny bit of overhead to suspend/resume (a spinlock and a
> read/write of INTMASK) but other than that is a no-op.  The
> SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> hurt.
> 
> Without this fix it can be seen that rk3288-veyron Chromebooks with
> Marvell WiFi would sometimes fail to resume WiFi even after picking my
> recent mwifiex patch [1].  Specifically you'd see messages like this:
>    mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
>    mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
> 
> ...and tracing through the resume code in the failing cases showed
> that we were processing a SDIO interrupt really early in the resume
> call.
> 
> NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> Defer SDIO interrupt handling until after resume") [2].  Presumably
> this is the same problem that was solved by that patch.
> 
> [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
> [2] https://crrev.com/c/230765
> 

Sorry for late, but FWIW:

Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>

> Cc: <stable@vger.kernel.org> # 4.14.x
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> I didn't put any "Fixes" tag here, but presumably this could be
> backported to whichever kernels folks found it useful for.  I have at
> least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> show the problem.  It is very easy to pick this to v4.19 and it
> definitely fixes the problem there.
> 
> I haven't spent the time to pick this to 4.14 myself, but presumably
> it wouldn't be too hard to backport this as far as v4.13 since that
> contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
> make sense for anyone experiencing this problem to just pick the old
> CHROMIUM patch to fix them.
> 
> Changes in v2:
> - Suggested 4.14+ in the stable tag (Sasha-bot)
> - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> - Hopefully clear comments as per https://crrev.com/c/1586207/1
> 
>   drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
>   drivers/mmc/host/dw_mmc.h |  3 +++
>   2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 80dc2fd6576c..480067b87a94 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card)
>   	}
>   }
>   
> -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
> +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb,
> +				     bool client_requested)
>   {
>   	struct dw_mci *host = slot->host;
>   	unsigned long irqflags;
> @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
>   
>   	spin_lock_irqsave(&host->irq_lock, irqflags);
>   
> +	/*
> +	 * If we're being called directly from dw_mci_enable_sdio_irq()
> +	 * (which means that the client driver actually wants to enable or
> +	 * disable interrupts) then save the request.  Otherwise this
> +	 * wasn't directly requested by the client and we should logically
> +	 * AND it with the client request since we want to disable if
> +	 * _either_ the client disabled OR we have some other reason to
> +	 * disable temporarily.
> +	 */
> +	if (client_requested)
> +		host->client_sdio_enb = enb;
> +	else
> +		enb &= host->client_sdio_enb;
> +
>   	/* Enable/disable Slot Specific SDIO interrupt */
>   	int_mask = mci_readl(host, INTMASK);
>   	if (enb)
> @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb)
>   	struct dw_mci_slot *slot = mmc_priv(mmc);
>   	struct dw_mci *host = slot->host;
>   
> -	__dw_mci_enable_sdio_irq(slot, enb);
> +	__dw_mci_enable_sdio_irq(slot, enb, true);
>   
>   	/* Avoid runtime suspending the device when SDIO IRQ is enabled */
>   	if (enb)
> @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc)
>   {
>   	struct dw_mci_slot *slot = mmc_priv(mmc);
>   
> -	__dw_mci_enable_sdio_irq(slot, 1);
> +	__dw_mci_enable_sdio_irq(slot, true, false);
>   }
>   
>   static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode)
> @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>   		if (pending & SDMMC_INT_SDIO(slot->sdio_id)) {
>   			mci_writel(host, RINTSTS,
>   				   SDMMC_INT_SDIO(slot->sdio_id));
> -			__dw_mci_enable_sdio_irq(slot, 0);
> +			__dw_mci_enable_sdio_irq(slot, false, false);
>   			sdio_signal_irq(slot->mmc);
>   		}
>   
> @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev)
>   {
>   	struct dw_mci *host = dev_get_drvdata(dev);
>   
> +	__dw_mci_enable_sdio_irq(host->slot, false, false);
> +
>   	if (host->use_dma && host->dma_ops->exit)
>   		host->dma_ops->exit(host);
>   
> @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev)
>   	/* Now that slots are all setup, we can enable card detect */
>   	dw_mci_enable_cd(host);
>   
> +	__dw_mci_enable_sdio_irq(host->slot, true, false);
> +
>   	return 0;
>   
>   err:
> diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h
> index 46e9f8ec5398..dfbace0f5043 100644
> --- a/drivers/mmc/host/dw_mmc.h
> +++ b/drivers/mmc/host/dw_mmc.h
> @@ -127,6 +127,7 @@ struct dw_mci_dma_slave {
>    * @cmd11_timer: Timer for SD3.0 voltage switch over scheme.
>    * @cto_timer: Timer for broken command transfer over scheme.
>    * @dto_timer: Timer for broken data transfer over scheme.
> + * @client_sdio_enb: The value last passed to enable_sdio_irq.
>    *
>    * Locking
>    * =======
> @@ -234,6 +235,8 @@ struct dw_mci {
>   	struct timer_list       cmd11_timer;
>   	struct timer_list       cto_timer;
>   	struct timer_list       dto_timer;
> +
> +	bool			client_sdio_enb;
>   };
>   
>   /* DMA ops for Internal/External DMAC interface */
>
Ulf Hansson May 28, 2019, 1:11 p.m. UTC | #4
On Mon, 20 May 2019 at 20:41, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > Processing SDIO interrupts while dw_mmc is suspended (or partly
> > suspended) seems like a bad idea.  We really don't want to be
> > processing them until we've gotten ourselves fully powered up.
> >
> > You might be wondering how it's even possible to become suspended when
> > an SDIO interrupt is active.  As can be seen in
> > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> > suspend when the SDIO interrupt is enabled.  ...but even though we
> > stop normal runtime suspend transitions when SDIO interrupts are
> > enabled, the dw_mci_runtime_suspend() can still get called for a full
> > system suspend.
> >
> > Let's handle all this by explicitly masking SDIO interrupts in the
> > suspend call and unmasking them later in the resume call.  To do this
> > cleanly I'll keep track of whether the client requested that SDIO
> > interrupts be enabled so that we can reliably restore them regardless
> > of whether we're masking them for one reason or another.
> >
> > It should be noted that if dw_mci_enable_sdio_irq() is never called
> > (for instance, if we don't have an SDIO card plugged in) that
> > "client_sdio_enb" will always be false.  In those cases this patch
> > adds a tiny bit of overhead to suspend/resume (a spinlock and a
> > read/write of INTMASK) but other than that is a no-op.  The
> > SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> > hurt.
> >
> > Without this fix it can be seen that rk3288-veyron Chromebooks with
> > Marvell WiFi would sometimes fail to resume WiFi even after picking my
> > recent mwifiex patch [1].  Specifically you'd see messages like this:
> >   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
> >   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
> >
> > ...and tracing through the resume code in the failing cases showed
> > that we were processing a SDIO interrupt really early in the resume
> > call.
> >
> > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> > Defer SDIO interrupt handling until after resume") [2].  Presumably
> > this is the same problem that was solved by that patch.
> >
> > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
> > [2] https://crrev.com/c/230765
> >
> > Cc: <stable@vger.kernel.org> # 4.14.x
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > I didn't put any "Fixes" tag here, but presumably this could be
> > backported to whichever kernels folks found it useful for.  I have at
> > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> > show the problem.  It is very easy to pick this to v4.19 and it
> > definitely fixes the problem there.
> >
> > I haven't spent the time to pick this to 4.14 myself, but presumably
> > it wouldn't be too hard to backport this as far as v4.13 since that
> > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
> > make sense for anyone experiencing this problem to just pick the old
> > CHROMIUM patch to fix them.
> >
> > Changes in v2:
> > - Suggested 4.14+ in the stable tag (Sasha-bot)
> > - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> > - Hopefully clear comments as per https://crrev.com/c/1586207/1
> >
> >  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
> >  drivers/mmc/host/dw_mmc.h |  3 +++
> >  2 files changed, 26 insertions(+), 4 deletions(-)
>
> Ulf: are you the right person to land this?  With 5.2-rc1 out it might
> be a good time for it?  To refresh your memory about this patch:
>
> * Patch v1 was posted back on April 10th [1] so we're at about 1.5
> months of time for people to comment about it now.  Should be more
> than enough.

Apologize for the delay, not sure why this has slipped through my
filters. Anyway, let me have a look at it now.

>
> * Shawn Lin saw it and didn't hate it.  He had some confusion about
> how it worked and I've hopefully alleviated via extra comments / text.
>
> * Emil Renner Berthing thought it caused a regression for him but then
> tested further and was convinced that it didn't.  This is extra
> confirmation that someone other than me did try the patch and found it
> to not break things.  ;-)
>
> * It has been reviewed by Guenter Roeck (in v2)

One question, I am guessing you are considering
https://lkml.org/lkml/2019/5/17/761 as the long term solution, and
thus $subject patch should go as fix+stable? No?

>
> [1] https://lkml.kernel.org/r/20190410221237.160856-1-dianders@chromium.org
>
>
> -Doug

Kind regards
Uffe
Doug Anderson May 28, 2019, 2:38 p.m. UTC | #5
Hi,

On Tue, May 28, 2019 at 6:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 20 May 2019 at 20:41, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote:
> > >
> > > Processing SDIO interrupts while dw_mmc is suspended (or partly
> > > suspended) seems like a bad idea.  We really don't want to be
> > > processing them until we've gotten ourselves fully powered up.
> > >
> > > You might be wondering how it's even possible to become suspended when
> > > an SDIO interrupt is active.  As can be seen in
> > > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> > > suspend when the SDIO interrupt is enabled.  ...but even though we
> > > stop normal runtime suspend transitions when SDIO interrupts are
> > > enabled, the dw_mci_runtime_suspend() can still get called for a full
> > > system suspend.
> > >
> > > Let's handle all this by explicitly masking SDIO interrupts in the
> > > suspend call and unmasking them later in the resume call.  To do this
> > > cleanly I'll keep track of whether the client requested that SDIO
> > > interrupts be enabled so that we can reliably restore them regardless
> > > of whether we're masking them for one reason or another.
> > >
> > > It should be noted that if dw_mci_enable_sdio_irq() is never called
> > > (for instance, if we don't have an SDIO card plugged in) that
> > > "client_sdio_enb" will always be false.  In those cases this patch
> > > adds a tiny bit of overhead to suspend/resume (a spinlock and a
> > > read/write of INTMASK) but other than that is a no-op.  The
> > > SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> > > hurt.
> > >
> > > Without this fix it can be seen that rk3288-veyron Chromebooks with
> > > Marvell WiFi would sometimes fail to resume WiFi even after picking my
> > > recent mwifiex patch [1].  Specifically you'd see messages like this:
> > >   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
> > >   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
> > >
> > > ...and tracing through the resume code in the failing cases showed
> > > that we were processing a SDIO interrupt really early in the resume
> > > call.
> > >
> > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> > > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> > > Defer SDIO interrupt handling until after resume") [2].  Presumably
> > > this is the same problem that was solved by that patch.
> > >
> > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
> > > [2] https://crrev.com/c/230765
> > >
> > > Cc: <stable@vger.kernel.org> # 4.14.x
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > I didn't put any "Fixes" tag here, but presumably this could be
> > > backported to whichever kernels folks found it useful for.  I have at
> > > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> > > show the problem.  It is very easy to pick this to v4.19 and it
> > > definitely fixes the problem there.
> > >
> > > I haven't spent the time to pick this to 4.14 myself, but presumably
> > > it wouldn't be too hard to backport this as far as v4.13 since that
> > > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> > > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
> > > make sense for anyone experiencing this problem to just pick the old
> > > CHROMIUM patch to fix them.
> > >
> > > Changes in v2:
> > > - Suggested 4.14+ in the stable tag (Sasha-bot)
> > > - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> > > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> > > - Hopefully clear comments as per https://crrev.com/c/1586207/1
> > >
> > >  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
> > >  drivers/mmc/host/dw_mmc.h |  3 +++
> > >  2 files changed, 26 insertions(+), 4 deletions(-)
> >
> > Ulf: are you the right person to land this?  With 5.2-rc1 out it might
> > be a good time for it?  To refresh your memory about this patch:
> >
> > * Patch v1 was posted back on April 10th [1] so we're at about 1.5
> > months of time for people to comment about it now.  Should be more
> > than enough.
>
> Apologize for the delay, not sure why this has slipped through my
> filters. Anyway, let me have a look at it now.

No worries.  If there's something better I can do in the future to
avoid problems, please let me know.


> > * Shawn Lin saw it and didn't hate it.  He had some confusion about
> > how it worked and I've hopefully alleviated via extra comments / text.
> >
> > * Emil Renner Berthing thought it caused a regression for him but then
> > tested further and was convinced that it didn't.  This is extra
> > confirmation that someone other than me did try the patch and found it
> > to not break things.  ;-)
> >
> > * It has been reviewed by Guenter Roeck (in v2)
>
> One question, I am guessing you are considering
> https://lkml.org/lkml/2019/5/17/761 as the long term solution, and
> thus $subject patch should go as fix+stable? No?

No, the two problems are completely separate.  ${SUBJECT} patch deals
with full system suspend/resume.  I originally reproduced the problems
on a device with Marvell WiFi, though it could possibly affect
Broadcom parts on dw_mmc too.  ${SUBJECT} patch is ready to land on
mainline Linux at your leisure.  Also: in case you didn't notice
(since it didn't thread properly for me), Shawn Lin gave it a reviewed
by [1].

The patch you refer to [2] is related to something akin to an idle
state (more like Runtime PM) and only affects Broadcom parts.  Also
the Broadcom issue ought to happen across all host controllers (AKA
not just dw_mmc).

[1] https://lkml.kernel.org/r/982ffba1-c599-e73d-e5e0-b1be5668851c@rock-chips.com
[2] https://lkml.org/lkml/2019/5/17/761

-Doug
Ulf Hansson May 28, 2019, 7:21 p.m. UTC | #6
On Mon, 29 Apr 2019 at 22:41, Douglas Anderson <dianders@chromium.org> wrote:
>
> Processing SDIO interrupts while dw_mmc is suspended (or partly
> suspended) seems like a bad idea.  We really don't want to be
> processing them until we've gotten ourselves fully powered up.

I fully agree.

Although, this is important not only from the host driver/controller
perspective, but also from the SDIO card (managed by mmc core) point
of view.

In $subject patch you mange the driver/controller issue, but only for
one specific host driver (dw_mmc). I am thinking that this problem may
be a rather common problem, so perhaps we should try to address this
from the core in a way that it affects all host drivers. Did you
consider that?

The other problem I refer to, is in principle a way to prevent
sdio_run_irqs() from being executed before the SDIO card has been
resumed, via mmc_sdio_resume(). It's a separate problem, but certainly
related. This may need some more thinking to address properly, let's
just keep this in mind and discuss this in a separate thread.

>
> You might be wondering how it's even possible to become suspended when
> an SDIO interrupt is active.  As can be seen in
> dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> suspend when the SDIO interrupt is enabled.  ...but even though we
> stop normal runtime suspend transitions when SDIO interrupts are
> enabled, the dw_mci_runtime_suspend() can still get called for a full
> system suspend.
>
> Let's handle all this by explicitly masking SDIO interrupts in the
> suspend call and unmasking them later in the resume call.  To do this
> cleanly I'll keep track of whether the client requested that SDIO
> interrupts be enabled so that we can reliably restore them regardless
> of whether we're masking them for one reason or another.
>
> It should be noted that if dw_mci_enable_sdio_irq() is never called
> (for instance, if we don't have an SDIO card plugged in) that
> "client_sdio_enb" will always be false.  In those cases this patch
> adds a tiny bit of overhead to suspend/resume (a spinlock and a
> read/write of INTMASK) but other than that is a no-op.  The
> SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> hurt.

Thanks for the detailed problem description. In general your approach
sounds okay to me, but I have a few questions.

1) As kind of stated above, did you consider a solution where the core
simply disables the SDIO IRQ in case it isn't enabled for system
wakeup? In this way all host drivers would benefit.

2) dw_mmc isn't calling device_init_wakeup() during ->probe(), hence I
assume it doesn't support the SDIO IRQ being configured as system
wakeup. Correct? I understand this is platform specific, but still it
would be good to know your view.

3) Because of 2) The below code in dw_mci_runtime_suspend(), puzzles me:
"if (host->slot->mmc->pm_flags & MMC_PM_KEEP_POWER)"
       dw_mci_set_ios(host->slot->mmc, &host->slot->mmc->ios);

Why is 3) needed at all in case system wakeup isn't supported?

A note; the current support in the mmc core for the SDIO IRQ being
used as system wakeup, really needs some re-work. For example, we
should convert to use common wakeup interfaces, as to allow the PM
core to behave correctly during system suspend/resume. These are
changes that have been scheduled on my TODO list since long time ago,
I hope I can get some time to look into them soon.

>
> Without this fix it can be seen that rk3288-veyron Chromebooks with
> Marvell WiFi would sometimes fail to resume WiFi even after picking my
> recent mwifiex patch [1].  Specifically you'd see messages like this:
>   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
>   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
>
> ...and tracing through the resume code in the failing cases showed
> that we were processing a SDIO interrupt really early in the resume
> call.
>
> NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> Defer SDIO interrupt handling until after resume") [2].  Presumably
> this is the same problem that was solved by that patch.

Seems reasonable.

>
> [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org
> [2] https://crrev.com/c/230765
>
> Cc: <stable@vger.kernel.org> # 4.14.x
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> I didn't put any "Fixes" tag here, but presumably this could be
> backported to whichever kernels folks found it useful for.  I have at
> least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> show the problem.  It is very easy to pick this to v4.19 and it
> definitely fixes the problem there.
>
> I haven't spent the time to pick this to 4.14 myself, but presumably
> it wouldn't be too hard to backport this as far as v4.13 since that
> contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs").  Prior to that it might
> make sense for anyone experiencing this problem to just pick the old
> CHROMIUM patch to fix them.
>
> Changes in v2:
> - Suggested 4.14+ in the stable tag (Sasha-bot)
> - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> - Hopefully clear comments as per https://crrev.com/c/1586207/1
>
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
>  drivers/mmc/host/dw_mmc.h |  3 +++
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 80dc2fd6576c..480067b87a94 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card)
>         }
>  }
>
> -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
> +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb,
> +                                    bool client_requested)
>  {
>         struct dw_mci *host = slot->host;
>         unsigned long irqflags;
> @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
>
>         spin_lock_irqsave(&host->irq_lock, irqflags);
>
> +       /*
> +        * If we're being called directly from dw_mci_enable_sdio_irq()
> +        * (which means that the client driver actually wants to enable or
> +        * disable interrupts) then save the request.  Otherwise this
> +        * wasn't directly requested by the client and we should logically
> +        * AND it with the client request since we want to disable if
> +        * _either_ the client disabled OR we have some other reason to
> +        * disable temporarily.
> +        */
> +       if (client_requested)
> +               host->client_sdio_enb = enb;
> +       else
> +               enb &= host->client_sdio_enb;
> +
>         /* Enable/disable Slot Specific SDIO interrupt */
>         int_mask = mci_readl(host, INTMASK);
>         if (enb)
> @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb)
>         struct dw_mci_slot *slot = mmc_priv(mmc);
>         struct dw_mci *host = slot->host;
>
> -       __dw_mci_enable_sdio_irq(slot, enb);
> +       __dw_mci_enable_sdio_irq(slot, enb, true);
>
>         /* Avoid runtime suspending the device when SDIO IRQ is enabled */
>         if (enb)
> @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc)
>  {
>         struct dw_mci_slot *slot = mmc_priv(mmc);
>
> -       __dw_mci_enable_sdio_irq(slot, 1);
> +       __dw_mci_enable_sdio_irq(slot, true, false);
>  }
>
>  static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode)
> @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
>                 if (pending & SDMMC_INT_SDIO(slot->sdio_id)) {
>                         mci_writel(host, RINTSTS,
>                                    SDMMC_INT_SDIO(slot->sdio_id));
> -                       __dw_mci_enable_sdio_irq(slot, 0);
> +                       __dw_mci_enable_sdio_irq(slot, false, false);
>                         sdio_signal_irq(slot->mmc);
>                 }
>
> @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev)
>  {
>         struct dw_mci *host = dev_get_drvdata(dev);
>
> +       __dw_mci_enable_sdio_irq(host->slot, false, false);
> +
>         if (host->use_dma && host->dma_ops->exit)
>                 host->dma_ops->exit(host);
>
> @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev)
>         /* Now that slots are all setup, we can enable card detect */
>         dw_mci_enable_cd(host);
>
> +       __dw_mci_enable_sdio_irq(host->slot, true, false);
> +
>         return 0;
>
>  err:
> diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h
> index 46e9f8ec5398..dfbace0f5043 100644
> --- a/drivers/mmc/host/dw_mmc.h
> +++ b/drivers/mmc/host/dw_mmc.h
> @@ -127,6 +127,7 @@ struct dw_mci_dma_slave {
>   * @cmd11_timer: Timer for SD3.0 voltage switch over scheme.
>   * @cto_timer: Timer for broken command transfer over scheme.
>   * @dto_timer: Timer for broken data transfer over scheme.
> + * @client_sdio_enb: The value last passed to enable_sdio_irq.
>   *
>   * Locking
>   * =======
> @@ -234,6 +235,8 @@ struct dw_mci {
>         struct timer_list       cmd11_timer;
>         struct timer_list       cto_timer;
>         struct timer_list       dto_timer;
> +
> +       bool                    client_sdio_enb;
>  };
>
>  /* DMA ops for Internal/External DMAC interface */
> --
> 2.21.0.593.g511ec345e18-goog
>

Kind regards
Uffe
Doug Anderson May 28, 2019, 10:49 p.m. UTC | #7
Hi,

On Tue, May 28, 2019 at 12:22 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 29 Apr 2019 at 22:41, Douglas Anderson <dianders@chromium.org> wrote:
> >
> > Processing SDIO interrupts while dw_mmc is suspended (or partly
> > suspended) seems like a bad idea.  We really don't want to be
> > processing them until we've gotten ourselves fully powered up.
>
> I fully agree.
>
> Although, this is important not only from the host driver/controller
> perspective, but also from the SDIO card (managed by mmc core) point
> of view.
>
> In $subject patch you mange the driver/controller issue, but only for
> one specific host driver (dw_mmc). I am thinking that this problem may
> be a rather common problem, so perhaps we should try to address this
> from the core in a way that it affects all host drivers. Did you
> consider that?

I did wonder that.  See below, but in general I don't have massive
experience with all the host controllers out there.  Looking at sdhci
code, though, it might not have the same problems?  At least in some
cases it fully turns off the interrupts.  In other cases it seems like
the controller itself keeps power and so maybe getting the SDIO
interrupts early is OK?


> The other problem I refer to, is in principle a way to prevent
> sdio_run_irqs() from being executed before the SDIO card has been
> resumed, via mmc_sdio_resume(). It's a separate problem, but certainly
> related. This may need some more thinking to address properly, let's
> just keep this in mind and discuss this in a separate thread.

Actually, I think if we could figure out how to do this well it might
solve my particular problem.  Specifically I don't believe that
running dw_mmc's interrupt handler itself is causing the problem
(though it does seem pretty odd to run it while we're in the middle of
initting the host), I think it's the SDIO driver calling back into us
that's causing the problems.


> > You might be wondering how it's even possible to become suspended when
> > an SDIO interrupt is active.  As can be seen in
> > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> > suspend when the SDIO interrupt is enabled.  ...but even though we
> > stop normal runtime suspend transitions when SDIO interrupts are
> > enabled, the dw_mci_runtime_suspend() can still get called for a full
> > system suspend.
> >
> > Let's handle all this by explicitly masking SDIO interrupts in the
> > suspend call and unmasking them later in the resume call.  To do this
> > cleanly I'll keep track of whether the client requested that SDIO
> > interrupts be enabled so that we can reliably restore them regardless
> > of whether we're masking them for one reason or another.
> >
> > It should be noted that if dw_mci_enable_sdio_irq() is never called
> > (for instance, if we don't have an SDIO card plugged in) that
> > "client_sdio_enb" will always be false.  In those cases this patch
> > adds a tiny bit of overhead to suspend/resume (a spinlock and a
> > read/write of INTMASK) but other than that is a no-op.  The
> > SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> > hurt.
>
> Thanks for the detailed problem description. In general your approach
> sounds okay to me, but I have a few questions.
>
> 1) As kind of stated above, did you consider a solution where the core
> simply disables the SDIO IRQ in case it isn't enabled for system
> wakeup? In this way all host drivers would benefit.

I can give it a shot if you can give me a bunch of specific advice,
but I only have access to a few devices doing anything with SDIO and
they are all using Marvell or Broadcom on dw_mmc.

In general I have no idea how SDIO wakeup (plumbed through the SD
controller) would work.  As per below the only way I've seen it done
is totally out-of-band.  ...and actually, I'm not sure I've actually
ever seen even the out of band stuff truly work on a system myself.
It's always been one of those "we should support wake on WiFi" but
never made it all the way to implementation.  In any case, if there
are examples of people plumbing wakeup through the SD controller I'd
need to figure out how to not break them.  Just doing a solution for
dw_mmc means I don't have to worry about this since dw_mmc definitely
doesn't support SDIO wakeup.

Maybe one way to get a more generic solution is if you had an idea for
a patch that would work for many host controllers then you could post
it and I could test to confirm that it's happy on dw_mmc?  ...similar
to when you switched dw_mmc away from the old kthread-based SDIO
stuff?


> 2) dw_mmc isn't calling device_init_wakeup() during ->probe(), hence I
> assume it doesn't support the SDIO IRQ being configured as system
> wakeup. Correct? I understand this is platform specific, but still it
> would be good to know your view.

Right, currently dw_mmc doesn't support the SDIO IRQ being configured
as a system wakeup.  I'm kinda curious how this works in general.
Don't you need a clock running to get an SDIO interrupt?  How does
that work for suspend?  ...I do know that I've seen some dw_mmc host
controllers have an "SDIO IRQ" line that could be pinned out and that
line would also assert the same SDIO interrupt, but the mainline
driver has never supported it.  Whenever I asked Marvell about it in
the past they were always confused about what to do with that line and
(if I remember correctly) we never hooked it up.  I always though it
would be super interesting because it seemed like it would let us
disable the card clock when the slot was idle.  ...as far as I was
ever able to discern the pin was officially "non-standard".

As far as I know SDIO cards that want to be able to wakeup the device
end up using some sort of out of band mechanism.  For instance
"marvell,wakeup-pin" or Broadcom's "host-wake" interrupt.  As per
above, I don't have tons of experience here.  I see that
"rk3399-gru-kevin" has "marvell,wakeup-pin" defined, but that's PCIe
not SDIO.  Downstream in our Chrome OS kernel it seems like
"mt8173-oak.dtsi" and "mt8176-rowan.dtsi" have this for SDIO but they
are are devices I didn't work on and don't have much familiarity with.


> 3) Because of 2) The below code in dw_mci_runtime_suspend(), puzzles me:
> "if (host->slot->mmc->pm_flags & MMC_PM_KEEP_POWER)"
>        dw_mci_set_ios(host->slot->mmc, &host->slot->mmc->ios);
>
> Why is 3) needed at all in case system wakeup isn't supported?

That code has been in there for a really long time, dating back to
commit ab269128a2cf ("mmc: dw_mmc: Add sdio power bindings").  ...but,
in general, I know that we always keep power to the SDIO card in
suspend time.  This doesn't take a whole lot of power and speeds up
WiFi acquisition after resume by a whole lot (because otherwise we'd
have to fully re-load the WiFi firmware).  In fact, I believe that the
Marvell WiFi driver requires it.  Ah yes, search for
"MMC_PM_KEEP_POWER" in "marvell/mwifiex/sdio.c" and you'll see that it
gets yells if power isn't kept.

If we are keeping power during suspend/resume then presumably the card
will expect that communication resumes as normal upon resume.  AKA:
the clock should come up at the expected rate / voltage level and we
should resume as normal.


> A note; the current support in the mmc core for the SDIO IRQ being
> used as system wakeup, really needs some re-work. For example, we
> should convert to use common wakeup interfaces, as to allow the PM
> core to behave correctly during system suspend/resume. These are
> changes that have been scheduled on my TODO list since long time ago,
> I hope I can get some time to look into them soon.

Personally my knowledge of SD / SDIO is mostly acquired by trying to
make the dw_mmc driver do what we've needed it to over the years, so I
don't actually have a ton of broad understanding of the SDIO spec and
what is generally done for host controllers / SDIO cards.  If you're
aware of standard ways to get an SDIO IRQ to work in suspend time then
that'd be interesting.


> > Without this fix it can be seen that rk3288-veyron Chromebooks with
> > Marvell WiFi would sometimes fail to resume WiFi even after picking my
> > recent mwifiex patch [1].  Specifically you'd see messages like this:
> >   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
> >   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
> >
> > ...and tracing through the resume code in the failing cases showed
> > that we were processing a SDIO interrupt really early in the resume
> > call.
> >
> > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> > Defer SDIO interrupt handling until after resume") [2].  Presumably
> > this is the same problem that was solved by that patch.
>
> Seems reasonable.

Anyway, let me know what you think the next set of steps ought to be.
It would be sorta nice to get suspend/resume working reliably and land
this patch, but if you think that will impede our ability to come up
with a more generic solution then I guess we don't need to land it...


-Doug
Doug Anderson June 3, 2019, 6:41 p.m. UTC | #8
Ulf,

On Tue, May 28, 2019 at 3:49 PM Doug Anderson <dianders@chromium.org> wrote:
>
> > 1) As kind of stated above, did you consider a solution where the core
> > simply disables the SDIO IRQ in case it isn't enabled for system
> > wakeup? In this way all host drivers would benefit.
>
> I can give it a shot if you can give me a bunch of specific advice,
> but I only have access to a few devices doing anything with SDIO and
> they are all using Marvell or Broadcom on dw_mmc.
>
> In general I have no idea how SDIO wakeup (plumbed through the SD
> controller) would work.  As per below the only way I've seen it done
> is totally out-of-band.  ...and actually, I'm not sure I've actually
> ever seen even the out of band stuff truly work on a system myself.
> It's always been one of those "we should support wake on WiFi" but
> never made it all the way to implementation.  In any case, if there
> are examples of people plumbing wakeup through the SD controller I'd
> need to figure out how to not break them.  Just doing a solution for
> dw_mmc means I don't have to worry about this since dw_mmc definitely
> doesn't support SDIO wakeup.
>
> Maybe one way to get a more generic solution is if you had an idea for
> a patch that would work for many host controllers then you could post
> it and I could test to confirm that it's happy on dw_mmc?  ...similar
> to when you switched dw_mmc away from the old kthread-based SDIO
> stuff?

Unless you have time to help dig into all the possibilities here to
help understand how this should behave across all the different host
controllers / wakeup setups, maybe we could just land ${SUBJECT} patch
for now and when there is more clarity we can revisit?

Thanks!

-Doug
Ulf Hansson June 4, 2019, 7:30 a.m. UTC | #9
On Mon, 3 Jun 2019 at 20:41, Doug Anderson <dianders@chromium.org> wrote:
>
> Ulf,
>
> On Tue, May 28, 2019 at 3:49 PM Doug Anderson <dianders@chromium.org> wrote:
> >
> > > 1) As kind of stated above, did you consider a solution where the core
> > > simply disables the SDIO IRQ in case it isn't enabled for system
> > > wakeup? In this way all host drivers would benefit.
> >
> > I can give it a shot if you can give me a bunch of specific advice,
> > but I only have access to a few devices doing anything with SDIO and
> > they are all using Marvell or Broadcom on dw_mmc.
> >
> > In general I have no idea how SDIO wakeup (plumbed through the SD
> > controller) would work.  As per below the only way I've seen it done
> > is totally out-of-band.  ...and actually, I'm not sure I've actually
> > ever seen even the out of band stuff truly work on a system myself.
> > It's always been one of those "we should support wake on WiFi" but
> > never made it all the way to implementation.  In any case, if there
> > are examples of people plumbing wakeup through the SD controller I'd
> > need to figure out how to not break them.  Just doing a solution for
> > dw_mmc means I don't have to worry about this since dw_mmc definitely
> > doesn't support SDIO wakeup.
> >
> > Maybe one way to get a more generic solution is if you had an idea for
> > a patch that would work for many host controllers then you could post
> > it and I could test to confirm that it's happy on dw_mmc?  ...similar
> > to when you switched dw_mmc away from the old kthread-based SDIO
> > stuff?

Let me have a look and see if I can post something for you to test.

>
> Unless you have time to help dig into all the possibilities here to
> help understand how this should behave across all the different host
> controllers / wakeup setups, maybe we could just land ${SUBJECT} patch
> for now and when there is more clarity we can revisit?

That's an option. I only fear that the revisit part never happens
(because of me personally being occupied with other things).

If I not able to come up with something within a week, then I will
queue up this as fix.

Kind regards
Uffe
diff mbox series

Patch

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 80dc2fd6576c..480067b87a94 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1664,7 +1664,8 @@  static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card)
 	}
 }
 
-static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
+static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb,
+				     bool client_requested)
 {
 	struct dw_mci *host = slot->host;
 	unsigned long irqflags;
@@ -1672,6 +1673,20 @@  static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
 
 	spin_lock_irqsave(&host->irq_lock, irqflags);
 
+	/*
+	 * If we're being called directly from dw_mci_enable_sdio_irq()
+	 * (which means that the client driver actually wants to enable or
+	 * disable interrupts) then save the request.  Otherwise this
+	 * wasn't directly requested by the client and we should logically
+	 * AND it with the client request since we want to disable if
+	 * _either_ the client disabled OR we have some other reason to
+	 * disable temporarily.
+	 */
+	if (client_requested)
+		host->client_sdio_enb = enb;
+	else
+		enb &= host->client_sdio_enb;
+
 	/* Enable/disable Slot Specific SDIO interrupt */
 	int_mask = mci_readl(host, INTMASK);
 	if (enb)
@@ -1688,7 +1703,7 @@  static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb)
 	struct dw_mci_slot *slot = mmc_priv(mmc);
 	struct dw_mci *host = slot->host;
 
-	__dw_mci_enable_sdio_irq(slot, enb);
+	__dw_mci_enable_sdio_irq(slot, enb, true);
 
 	/* Avoid runtime suspending the device when SDIO IRQ is enabled */
 	if (enb)
@@ -1701,7 +1716,7 @@  static void dw_mci_ack_sdio_irq(struct mmc_host *mmc)
 {
 	struct dw_mci_slot *slot = mmc_priv(mmc);
 
-	__dw_mci_enable_sdio_irq(slot, 1);
+	__dw_mci_enable_sdio_irq(slot, true, false);
 }
 
 static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode)
@@ -2734,7 +2749,7 @@  static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
 		if (pending & SDMMC_INT_SDIO(slot->sdio_id)) {
 			mci_writel(host, RINTSTS,
 				   SDMMC_INT_SDIO(slot->sdio_id));
-			__dw_mci_enable_sdio_irq(slot, 0);
+			__dw_mci_enable_sdio_irq(slot, false, false);
 			sdio_signal_irq(slot->mmc);
 		}
 
@@ -3424,6 +3439,8 @@  int dw_mci_runtime_suspend(struct device *dev)
 {
 	struct dw_mci *host = dev_get_drvdata(dev);
 
+	__dw_mci_enable_sdio_irq(host->slot, false, false);
+
 	if (host->use_dma && host->dma_ops->exit)
 		host->dma_ops->exit(host);
 
@@ -3490,6 +3507,8 @@  int dw_mci_runtime_resume(struct device *dev)
 	/* Now that slots are all setup, we can enable card detect */
 	dw_mci_enable_cd(host);
 
+	__dw_mci_enable_sdio_irq(host->slot, true, false);
+
 	return 0;
 
 err:
diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h
index 46e9f8ec5398..dfbace0f5043 100644
--- a/drivers/mmc/host/dw_mmc.h
+++ b/drivers/mmc/host/dw_mmc.h
@@ -127,6 +127,7 @@  struct dw_mci_dma_slave {
  * @cmd11_timer: Timer for SD3.0 voltage switch over scheme.
  * @cto_timer: Timer for broken command transfer over scheme.
  * @dto_timer: Timer for broken data transfer over scheme.
+ * @client_sdio_enb: The value last passed to enable_sdio_irq.
  *
  * Locking
  * =======
@@ -234,6 +235,8 @@  struct dw_mci {
 	struct timer_list       cmd11_timer;
 	struct timer_list       cto_timer;
 	struct timer_list       dto_timer;
+
+	bool			client_sdio_enb;
 };
 
 /* DMA ops for Internal/External DMAC interface */