Message ID | 20190429204040.18725-1-dianders@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] mmc: dw_mmc: Disable SDIO interrupts while suspended to fix suspend/resume | expand |
On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote: > > Processing SDIO interrupts while dw_mmc is suspended (or partly > suspended) seems like a bad idea. We really don't want to be > processing them until we've gotten ourselves fully powered up. > > You might be wondering how it's even possible to become suspended when > an SDIO interrupt is active. As can be seen in > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > suspend when the SDIO interrupt is enabled. ...but even though we > stop normal runtime suspend transitions when SDIO interrupts are > enabled, the dw_mci_runtime_suspend() can still get called for a full > system suspend. > > Let's handle all this by explicitly masking SDIO interrupts in the > suspend call and unmasking them later in the resume call. To do this > cleanly I'll keep track of whether the client requested that SDIO > interrupts be enabled so that we can reliably restore them regardless > of whether we're masking them for one reason or another. > > It should be noted that if dw_mci_enable_sdio_irq() is never called > (for instance, if we don't have an SDIO card plugged in) that > "client_sdio_enb" will always be false. In those cases this patch > adds a tiny bit of overhead to suspend/resume (a spinlock and a > read/write of INTMASK) but other than that is a no-op. The > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > hurt. > > Without this fix it can be seen that rk3288-veyron Chromebooks with > Marvell WiFi would sometimes fail to resume WiFi even after picking my > recent mwifiex patch [1]. Specifically you'd see messages like this: > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > ...and tracing through the resume code in the failing cases showed > that we were processing a SDIO interrupt really early in the resume > call. > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > Defer SDIO interrupt handling until after resume") [2]. Presumably > this is the same problem that was solved by that patch. > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org > [2] https://crrev.com/c/230765 > > Cc: <stable@vger.kernel.org> # 4.14.x > Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> > --- > I didn't put any "Fixes" tag here, but presumably this could be > backported to whichever kernels folks found it useful for. I have at > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) > show the problem. It is very easy to pick this to v4.19 and it > definitely fixes the problem there. > > I haven't spent the time to pick this to 4.14 myself, but presumably > it wouldn't be too hard to backport this as far as v4.13 since that > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might > make sense for anyone experiencing this problem to just pick the old > CHROMIUM patch to fix them. > > Changes in v2: > - Suggested 4.14+ in the stable tag (Sasha-bot) > - Extra note that this is a noop on non-SDIO (Shawn / Emil) > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 > - Hopefully clear comments as per https://crrev.com/c/1586207/1 > > drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- > drivers/mmc/host/dw_mmc.h | 3 +++ > 2 files changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c > index 80dc2fd6576c..480067b87a94 100644 > --- a/drivers/mmc/host/dw_mmc.c > +++ b/drivers/mmc/host/dw_mmc.c > @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card) > } > } > > -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) > +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb, > + bool client_requested) > { > struct dw_mci *host = slot->host; > unsigned long irqflags; > @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) > > spin_lock_irqsave(&host->irq_lock, irqflags); > > + /* > + * If we're being called directly from dw_mci_enable_sdio_irq() > + * (which means that the client driver actually wants to enable or > + * disable interrupts) then save the request. Otherwise this > + * wasn't directly requested by the client and we should logically > + * AND it with the client request since we want to disable if > + * _either_ the client disabled OR we have some other reason to > + * disable temporarily. > + */ > + if (client_requested) > + host->client_sdio_enb = enb; > + else > + enb &= host->client_sdio_enb; > + > /* Enable/disable Slot Specific SDIO interrupt */ > int_mask = mci_readl(host, INTMASK); > if (enb) > @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb) > struct dw_mci_slot *slot = mmc_priv(mmc); > struct dw_mci *host = slot->host; > > - __dw_mci_enable_sdio_irq(slot, enb); > + __dw_mci_enable_sdio_irq(slot, enb, true); > > /* Avoid runtime suspending the device when SDIO IRQ is enabled */ > if (enb) > @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc) > { > struct dw_mci_slot *slot = mmc_priv(mmc); > > - __dw_mci_enable_sdio_irq(slot, 1); > + __dw_mci_enable_sdio_irq(slot, true, false); > } > > static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode) > @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id) > if (pending & SDMMC_INT_SDIO(slot->sdio_id)) { > mci_writel(host, RINTSTS, > SDMMC_INT_SDIO(slot->sdio_id)); > - __dw_mci_enable_sdio_irq(slot, 0); > + __dw_mci_enable_sdio_irq(slot, false, false); > sdio_signal_irq(slot->mmc); > } > > @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev) > { > struct dw_mci *host = dev_get_drvdata(dev); > > + __dw_mci_enable_sdio_irq(host->slot, false, false); > + > if (host->use_dma && host->dma_ops->exit) > host->dma_ops->exit(host); > > @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev) > /* Now that slots are all setup, we can enable card detect */ > dw_mci_enable_cd(host); > > + __dw_mci_enable_sdio_irq(host->slot, true, false); > + > return 0; > > err: > diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h > index 46e9f8ec5398..dfbace0f5043 100644 > --- a/drivers/mmc/host/dw_mmc.h > +++ b/drivers/mmc/host/dw_mmc.h > @@ -127,6 +127,7 @@ struct dw_mci_dma_slave { > * @cmd11_timer: Timer for SD3.0 voltage switch over scheme. > * @cto_timer: Timer for broken command transfer over scheme. > * @dto_timer: Timer for broken data transfer over scheme. > + * @client_sdio_enb: The value last passed to enable_sdio_irq. > * > * Locking > * ======= > @@ -234,6 +235,8 @@ struct dw_mci { > struct timer_list cmd11_timer; > struct timer_list cto_timer; > struct timer_list dto_timer; > + > + bool client_sdio_enb; > }; > > /* DMA ops for Internal/External DMAC interface */ > -- > 2.21.0.593.g511ec345e18-goog >
Hi, On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote: > > Processing SDIO interrupts while dw_mmc is suspended (or partly > suspended) seems like a bad idea. We really don't want to be > processing them until we've gotten ourselves fully powered up. > > You might be wondering how it's even possible to become suspended when > an SDIO interrupt is active. As can be seen in > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > suspend when the SDIO interrupt is enabled. ...but even though we > stop normal runtime suspend transitions when SDIO interrupts are > enabled, the dw_mci_runtime_suspend() can still get called for a full > system suspend. > > Let's handle all this by explicitly masking SDIO interrupts in the > suspend call and unmasking them later in the resume call. To do this > cleanly I'll keep track of whether the client requested that SDIO > interrupts be enabled so that we can reliably restore them regardless > of whether we're masking them for one reason or another. > > It should be noted that if dw_mci_enable_sdio_irq() is never called > (for instance, if we don't have an SDIO card plugged in) that > "client_sdio_enb" will always be false. In those cases this patch > adds a tiny bit of overhead to suspend/resume (a spinlock and a > read/write of INTMASK) but other than that is a no-op. The > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > hurt. > > Without this fix it can be seen that rk3288-veyron Chromebooks with > Marvell WiFi would sometimes fail to resume WiFi even after picking my > recent mwifiex patch [1]. Specifically you'd see messages like this: > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > ...and tracing through the resume code in the failing cases showed > that we were processing a SDIO interrupt really early in the resume > call. > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > Defer SDIO interrupt handling until after resume") [2]. Presumably > this is the same problem that was solved by that patch. > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org > [2] https://crrev.com/c/230765 > > Cc: <stable@vger.kernel.org> # 4.14.x > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > I didn't put any "Fixes" tag here, but presumably this could be > backported to whichever kernels folks found it useful for. I have at > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) > show the problem. It is very easy to pick this to v4.19 and it > definitely fixes the problem there. > > I haven't spent the time to pick this to 4.14 myself, but presumably > it wouldn't be too hard to backport this as far as v4.13 since that > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might > make sense for anyone experiencing this problem to just pick the old > CHROMIUM patch to fix them. > > Changes in v2: > - Suggested 4.14+ in the stable tag (Sasha-bot) > - Extra note that this is a noop on non-SDIO (Shawn / Emil) > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 > - Hopefully clear comments as per https://crrev.com/c/1586207/1 > > drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- > drivers/mmc/host/dw_mmc.h | 3 +++ > 2 files changed, 26 insertions(+), 4 deletions(-) Ulf: are you the right person to land this? With 5.2-rc1 out it might be a good time for it? To refresh your memory about this patch: * Patch v1 was posted back on April 10th [1] so we're at about 1.5 months of time for people to comment about it now. Should be more than enough. * Shawn Lin saw it and didn't hate it. He had some confusion about how it worked and I've hopefully alleviated via extra comments / text. * Emil Renner Berthing thought it caused a regression for him but then tested further and was convinced that it didn't. This is extra confirmation that someone other than me did try the patch and found it to not break things. ;-) * It has been reviewed by Guenter Roeck (in v2) [1] https://lkml.kernel.org/r/20190410221237.160856-1-dianders@chromium.org -Doug
On 2019/4/30 4:40, Douglas Anderson wrote: > Processing SDIO interrupts while dw_mmc is suspended (or partly > suspended) seems like a bad idea. We really don't want to be > processing them until we've gotten ourselves fully powered up. > > You might be wondering how it's even possible to become suspended when > an SDIO interrupt is active. As can be seen in > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > suspend when the SDIO interrupt is enabled. ...but even though we > stop normal runtime suspend transitions when SDIO interrupts are > enabled, the dw_mci_runtime_suspend() can still get called for a full > system suspend. > > Let's handle all this by explicitly masking SDIO interrupts in the > suspend call and unmasking them later in the resume call. To do this > cleanly I'll keep track of whether the client requested that SDIO > interrupts be enabled so that we can reliably restore them regardless > of whether we're masking them for one reason or another. > > It should be noted that if dw_mci_enable_sdio_irq() is never called > (for instance, if we don't have an SDIO card plugged in) that > "client_sdio_enb" will always be false. In those cases this patch > adds a tiny bit of overhead to suspend/resume (a spinlock and a > read/write of INTMASK) but other than that is a no-op. The > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > hurt. > > Without this fix it can be seen that rk3288-veyron Chromebooks with > Marvell WiFi would sometimes fail to resume WiFi even after picking my > recent mwifiex patch [1]. Specifically you'd see messages like this: > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > ...and tracing through the resume code in the failing cases showed > that we were processing a SDIO interrupt really early in the resume > call. > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > Defer SDIO interrupt handling until after resume") [2]. Presumably > this is the same problem that was solved by that patch. > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org > [2] https://crrev.com/c/230765 > Sorry for late, but FWIW: Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> > Cc: <stable@vger.kernel.org> # 4.14.x > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > I didn't put any "Fixes" tag here, but presumably this could be > backported to whichever kernels folks found it useful for. I have at > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) > show the problem. It is very easy to pick this to v4.19 and it > definitely fixes the problem there. > > I haven't spent the time to pick this to 4.14 myself, but presumably > it wouldn't be too hard to backport this as far as v4.13 since that > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might > make sense for anyone experiencing this problem to just pick the old > CHROMIUM patch to fix them. > > Changes in v2: > - Suggested 4.14+ in the stable tag (Sasha-bot) > - Extra note that this is a noop on non-SDIO (Shawn / Emil) > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 > - Hopefully clear comments as per https://crrev.com/c/1586207/1 > > drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- > drivers/mmc/host/dw_mmc.h | 3 +++ > 2 files changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c > index 80dc2fd6576c..480067b87a94 100644 > --- a/drivers/mmc/host/dw_mmc.c > +++ b/drivers/mmc/host/dw_mmc.c > @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card) > } > } > > -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) > +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb, > + bool client_requested) > { > struct dw_mci *host = slot->host; > unsigned long irqflags; > @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) > > spin_lock_irqsave(&host->irq_lock, irqflags); > > + /* > + * If we're being called directly from dw_mci_enable_sdio_irq() > + * (which means that the client driver actually wants to enable or > + * disable interrupts) then save the request. Otherwise this > + * wasn't directly requested by the client and we should logically > + * AND it with the client request since we want to disable if > + * _either_ the client disabled OR we have some other reason to > + * disable temporarily. > + */ > + if (client_requested) > + host->client_sdio_enb = enb; > + else > + enb &= host->client_sdio_enb; > + > /* Enable/disable Slot Specific SDIO interrupt */ > int_mask = mci_readl(host, INTMASK); > if (enb) > @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb) > struct dw_mci_slot *slot = mmc_priv(mmc); > struct dw_mci *host = slot->host; > > - __dw_mci_enable_sdio_irq(slot, enb); > + __dw_mci_enable_sdio_irq(slot, enb, true); > > /* Avoid runtime suspending the device when SDIO IRQ is enabled */ > if (enb) > @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc) > { > struct dw_mci_slot *slot = mmc_priv(mmc); > > - __dw_mci_enable_sdio_irq(slot, 1); > + __dw_mci_enable_sdio_irq(slot, true, false); > } > > static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode) > @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id) > if (pending & SDMMC_INT_SDIO(slot->sdio_id)) { > mci_writel(host, RINTSTS, > SDMMC_INT_SDIO(slot->sdio_id)); > - __dw_mci_enable_sdio_irq(slot, 0); > + __dw_mci_enable_sdio_irq(slot, false, false); > sdio_signal_irq(slot->mmc); > } > > @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev) > { > struct dw_mci *host = dev_get_drvdata(dev); > > + __dw_mci_enable_sdio_irq(host->slot, false, false); > + > if (host->use_dma && host->dma_ops->exit) > host->dma_ops->exit(host); > > @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev) > /* Now that slots are all setup, we can enable card detect */ > dw_mci_enable_cd(host); > > + __dw_mci_enable_sdio_irq(host->slot, true, false); > + > return 0; > > err: > diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h > index 46e9f8ec5398..dfbace0f5043 100644 > --- a/drivers/mmc/host/dw_mmc.h > +++ b/drivers/mmc/host/dw_mmc.h > @@ -127,6 +127,7 @@ struct dw_mci_dma_slave { > * @cmd11_timer: Timer for SD3.0 voltage switch over scheme. > * @cto_timer: Timer for broken command transfer over scheme. > * @dto_timer: Timer for broken data transfer over scheme. > + * @client_sdio_enb: The value last passed to enable_sdio_irq. > * > * Locking > * ======= > @@ -234,6 +235,8 @@ struct dw_mci { > struct timer_list cmd11_timer; > struct timer_list cto_timer; > struct timer_list dto_timer; > + > + bool client_sdio_enb; > }; > > /* DMA ops for Internal/External DMAC interface */ >
On Mon, 20 May 2019 at 20:41, Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote: > > > > Processing SDIO interrupts while dw_mmc is suspended (or partly > > suspended) seems like a bad idea. We really don't want to be > > processing them until we've gotten ourselves fully powered up. > > > > You might be wondering how it's even possible to become suspended when > > an SDIO interrupt is active. As can be seen in > > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > > suspend when the SDIO interrupt is enabled. ...but even though we > > stop normal runtime suspend transitions when SDIO interrupts are > > enabled, the dw_mci_runtime_suspend() can still get called for a full > > system suspend. > > > > Let's handle all this by explicitly masking SDIO interrupts in the > > suspend call and unmasking them later in the resume call. To do this > > cleanly I'll keep track of whether the client requested that SDIO > > interrupts be enabled so that we can reliably restore them regardless > > of whether we're masking them for one reason or another. > > > > It should be noted that if dw_mci_enable_sdio_irq() is never called > > (for instance, if we don't have an SDIO card plugged in) that > > "client_sdio_enb" will always be false. In those cases this patch > > adds a tiny bit of overhead to suspend/resume (a spinlock and a > > read/write of INTMASK) but other than that is a no-op. The > > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > > hurt. > > > > Without this fix it can be seen that rk3288-veyron Chromebooks with > > Marvell WiFi would sometimes fail to resume WiFi even after picking my > > recent mwifiex patch [1]. Specifically you'd see messages like this: > > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > > > ...and tracing through the resume code in the failing cases showed > > that we were processing a SDIO interrupt really early in the resume > > call. > > > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > > Defer SDIO interrupt handling until after resume") [2]. Presumably > > this is the same problem that was solved by that patch. > > > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org > > [2] https://crrev.com/c/230765 > > > > Cc: <stable@vger.kernel.org> # 4.14.x > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > I didn't put any "Fixes" tag here, but presumably this could be > > backported to whichever kernels folks found it useful for. I have at > > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) > > show the problem. It is very easy to pick this to v4.19 and it > > definitely fixes the problem there. > > > > I haven't spent the time to pick this to 4.14 myself, but presumably > > it wouldn't be too hard to backport this as far as v4.13 since that > > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use > > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might > > make sense for anyone experiencing this problem to just pick the old > > CHROMIUM patch to fix them. > > > > Changes in v2: > > - Suggested 4.14+ in the stable tag (Sasha-bot) > > - Extra note that this is a noop on non-SDIO (Shawn / Emil) > > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 > > - Hopefully clear comments as per https://crrev.com/c/1586207/1 > > > > drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- > > drivers/mmc/host/dw_mmc.h | 3 +++ > > 2 files changed, 26 insertions(+), 4 deletions(-) > > Ulf: are you the right person to land this? With 5.2-rc1 out it might > be a good time for it? To refresh your memory about this patch: > > * Patch v1 was posted back on April 10th [1] so we're at about 1.5 > months of time for people to comment about it now. Should be more > than enough. Apologize for the delay, not sure why this has slipped through my filters. Anyway, let me have a look at it now. > > * Shawn Lin saw it and didn't hate it. He had some confusion about > how it worked and I've hopefully alleviated via extra comments / text. > > * Emil Renner Berthing thought it caused a regression for him but then > tested further and was convinced that it didn't. This is extra > confirmation that someone other than me did try the patch and found it > to not break things. ;-) > > * It has been reviewed by Guenter Roeck (in v2) One question, I am guessing you are considering https://lkml.org/lkml/2019/5/17/761 as the long term solution, and thus $subject patch should go as fix+stable? No? > > [1] https://lkml.kernel.org/r/20190410221237.160856-1-dianders@chromium.org > > > -Doug Kind regards Uffe
Hi, On Tue, May 28, 2019 at 6:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Mon, 20 May 2019 at 20:41, Doug Anderson <dianders@chromium.org> wrote: > > > > Hi, > > > > On Mon, Apr 29, 2019 at 1:41 PM Douglas Anderson <dianders@chromium.org> wrote: > > > > > > Processing SDIO interrupts while dw_mmc is suspended (or partly > > > suspended) seems like a bad idea. We really don't want to be > > > processing them until we've gotten ourselves fully powered up. > > > > > > You might be wondering how it's even possible to become suspended when > > > an SDIO interrupt is active. As can be seen in > > > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > > > suspend when the SDIO interrupt is enabled. ...but even though we > > > stop normal runtime suspend transitions when SDIO interrupts are > > > enabled, the dw_mci_runtime_suspend() can still get called for a full > > > system suspend. > > > > > > Let's handle all this by explicitly masking SDIO interrupts in the > > > suspend call and unmasking them later in the resume call. To do this > > > cleanly I'll keep track of whether the client requested that SDIO > > > interrupts be enabled so that we can reliably restore them regardless > > > of whether we're masking them for one reason or another. > > > > > > It should be noted that if dw_mci_enable_sdio_irq() is never called > > > (for instance, if we don't have an SDIO card plugged in) that > > > "client_sdio_enb" will always be false. In those cases this patch > > > adds a tiny bit of overhead to suspend/resume (a spinlock and a > > > read/write of INTMASK) but other than that is a no-op. The > > > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > > > hurt. > > > > > > Without this fix it can be seen that rk3288-veyron Chromebooks with > > > Marvell WiFi would sometimes fail to resume WiFi even after picking my > > > recent mwifiex patch [1]. Specifically you'd see messages like this: > > > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > > > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > > > > > ...and tracing through the resume code in the failing cases showed > > > that we were processing a SDIO interrupt really early in the resume > > > call. > > > > > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > > > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > > > Defer SDIO interrupt handling until after resume") [2]. Presumably > > > this is the same problem that was solved by that patch. > > > > > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org > > > [2] https://crrev.com/c/230765 > > > > > > Cc: <stable@vger.kernel.org> # 4.14.x > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > I didn't put any "Fixes" tag here, but presumably this could be > > > backported to whichever kernels folks found it useful for. I have at > > > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) > > > show the problem. It is very easy to pick this to v4.19 and it > > > definitely fixes the problem there. > > > > > > I haven't spent the time to pick this to 4.14 myself, but presumably > > > it wouldn't be too hard to backport this as far as v4.13 since that > > > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use > > > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might > > > make sense for anyone experiencing this problem to just pick the old > > > CHROMIUM patch to fix them. > > > > > > Changes in v2: > > > - Suggested 4.14+ in the stable tag (Sasha-bot) > > > - Extra note that this is a noop on non-SDIO (Shawn / Emil) > > > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 > > > - Hopefully clear comments as per https://crrev.com/c/1586207/1 > > > > > > drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- > > > drivers/mmc/host/dw_mmc.h | 3 +++ > > > 2 files changed, 26 insertions(+), 4 deletions(-) > > > > Ulf: are you the right person to land this? With 5.2-rc1 out it might > > be a good time for it? To refresh your memory about this patch: > > > > * Patch v1 was posted back on April 10th [1] so we're at about 1.5 > > months of time for people to comment about it now. Should be more > > than enough. > > Apologize for the delay, not sure why this has slipped through my > filters. Anyway, let me have a look at it now. No worries. If there's something better I can do in the future to avoid problems, please let me know. > > * Shawn Lin saw it and didn't hate it. He had some confusion about > > how it worked and I've hopefully alleviated via extra comments / text. > > > > * Emil Renner Berthing thought it caused a regression for him but then > > tested further and was convinced that it didn't. This is extra > > confirmation that someone other than me did try the patch and found it > > to not break things. ;-) > > > > * It has been reviewed by Guenter Roeck (in v2) > > One question, I am guessing you are considering > https://lkml.org/lkml/2019/5/17/761 as the long term solution, and > thus $subject patch should go as fix+stable? No? No, the two problems are completely separate. ${SUBJECT} patch deals with full system suspend/resume. I originally reproduced the problems on a device with Marvell WiFi, though it could possibly affect Broadcom parts on dw_mmc too. ${SUBJECT} patch is ready to land on mainline Linux at your leisure. Also: in case you didn't notice (since it didn't thread properly for me), Shawn Lin gave it a reviewed by [1]. The patch you refer to [2] is related to something akin to an idle state (more like Runtime PM) and only affects Broadcom parts. Also the Broadcom issue ought to happen across all host controllers (AKA not just dw_mmc). [1] https://lkml.kernel.org/r/982ffba1-c599-e73d-e5e0-b1be5668851c@rock-chips.com [2] https://lkml.org/lkml/2019/5/17/761 -Doug
On Mon, 29 Apr 2019 at 22:41, Douglas Anderson <dianders@chromium.org> wrote: > > Processing SDIO interrupts while dw_mmc is suspended (or partly > suspended) seems like a bad idea. We really don't want to be > processing them until we've gotten ourselves fully powered up. I fully agree. Although, this is important not only from the host driver/controller perspective, but also from the SDIO card (managed by mmc core) point of view. In $subject patch you mange the driver/controller issue, but only for one specific host driver (dw_mmc). I am thinking that this problem may be a rather common problem, so perhaps we should try to address this from the core in a way that it affects all host drivers. Did you consider that? The other problem I refer to, is in principle a way to prevent sdio_run_irqs() from being executed before the SDIO card has been resumed, via mmc_sdio_resume(). It's a separate problem, but certainly related. This may need some more thinking to address properly, let's just keep this in mind and discuss this in a separate thread. > > You might be wondering how it's even possible to become suspended when > an SDIO interrupt is active. As can be seen in > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > suspend when the SDIO interrupt is enabled. ...but even though we > stop normal runtime suspend transitions when SDIO interrupts are > enabled, the dw_mci_runtime_suspend() can still get called for a full > system suspend. > > Let's handle all this by explicitly masking SDIO interrupts in the > suspend call and unmasking them later in the resume call. To do this > cleanly I'll keep track of whether the client requested that SDIO > interrupts be enabled so that we can reliably restore them regardless > of whether we're masking them for one reason or another. > > It should be noted that if dw_mci_enable_sdio_irq() is never called > (for instance, if we don't have an SDIO card plugged in) that > "client_sdio_enb" will always be false. In those cases this patch > adds a tiny bit of overhead to suspend/resume (a spinlock and a > read/write of INTMASK) but other than that is a no-op. The > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > hurt. Thanks for the detailed problem description. In general your approach sounds okay to me, but I have a few questions. 1) As kind of stated above, did you consider a solution where the core simply disables the SDIO IRQ in case it isn't enabled for system wakeup? In this way all host drivers would benefit. 2) dw_mmc isn't calling device_init_wakeup() during ->probe(), hence I assume it doesn't support the SDIO IRQ being configured as system wakeup. Correct? I understand this is platform specific, but still it would be good to know your view. 3) Because of 2) The below code in dw_mci_runtime_suspend(), puzzles me: "if (host->slot->mmc->pm_flags & MMC_PM_KEEP_POWER)" dw_mci_set_ios(host->slot->mmc, &host->slot->mmc->ios); Why is 3) needed at all in case system wakeup isn't supported? A note; the current support in the mmc core for the SDIO IRQ being used as system wakeup, really needs some re-work. For example, we should convert to use common wakeup interfaces, as to allow the PM core to behave correctly during system suspend/resume. These are changes that have been scheduled on my TODO list since long time ago, I hope I can get some time to look into them soon. > > Without this fix it can be seen that rk3288-veyron Chromebooks with > Marvell WiFi would sometimes fail to resume WiFi even after picking my > recent mwifiex patch [1]. Specifically you'd see messages like this: > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > ...and tracing through the resume code in the failing cases showed > that we were processing a SDIO interrupt really early in the resume > call. > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > Defer SDIO interrupt handling until after resume") [2]. Presumably > this is the same problem that was solved by that patch. Seems reasonable. > > [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org > [2] https://crrev.com/c/230765 > > Cc: <stable@vger.kernel.org> # 4.14.x > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > I didn't put any "Fixes" tag here, but presumably this could be > backported to whichever kernels folks found it useful for. I have at > least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) > show the problem. It is very easy to pick this to v4.19 and it > definitely fixes the problem there. > > I haven't spent the time to pick this to 4.14 myself, but presumably > it wouldn't be too hard to backport this as far as v4.13 since that > contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use > MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might > make sense for anyone experiencing this problem to just pick the old > CHROMIUM patch to fix them. > > Changes in v2: > - Suggested 4.14+ in the stable tag (Sasha-bot) > - Extra note that this is a noop on non-SDIO (Shawn / Emil) > - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 > - Hopefully clear comments as per https://crrev.com/c/1586207/1 > > drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- > drivers/mmc/host/dw_mmc.h | 3 +++ > 2 files changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c > index 80dc2fd6576c..480067b87a94 100644 > --- a/drivers/mmc/host/dw_mmc.c > +++ b/drivers/mmc/host/dw_mmc.c > @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card) > } > } > > -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) > +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb, > + bool client_requested) > { > struct dw_mci *host = slot->host; > unsigned long irqflags; > @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) > > spin_lock_irqsave(&host->irq_lock, irqflags); > > + /* > + * If we're being called directly from dw_mci_enable_sdio_irq() > + * (which means that the client driver actually wants to enable or > + * disable interrupts) then save the request. Otherwise this > + * wasn't directly requested by the client and we should logically > + * AND it with the client request since we want to disable if > + * _either_ the client disabled OR we have some other reason to > + * disable temporarily. > + */ > + if (client_requested) > + host->client_sdio_enb = enb; > + else > + enb &= host->client_sdio_enb; > + > /* Enable/disable Slot Specific SDIO interrupt */ > int_mask = mci_readl(host, INTMASK); > if (enb) > @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb) > struct dw_mci_slot *slot = mmc_priv(mmc); > struct dw_mci *host = slot->host; > > - __dw_mci_enable_sdio_irq(slot, enb); > + __dw_mci_enable_sdio_irq(slot, enb, true); > > /* Avoid runtime suspending the device when SDIO IRQ is enabled */ > if (enb) > @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc) > { > struct dw_mci_slot *slot = mmc_priv(mmc); > > - __dw_mci_enable_sdio_irq(slot, 1); > + __dw_mci_enable_sdio_irq(slot, true, false); > } > > static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode) > @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id) > if (pending & SDMMC_INT_SDIO(slot->sdio_id)) { > mci_writel(host, RINTSTS, > SDMMC_INT_SDIO(slot->sdio_id)); > - __dw_mci_enable_sdio_irq(slot, 0); > + __dw_mci_enable_sdio_irq(slot, false, false); > sdio_signal_irq(slot->mmc); > } > > @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev) > { > struct dw_mci *host = dev_get_drvdata(dev); > > + __dw_mci_enable_sdio_irq(host->slot, false, false); > + > if (host->use_dma && host->dma_ops->exit) > host->dma_ops->exit(host); > > @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev) > /* Now that slots are all setup, we can enable card detect */ > dw_mci_enable_cd(host); > > + __dw_mci_enable_sdio_irq(host->slot, true, false); > + > return 0; > > err: > diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h > index 46e9f8ec5398..dfbace0f5043 100644 > --- a/drivers/mmc/host/dw_mmc.h > +++ b/drivers/mmc/host/dw_mmc.h > @@ -127,6 +127,7 @@ struct dw_mci_dma_slave { > * @cmd11_timer: Timer for SD3.0 voltage switch over scheme. > * @cto_timer: Timer for broken command transfer over scheme. > * @dto_timer: Timer for broken data transfer over scheme. > + * @client_sdio_enb: The value last passed to enable_sdio_irq. > * > * Locking > * ======= > @@ -234,6 +235,8 @@ struct dw_mci { > struct timer_list cmd11_timer; > struct timer_list cto_timer; > struct timer_list dto_timer; > + > + bool client_sdio_enb; > }; > > /* DMA ops for Internal/External DMAC interface */ > -- > 2.21.0.593.g511ec345e18-goog > Kind regards Uffe
Hi, On Tue, May 28, 2019 at 12:22 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Mon, 29 Apr 2019 at 22:41, Douglas Anderson <dianders@chromium.org> wrote: > > > > Processing SDIO interrupts while dw_mmc is suspended (or partly > > suspended) seems like a bad idea. We really don't want to be > > processing them until we've gotten ourselves fully powered up. > > I fully agree. > > Although, this is important not only from the host driver/controller > perspective, but also from the SDIO card (managed by mmc core) point > of view. > > In $subject patch you mange the driver/controller issue, but only for > one specific host driver (dw_mmc). I am thinking that this problem may > be a rather common problem, so perhaps we should try to address this > from the core in a way that it affects all host drivers. Did you > consider that? I did wonder that. See below, but in general I don't have massive experience with all the host controllers out there. Looking at sdhci code, though, it might not have the same problems? At least in some cases it fully turns off the interrupts. In other cases it seems like the controller itself keeps power and so maybe getting the SDIO interrupts early is OK? > The other problem I refer to, is in principle a way to prevent > sdio_run_irqs() from being executed before the SDIO card has been > resumed, via mmc_sdio_resume(). It's a separate problem, but certainly > related. This may need some more thinking to address properly, let's > just keep this in mind and discuss this in a separate thread. Actually, I think if we could figure out how to do this well it might solve my particular problem. Specifically I don't believe that running dw_mmc's interrupt handler itself is causing the problem (though it does seem pretty odd to run it while we're in the middle of initting the host), I think it's the SDIO driver calling back into us that's causing the problems. > > You might be wondering how it's even possible to become suspended when > > an SDIO interrupt is active. As can be seen in > > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime > > suspend when the SDIO interrupt is enabled. ...but even though we > > stop normal runtime suspend transitions when SDIO interrupts are > > enabled, the dw_mci_runtime_suspend() can still get called for a full > > system suspend. > > > > Let's handle all this by explicitly masking SDIO interrupts in the > > suspend call and unmasking them later in the resume call. To do this > > cleanly I'll keep track of whether the client requested that SDIO > > interrupts be enabled so that we can reliably restore them regardless > > of whether we're masking them for one reason or another. > > > > It should be noted that if dw_mci_enable_sdio_irq() is never called > > (for instance, if we don't have an SDIO card plugged in) that > > "client_sdio_enb" will always be false. In those cases this patch > > adds a tiny bit of overhead to suspend/resume (a spinlock and a > > read/write of INTMASK) but other than that is a no-op. The > > SDMMC_INT_SDIO bit should always be clear and clearing it again won't > > hurt. > > Thanks for the detailed problem description. In general your approach > sounds okay to me, but I have a few questions. > > 1) As kind of stated above, did you consider a solution where the core > simply disables the SDIO IRQ in case it isn't enabled for system > wakeup? In this way all host drivers would benefit. I can give it a shot if you can give me a bunch of specific advice, but I only have access to a few devices doing anything with SDIO and they are all using Marvell or Broadcom on dw_mmc. In general I have no idea how SDIO wakeup (plumbed through the SD controller) would work. As per below the only way I've seen it done is totally out-of-band. ...and actually, I'm not sure I've actually ever seen even the out of band stuff truly work on a system myself. It's always been one of those "we should support wake on WiFi" but never made it all the way to implementation. In any case, if there are examples of people plumbing wakeup through the SD controller I'd need to figure out how to not break them. Just doing a solution for dw_mmc means I don't have to worry about this since dw_mmc definitely doesn't support SDIO wakeup. Maybe one way to get a more generic solution is if you had an idea for a patch that would work for many host controllers then you could post it and I could test to confirm that it's happy on dw_mmc? ...similar to when you switched dw_mmc away from the old kthread-based SDIO stuff? > 2) dw_mmc isn't calling device_init_wakeup() during ->probe(), hence I > assume it doesn't support the SDIO IRQ being configured as system > wakeup. Correct? I understand this is platform specific, but still it > would be good to know your view. Right, currently dw_mmc doesn't support the SDIO IRQ being configured as a system wakeup. I'm kinda curious how this works in general. Don't you need a clock running to get an SDIO interrupt? How does that work for suspend? ...I do know that I've seen some dw_mmc host controllers have an "SDIO IRQ" line that could be pinned out and that line would also assert the same SDIO interrupt, but the mainline driver has never supported it. Whenever I asked Marvell about it in the past they were always confused about what to do with that line and (if I remember correctly) we never hooked it up. I always though it would be super interesting because it seemed like it would let us disable the card clock when the slot was idle. ...as far as I was ever able to discern the pin was officially "non-standard". As far as I know SDIO cards that want to be able to wakeup the device end up using some sort of out of band mechanism. For instance "marvell,wakeup-pin" or Broadcom's "host-wake" interrupt. As per above, I don't have tons of experience here. I see that "rk3399-gru-kevin" has "marvell,wakeup-pin" defined, but that's PCIe not SDIO. Downstream in our Chrome OS kernel it seems like "mt8173-oak.dtsi" and "mt8176-rowan.dtsi" have this for SDIO but they are are devices I didn't work on and don't have much familiarity with. > 3) Because of 2) The below code in dw_mci_runtime_suspend(), puzzles me: > "if (host->slot->mmc->pm_flags & MMC_PM_KEEP_POWER)" > dw_mci_set_ios(host->slot->mmc, &host->slot->mmc->ios); > > Why is 3) needed at all in case system wakeup isn't supported? That code has been in there for a really long time, dating back to commit ab269128a2cf ("mmc: dw_mmc: Add sdio power bindings"). ...but, in general, I know that we always keep power to the SDIO card in suspend time. This doesn't take a whole lot of power and speeds up WiFi acquisition after resume by a whole lot (because otherwise we'd have to fully re-load the WiFi firmware). In fact, I believe that the Marvell WiFi driver requires it. Ah yes, search for "MMC_PM_KEEP_POWER" in "marvell/mwifiex/sdio.c" and you'll see that it gets yells if power isn't kept. If we are keeping power during suspend/resume then presumably the card will expect that communication resumes as normal upon resume. AKA: the clock should come up at the expected rate / voltage level and we should resume as normal. > A note; the current support in the mmc core for the SDIO IRQ being > used as system wakeup, really needs some re-work. For example, we > should convert to use common wakeup interfaces, as to allow the PM > core to behave correctly during system suspend/resume. These are > changes that have been scheduled on my TODO list since long time ago, > I hope I can get some time to look into them soon. Personally my knowledge of SD / SDIO is mostly acquired by trying to make the dw_mmc driver do what we've needed it to over the years, so I don't actually have a ton of broad understanding of the SDIO spec and what is generally done for host controllers / SDIO cards. If you're aware of standard ways to get an SDIO IRQ to work in suspend time then that'd be interesting. > > Without this fix it can be seen that rk3288-veyron Chromebooks with > > Marvell WiFi would sometimes fail to resume WiFi even after picking my > > recent mwifiex patch [1]. Specifically you'd see messages like this: > > mwifiex_sdio mmc1:0001:1: Firmware wakeup failed > > mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state > > > > ...and tracing through the resume code in the failing cases showed > > that we were processing a SDIO interrupt really early in the resume > > call. > > > > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which > > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: > > Defer SDIO interrupt handling until after resume") [2]. Presumably > > this is the same problem that was solved by that patch. > > Seems reasonable. Anyway, let me know what you think the next set of steps ought to be. It would be sorta nice to get suspend/resume working reliably and land this patch, but if you think that will impede our ability to come up with a more generic solution then I guess we don't need to land it... -Doug
Ulf, On Tue, May 28, 2019 at 3:49 PM Doug Anderson <dianders@chromium.org> wrote: > > > 1) As kind of stated above, did you consider a solution where the core > > simply disables the SDIO IRQ in case it isn't enabled for system > > wakeup? In this way all host drivers would benefit. > > I can give it a shot if you can give me a bunch of specific advice, > but I only have access to a few devices doing anything with SDIO and > they are all using Marvell or Broadcom on dw_mmc. > > In general I have no idea how SDIO wakeup (plumbed through the SD > controller) would work. As per below the only way I've seen it done > is totally out-of-band. ...and actually, I'm not sure I've actually > ever seen even the out of band stuff truly work on a system myself. > It's always been one of those "we should support wake on WiFi" but > never made it all the way to implementation. In any case, if there > are examples of people plumbing wakeup through the SD controller I'd > need to figure out how to not break them. Just doing a solution for > dw_mmc means I don't have to worry about this since dw_mmc definitely > doesn't support SDIO wakeup. > > Maybe one way to get a more generic solution is if you had an idea for > a patch that would work for many host controllers then you could post > it and I could test to confirm that it's happy on dw_mmc? ...similar > to when you switched dw_mmc away from the old kthread-based SDIO > stuff? Unless you have time to help dig into all the possibilities here to help understand how this should behave across all the different host controllers / wakeup setups, maybe we could just land ${SUBJECT} patch for now and when there is more clarity we can revisit? Thanks! -Doug
On Mon, 3 Jun 2019 at 20:41, Doug Anderson <dianders@chromium.org> wrote: > > Ulf, > > On Tue, May 28, 2019 at 3:49 PM Doug Anderson <dianders@chromium.org> wrote: > > > > > 1) As kind of stated above, did you consider a solution where the core > > > simply disables the SDIO IRQ in case it isn't enabled for system > > > wakeup? In this way all host drivers would benefit. > > > > I can give it a shot if you can give me a bunch of specific advice, > > but I only have access to a few devices doing anything with SDIO and > > they are all using Marvell or Broadcom on dw_mmc. > > > > In general I have no idea how SDIO wakeup (plumbed through the SD > > controller) would work. As per below the only way I've seen it done > > is totally out-of-band. ...and actually, I'm not sure I've actually > > ever seen even the out of band stuff truly work on a system myself. > > It's always been one of those "we should support wake on WiFi" but > > never made it all the way to implementation. In any case, if there > > are examples of people plumbing wakeup through the SD controller I'd > > need to figure out how to not break them. Just doing a solution for > > dw_mmc means I don't have to worry about this since dw_mmc definitely > > doesn't support SDIO wakeup. > > > > Maybe one way to get a more generic solution is if you had an idea for > > a patch that would work for many host controllers then you could post > > it and I could test to confirm that it's happy on dw_mmc? ...similar > > to when you switched dw_mmc away from the old kthread-based SDIO > > stuff? Let me have a look and see if I can post something for you to test. > > Unless you have time to help dig into all the possibilities here to > help understand how this should behave across all the different host > controllers / wakeup setups, maybe we could just land ${SUBJECT} patch > for now and when there is more clarity we can revisit? That's an option. I only fear that the revisit part never happens (because of me personally being occupied with other things). If I not able to come up with something within a week, then I will queue up this as fix. Kind regards Uffe
diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index 80dc2fd6576c..480067b87a94 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card) } } -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb, + bool client_requested) { struct dw_mci *host = slot->host; unsigned long irqflags; @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb) spin_lock_irqsave(&host->irq_lock, irqflags); + /* + * If we're being called directly from dw_mci_enable_sdio_irq() + * (which means that the client driver actually wants to enable or + * disable interrupts) then save the request. Otherwise this + * wasn't directly requested by the client and we should logically + * AND it with the client request since we want to disable if + * _either_ the client disabled OR we have some other reason to + * disable temporarily. + */ + if (client_requested) + host->client_sdio_enb = enb; + else + enb &= host->client_sdio_enb; + /* Enable/disable Slot Specific SDIO interrupt */ int_mask = mci_readl(host, INTMASK); if (enb) @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb) struct dw_mci_slot *slot = mmc_priv(mmc); struct dw_mci *host = slot->host; - __dw_mci_enable_sdio_irq(slot, enb); + __dw_mci_enable_sdio_irq(slot, enb, true); /* Avoid runtime suspending the device when SDIO IRQ is enabled */ if (enb) @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc) { struct dw_mci_slot *slot = mmc_priv(mmc); - __dw_mci_enable_sdio_irq(slot, 1); + __dw_mci_enable_sdio_irq(slot, true, false); } static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode) @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id) if (pending & SDMMC_INT_SDIO(slot->sdio_id)) { mci_writel(host, RINTSTS, SDMMC_INT_SDIO(slot->sdio_id)); - __dw_mci_enable_sdio_irq(slot, 0); + __dw_mci_enable_sdio_irq(slot, false, false); sdio_signal_irq(slot->mmc); } @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev) { struct dw_mci *host = dev_get_drvdata(dev); + __dw_mci_enable_sdio_irq(host->slot, false, false); + if (host->use_dma && host->dma_ops->exit) host->dma_ops->exit(host); @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev) /* Now that slots are all setup, we can enable card detect */ dw_mci_enable_cd(host); + __dw_mci_enable_sdio_irq(host->slot, true, false); + return 0; err: diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h index 46e9f8ec5398..dfbace0f5043 100644 --- a/drivers/mmc/host/dw_mmc.h +++ b/drivers/mmc/host/dw_mmc.h @@ -127,6 +127,7 @@ struct dw_mci_dma_slave { * @cmd11_timer: Timer for SD3.0 voltage switch over scheme. * @cto_timer: Timer for broken command transfer over scheme. * @dto_timer: Timer for broken data transfer over scheme. + * @client_sdio_enb: The value last passed to enable_sdio_irq. * * Locking * ======= @@ -234,6 +235,8 @@ struct dw_mci { struct timer_list cmd11_timer; struct timer_list cto_timer; struct timer_list dto_timer; + + bool client_sdio_enb; }; /* DMA ops for Internal/External DMAC interface */
Processing SDIO interrupts while dw_mmc is suspended (or partly suspended) seems like a bad idea. We really don't want to be processing them until we've gotten ourselves fully powered up. You might be wondering how it's even possible to become suspended when an SDIO interrupt is active. As can be seen in dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime suspend when the SDIO interrupt is enabled. ...but even though we stop normal runtime suspend transitions when SDIO interrupts are enabled, the dw_mci_runtime_suspend() can still get called for a full system suspend. Let's handle all this by explicitly masking SDIO interrupts in the suspend call and unmasking them later in the resume call. To do this cleanly I'll keep track of whether the client requested that SDIO interrupts be enabled so that we can reliably restore them regardless of whether we're masking them for one reason or another. It should be noted that if dw_mci_enable_sdio_irq() is never called (for instance, if we don't have an SDIO card plugged in) that "client_sdio_enb" will always be false. In those cases this patch adds a tiny bit of overhead to suspend/resume (a spinlock and a read/write of INTMASK) but other than that is a no-op. The SDMMC_INT_SDIO bit should always be clear and clearing it again won't hurt. Without this fix it can be seen that rk3288-veyron Chromebooks with Marvell WiFi would sometimes fail to resume WiFi even after picking my recent mwifiex patch [1]. Specifically you'd see messages like this: mwifiex_sdio mmc1:0001:1: Firmware wakeup failed mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state ...and tracing through the resume code in the failing cases showed that we were processing a SDIO interrupt really early in the resume call. NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio: Defer SDIO interrupt handling until after resume") [2]. Presumably this is the same problem that was solved by that patch. [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@chromium.org [2] https://crrev.com/c/230765 Cc: <stable@vger.kernel.org> # 4.14.x Signed-off-by: Douglas Anderson <dianders@chromium.org> --- I didn't put any "Fixes" tag here, but presumably this could be backported to whichever kernels folks found it useful for. I have at least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2) show the problem. It is very easy to pick this to v4.19 and it definitely fixes the problem there. I haven't spent the time to pick this to 4.14 myself, but presumably it wouldn't be too hard to backport this as far as v4.13 since that contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might make sense for anyone experiencing this problem to just pick the old CHROMIUM patch to fix them. Changes in v2: - Suggested 4.14+ in the stable tag (Sasha-bot) - Extra note that this is a noop on non-SDIO (Shawn / Emil) - Make boolean logic cleaner as per https://crrev.com/c/1586207/1 - Hopefully clear comments as per https://crrev.com/c/1586207/1 drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++---- drivers/mmc/host/dw_mmc.h | 3 +++ 2 files changed, 26 insertions(+), 4 deletions(-)