Message ID | 20190603183740.239031-4-dianders@chromium.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Headers | show |
Series | brcmfmac: sdio: Deal better w/ transmission errors waking from idle | expand |
On 3/06/19 9:37 PM, Douglas Anderson wrote: > There are certain cases, notably when transitioning between sleep and > active state, when Broadcom SDIO WiFi cards will produce errors on the > SDIO bus. This is evident from the source code where you can see that > we try commands in a loop until we either get success or we've tried > too many times. The comment in the code reinforces this by saying > "just one write attempt may fail" > > Unfortunately these failures sometimes end up causing an "-EILSEQ" > back to the core which triggers a retuning of the SDIO card and that > blocks all traffic to the card until it's done. > > Let's disable retuning around the commands we expect might fail. It seems to me that re-tuning needs to be prevented before the first access otherwise it might be attempted there, and it needs to continue to be prevented during the transition when it might reasonably be expected to fail. What about something along these lines: diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 4e15ea57d4f5..d932780ef56e 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) int err = 0; int err_cnt = 0; int try_cnt = 0; + int need_retune = 0; + bool retune_release = false; brcmf_dbg(TRACE, "Enter: on=%d\n", on); + /* Cannot re-tune if device is asleep */ + if (on) { + need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0 + sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now() + retune_release = true; + } + wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT); /* 1st KSO write goes to AOS wake up core if device is asleep */ brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err); @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) err_cnt = 0; } /* bail out upon subsequent access errors */ - if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) - break; + if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) { + if (!retune_release) + break; + /* + * Allow one more retry with re-tuning released in case + * it helps. + */ + sdio_retune_release(bus->sdiodev->func1); + retune_release = false; + } udelay(KSO_WAIT_US); brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, @@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) if (try_cnt > MAX_KSO_ATTEMPTS) brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err); + if (retune_release) { + /* + * CRC errors are not unexpected during the transition but they + * also trigger re-tuning. Clear that here to avoid an + * unnecessary re-tune if it wasn't already triggered to start + * with. + */ + if (!need_retune) + sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0 + sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release() + } + return err; }
Hi, On Thu, Jun 6, 2019 at 7:00 AM Adrian Hunter <adrian.hunter@intel.com> wrote: > > On 3/06/19 9:37 PM, Douglas Anderson wrote: > > There are certain cases, notably when transitioning between sleep and > > active state, when Broadcom SDIO WiFi cards will produce errors on the > > SDIO bus. This is evident from the source code where you can see that > > we try commands in a loop until we either get success or we've tried > > too many times. The comment in the code reinforces this by saying > > "just one write attempt may fail" > > > > Unfortunately these failures sometimes end up causing an "-EILSEQ" > > back to the core which triggers a retuning of the SDIO card and that > > blocks all traffic to the card until it's done. > > > > Let's disable retuning around the commands we expect might fail. > > It seems to me that re-tuning needs to be prevented before the > first access otherwise it might be attempted there, By this I think you mean I wasn't starting my section early enough to catch the "1st KSO write". Oops. Thanks! > and it needs > to continue to be prevented during the transition when it might > reasonably be expected to fail. > > What about something along these lines: > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c > index 4e15ea57d4f5..d932780ef56e 100644 > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c > @@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) > int err = 0; > int err_cnt = 0; > int try_cnt = 0; > + int need_retune = 0; > + bool retune_release = false; > > brcmf_dbg(TRACE, "Enter: on=%d\n", on); > > + /* Cannot re-tune if device is asleep */ > + if (on) { > + need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0 > + sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now() > + retune_release = true; > + } The code below still has retries even for the "!on" case. That implies that you could still get CRC errors from the card in the "!on" direction too. Any reason why we shouldn't just hold retuning even for the !on case? > + > wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT); > /* 1st KSO write goes to AOS wake up core if device is asleep */ > brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err); > @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) > err_cnt = 0; > } > /* bail out upon subsequent access errors */ > - if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) > - break; > + if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) { > + if (!retune_release) > + break; > + /* > + * Allow one more retry with re-tuning released in case > + * it helps. > + */ > + sdio_retune_release(bus->sdiodev->func1); > + retune_release = false; I would be tempted to wait before adding this logic until we actually see that it's needed. Sure, doing one more transfer probably won't really hurt, but until we know that it actually helps it seems like we're just adding extra complexity? > + } > > udelay(KSO_WAIT_US); > brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, > @@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) > if (try_cnt > MAX_KSO_ATTEMPTS) > brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err); > > + if (retune_release) { > + /* > + * CRC errors are not unexpected during the transition but they > + * also trigger re-tuning. Clear that here to avoid an > + * unnecessary re-tune if it wasn't already triggered to start > + * with. > + */ > + if (!need_retune) > + sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0 > + sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release() > + } Every time I re-look at this I have to re-figure out all the subtle differences between the variables and functions involved here. Let me see if I got everything right: * need_retune: set to 1 if we can retune and some event happened that makes us truly believe that we need to be retuned, like we got a CRC error or a timer expired or our host controller told us to retune. * retune_now: set to 1 it's an OK time to be retuning. Specifically if retune_now is false we won't send any retuning commands but we'll still keep track of the need to retune. * hold_retune: If this gets set to 1 by mmc_retune_hold_now() then a future call to mmc_retune_hold() will _not_ schedule a retune by setting retune_now (because mmc_retune_hold() will see that hold_retune was already 1). ...and a future call to mmc_retune_recheck() between mmc_hold() and mmc_release() will also not schedule a retune because hold_retune will be 2 (or generally > 1). --- So overall trying to summarize what I think are the differences between your patch and my patch. 1. If we needed to re-tune _before_ calling brcmf_sdio_kso_control(), with your patch we'll make sure that we don't actually attempt to retune until brcmf_sdio_kso_control() finishes. 2. If we needed to retune during brcmf_sdio_kso_control() (because a timer expired?) then we wouldn't trigger that retune while brcmf_sdio_kso_control() is running. In the case of dw_mmc, which I'm most familiar with, we don't have any sort of automated or timed-based retuning. ...so we'll only re-tune when we see the CRC error. If I'm understanding things correctly then that for dw_mmc my solution and yours behave the same. That means the difference is how we deal with other retuning requests, either ones that come about because of an interrupt that the host controller provided or because of a timer. Did I get that right? ...and I guess the reason we have to deal specially with these cases is because any time that SDIO card is "sleeping" we don't want to retune because it won't work. Right? NOTE: the solution that would come to my mind first to solve this would be to hold the retuning for the whole time that the card was sleeping and then release it once the card was awake again. ...but I guess we don't truly need to do that because tuning only happens as a side effect of sending a command to the card and the only command we send to the card is the "wake up" command. That's why your solution to hold tuning while sending the "wake up" command works, right? --- OK, so assuming all the above is correct, I feel like we're actually solving two problems and in fact I believe we actually need both our approaches to solve everything correctly. With just your patch in place there's a problem because we will clobber any external retuning requests that happened while we were waking up the card. AKA, imagine this: A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0 B) We call sdio_retune_hold_now() C) A retuning timer goes off or the SD Host controller tells us to retune D) We get to the end of brcmf_sdio_kso_control() and clear the "retune needed" since need_retune was 0 at the start. ...so we dropped the retuning request from C), right? What we truly need is: 1. CRC errors shouldn't trigger a retuning request when we're in brcmf_sdio_kso_control() 2. A separate patch that holds any retuning requests while the SDIO card is off. This patch _shouldn't_ do any clearing of retuning requests, just defer them. Does that make sense to you? If so, I can try to code it up... -Doug
On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote: > > In the case of dw_mmc, which I'm most familiar with, we don't have any > sort of automated or timed-based retuning. ...so we'll only re-tune > when we see the CRC error. If I'm understanding things correctly then > that for dw_mmc my solution and yours behave the same. That means the > difference is how we deal with other retuning requests, either ones > that come about because of an interrupt that the host controller > provided or because of a timer. Did I get that right? Right. > ...and I guess the reason we have to deal specially with these cases > is because any time that SDIO card is "sleeping" we don't want to > retune because it won't work. Right? NOTE: the solution that would > come to my mind first to solve this would be to hold the retuning for > the whole time that the card was sleeping and then release it once the > card was awake again. ...but I guess we don't truly need to do that > because tuning only happens as a side effect of sending a command to > the card and the only command we send to the card is the "wake up" > command. That's why your solution to hold tuning while sending the > "wake up" command works, right? Yup. > --- > > OK, so assuming all the above is correct, I feel like we're actually > solving two problems and in fact I believe we actually need both our > approaches to solve everything correctly. With just your patch in > place there's a problem because we will clobber any external retuning > requests that happened while we were waking up the card. AKA, imagine > this: > > A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0 > > B) We call sdio_retune_hold_now() > > C) A retuning timer goes off or the SD Host controller tells us to retune > > D) We get to the end of brcmf_sdio_kso_control() and clear the "retune > needed" since need_retune was 0 at the start. > > ...so we dropped the retuning request from C), right? > > > What we truly need is: > > 1. CRC errors shouldn't trigger a retuning request when we're in > brcmf_sdio_kso_control() > > 2. A separate patch that holds any retuning requests while the SDIO > card is off. This patch _shouldn't_ do any clearing of retuning > requests, just defer them. > > > Does that make sense to you? If so, I can try to code it up... FWIW it does make sense to me. However, I am still not sure if our sdio hardware supports retuning. Have to track down an asic designer who can tell or dive into vhdl myself. So I want to disable device sleep and trigger retuning through debugfs or some other hack. Regards, Arend
On 7/06/19 12:37 AM, Doug Anderson wrote: > Hi, > > On Thu, Jun 6, 2019 at 7:00 AM Adrian Hunter <adrian.hunter@intel.com> wrote: >> >> On 3/06/19 9:37 PM, Douglas Anderson wrote: >>> There are certain cases, notably when transitioning between sleep and >>> active state, when Broadcom SDIO WiFi cards will produce errors on the >>> SDIO bus. This is evident from the source code where you can see that >>> we try commands in a loop until we either get success or we've tried >>> too many times. The comment in the code reinforces this by saying >>> "just one write attempt may fail" >>> >>> Unfortunately these failures sometimes end up causing an "-EILSEQ" >>> back to the core which triggers a retuning of the SDIO card and that >>> blocks all traffic to the card until it's done. >>> >>> Let's disable retuning around the commands we expect might fail. >> >> It seems to me that re-tuning needs to be prevented before the >> first access otherwise it might be attempted there, > > By this I think you mean I wasn't starting my section early enough to > catch the "1st KSO write". Oops. Thanks! > > >> and it needs >> to continue to be prevented during the transition when it might >> reasonably be expected to fail. >> >> What about something along these lines: >> >> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c >> index 4e15ea57d4f5..d932780ef56e 100644 >> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c >> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c >> @@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) >> int err = 0; >> int err_cnt = 0; >> int try_cnt = 0; >> + int need_retune = 0; >> + bool retune_release = false; >> >> brcmf_dbg(TRACE, "Enter: on=%d\n", on); >> >> + /* Cannot re-tune if device is asleep */ >> + if (on) { >> + need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0 >> + sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now() >> + retune_release = true; >> + } > > The code below still has retries even for the "!on" case. That > implies that you could still get CRC errors from the card in the "!on" > direction too. Any reason why we shouldn't just hold retuning even > for the !on case? No > > >> + >> wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT); >> /* 1st KSO write goes to AOS wake up core if device is asleep */ >> brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err); >> @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) >> err_cnt = 0; >> } >> /* bail out upon subsequent access errors */ >> - if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) >> - break; >> + if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) { >> + if (!retune_release) >> + break; >> + /* >> + * Allow one more retry with re-tuning released in case >> + * it helps. >> + */ >> + sdio_retune_release(bus->sdiodev->func1); >> + retune_release = false; > > I would be tempted to wait before adding this logic until we actually > see that it's needed. Sure, doing one more transfer probably won't > really hurt, but until we know that it actually helps it seems like > we're just adding extra complexity? Depends, what is the downside of unnecessarily returning an error from brcmf_sdio_kso_control() in that case? > > >> + } >> >> udelay(KSO_WAIT_US); >> brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, >> @@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) >> if (try_cnt > MAX_KSO_ATTEMPTS) >> brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err); >> >> + if (retune_release) { >> + /* >> + * CRC errors are not unexpected during the transition but they >> + * also trigger re-tuning. Clear that here to avoid an >> + * unnecessary re-tune if it wasn't already triggered to start >> + * with. >> + */ >> + if (!need_retune) >> + sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0 >> + sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release() >> + } > > Every time I re-look at this I have to re-figure out all the subtle > differences between the variables and functions involved here. Let me > see if I got everything right: > > * need_retune: set to 1 if we can retune and some event happened that > makes us truly believe that we need to be retuned, like we got a CRC > error or a timer expired or our host controller told us to retune. > > * retune_now: set to 1 it's an OK time to be retuning. Specifically > if retune_now is false we won't send any retuning commands but we'll > still keep track of the need to retune. > > * hold_retune: If this gets set to 1 by mmc_retune_hold_now() then a > future call to mmc_retune_hold() will _not_ schedule a retune by > setting retune_now (because mmc_retune_hold() will see that > hold_retune was already 1). ...and a future call to > mmc_retune_recheck() between mmc_hold() and mmc_release() will also > not schedule a retune because hold_retune will be 2 (or generally > > 1). > > --- > > So overall trying to summarize what I think are the differences > between your patch and my patch. > > 1. If we needed to re-tune _before_ calling brcmf_sdio_kso_control(), > with your patch we'll make sure that we don't actually attempt to > retune until brcmf_sdio_kso_control() finishes. > > 2. If we needed to retune during brcmf_sdio_kso_control() (because a > timer expired?) then we wouldn't trigger that retune while > brcmf_sdio_kso_control() is running. > > In the case of dw_mmc, which I'm most familiar with, we don't have any > sort of automated or timed-based retuning. ...so we'll only re-tune > when we see the CRC error. If I'm understanding things correctly then > that for dw_mmc my solution and yours behave the same. That means the > difference is how we deal with other retuning requests, either ones > that come about because of an interrupt that the host controller > provided or because of a timer. Did I get that right? > > ...and I guess the reason we have to deal specially with these cases > is because any time that SDIO card is "sleeping" we don't want to > retune because it won't work. Right? NOTE: the solution that would > come to my mind first to solve this would be to hold the retuning for > the whole time that the card was sleeping and then release it once the > card was awake again. ...but I guess we don't truly need to do that > because tuning only happens as a side effect of sending a command to > the card and the only command we send to the card is the "wake up" > command. That's why your solution to hold tuning while sending the > "wake up" command works, right? > > --- > > OK, so assuming all the above is correct, I feel like we're actually > solving two problems and in fact I believe we actually need both our > approaches to solve everything correctly. With just your patch in > place there's a problem because we will clobber any external retuning > requests that happened while we were waking up the card. AKA, imagine > this: > > A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0 > > B) We call sdio_retune_hold_now() > > C) A retuning timer goes off or the SD Host controller tells us to retune > > D) We get to the end of brcmf_sdio_kso_control() and clear the "retune > needed" since need_retune was 0 at the start. > > ...so we dropped the retuning request from C), right? True > > > What we truly need is: > > 1. CRC errors shouldn't trigger a retuning request when we're in > brcmf_sdio_kso_control() > > 2. A separate patch that holds any retuning requests while the SDIO > card is off. This patch _shouldn't_ do any clearing of retuning > requests, just defer them. > > > Does that make sense to you? If so, I can try to code it up... Sounds good :-)
On 7/06/19 8:12 AM, Arend Van Spriel wrote: > On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote: >> >> In the case of dw_mmc, which I'm most familiar with, we don't have any >> sort of automated or timed-based retuning. ...so we'll only re-tune >> when we see the CRC error. If I'm understanding things correctly then >> that for dw_mmc my solution and yours behave the same. That means the >> difference is how we deal with other retuning requests, either ones >> that come about because of an interrupt that the host controller >> provided or because of a timer. Did I get that right? > > Right. > >> ...and I guess the reason we have to deal specially with these cases >> is because any time that SDIO card is "sleeping" we don't want to >> retune because it won't work. Right? NOTE: the solution that would >> come to my mind first to solve this would be to hold the retuning for >> the whole time that the card was sleeping and then release it once the >> card was awake again. ...but I guess we don't truly need to do that >> because tuning only happens as a side effect of sending a command to >> the card and the only command we send to the card is the "wake up" >> command. That's why your solution to hold tuning while sending the >> "wake up" command works, right? > > Yup. > >> --- >> >> OK, so assuming all the above is correct, I feel like we're actually >> solving two problems and in fact I believe we actually need both our >> approaches to solve everything correctly. With just your patch in >> place there's a problem because we will clobber any external retuning >> requests that happened while we were waking up the card. AKA, imagine >> this: >> >> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0 >> >> B) We call sdio_retune_hold_now() >> >> C) A retuning timer goes off or the SD Host controller tells us to retune >> >> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune >> needed" since need_retune was 0 at the start. >> >> ...so we dropped the retuning request from C), right? >> >> >> What we truly need is: >> >> 1. CRC errors shouldn't trigger a retuning request when we're in >> brcmf_sdio_kso_control() >> >> 2. A separate patch that holds any retuning requests while the SDIO >> card is off. This patch _shouldn't_ do any clearing of retuning >> requests, just defer them. >> >> >> Does that make sense to you? If so, I can try to code it up... > > FWIW it does make sense to me. However, I am still not sure if our sdio > hardware supports retuning. Have to track down an asic designer who can tell > or dive into vhdl myself. The card supports re-tuning if is handles CMD19, which it does. It is not the card that does any tuning, only the host. The card just helps by providing a known data pattern in response to CMD19. It can be that a card provides good enough signals that the host should not need to re-tune. I don't know if that can be affected by the board design though.
On June 7, 2019 2:40:04 PM Adrian Hunter <adrian.hunter@intel.com> wrote: > On 7/06/19 8:12 AM, Arend Van Spriel wrote: >> On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote: >>> >>> In the case of dw_mmc, which I'm most familiar with, we don't have any >>> sort of automated or timed-based retuning. ...so we'll only re-tune >>> when we see the CRC error. If I'm understanding things correctly then >>> that for dw_mmc my solution and yours behave the same. That means the >>> difference is how we deal with other retuning requests, either ones >>> that come about because of an interrupt that the host controller >>> provided or because of a timer. Did I get that right? >> >> Right. >> >>> ...and I guess the reason we have to deal specially with these cases >>> is because any time that SDIO card is "sleeping" we don't want to >>> retune because it won't work. Right? NOTE: the solution that would >>> come to my mind first to solve this would be to hold the retuning for >>> the whole time that the card was sleeping and then release it once the >>> card was awake again. ...but I guess we don't truly need to do that >>> because tuning only happens as a side effect of sending a command to >>> the card and the only command we send to the card is the "wake up" >>> command. That's why your solution to hold tuning while sending the >>> "wake up" command works, right? >> >> Yup. >> >>> --- >>> >>> OK, so assuming all the above is correct, I feel like we're actually >>> solving two problems and in fact I believe we actually need both our >>> approaches to solve everything correctly. With just your patch in >>> place there's a problem because we will clobber any external retuning >>> requests that happened while we were waking up the card. AKA, imagine >>> this: >>> >>> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0 >>> >>> B) We call sdio_retune_hold_now() >>> >>> C) A retuning timer goes off or the SD Host controller tells us to retune >>> >>> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune >>> needed" since need_retune was 0 at the start. >>> >>> ...so we dropped the retuning request from C), right? >>> >>> >>> What we truly need is: >>> >>> 1. CRC errors shouldn't trigger a retuning request when we're in >>> brcmf_sdio_kso_control() >>> >>> 2. A separate patch that holds any retuning requests while the SDIO >>> card is off. This patch _shouldn't_ do any clearing of retuning >>> requests, just defer them. >>> >>> >>> Does that make sense to you? If so, I can try to code it up... >> >> FWIW it does make sense to me. However, I am still not sure if our sdio >> hardware supports retuning. Have to track down an asic designer who can tell >> or dive into vhdl myself. > > The card supports re-tuning if is handles CMD19, which it does. It is not > the card that does any tuning, only the host. The card just helps by > providing a known data pattern in response to CMD19. It can be that a card > provides good enough signals that the host should not need to re-tune. I > don't know if that can be affected by the board design though. Right. I know it supports initial tuning, but I'm not sure about subsequent retuning initiated by the host controller. Regards, Arend
Hi, On Fri, Jun 7, 2019 at 5:29 AM Adrian Hunter <adrian.hunter@intel.com> wrote: > > >> @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) > >> err_cnt = 0; > >> } > >> /* bail out upon subsequent access errors */ > >> - if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) > >> - break; > >> + if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) { > >> + if (!retune_release) > >> + break; > >> + /* > >> + * Allow one more retry with re-tuning released in case > >> + * it helps. > >> + */ > >> + sdio_retune_release(bus->sdiodev->func1); > >> + retune_release = false; > > > > I would be tempted to wait before adding this logic until we actually > > see that it's needed. Sure, doing one more transfer probably won't > > really hurt, but until we know that it actually helps it seems like > > we're just adding extra complexity? > > Depends, what is the downside of unnecessarily returning an error from > brcmf_sdio_kso_control() in that case? I'm not aware of any downside, but without any example of it being needed it's just untested code that might or might not fix a problem. For now I'm going to leave it out of my patch and if someone later finds it needed or if you're convinced that we really need it we can add it as a patch atop. -Doug
Hi, On Fri, Jun 7, 2019 at 6:32 AM Arend Van Spriel <arend.vanspriel@broadcom.com> wrote: > > On June 7, 2019 2:40:04 PM Adrian Hunter <adrian.hunter@intel.com> wrote: > > > On 7/06/19 8:12 AM, Arend Van Spriel wrote: > >> On June 6, 2019 11:37:22 PM Doug Anderson <dianders@chromium.org> wrote: > >>> > >>> In the case of dw_mmc, which I'm most familiar with, we don't have any > >>> sort of automated or timed-based retuning. ...so we'll only re-tune > >>> when we see the CRC error. If I'm understanding things correctly then > >>> that for dw_mmc my solution and yours behave the same. That means the > >>> difference is how we deal with other retuning requests, either ones > >>> that come about because of an interrupt that the host controller > >>> provided or because of a timer. Did I get that right? > >> > >> Right. > >> > >>> ...and I guess the reason we have to deal specially with these cases > >>> is because any time that SDIO card is "sleeping" we don't want to > >>> retune because it won't work. Right? NOTE: the solution that would > >>> come to my mind first to solve this would be to hold the retuning for > >>> the whole time that the card was sleeping and then release it once the > >>> card was awake again. ...but I guess we don't truly need to do that > >>> because tuning only happens as a side effect of sending a command to > >>> the card and the only command we send to the card is the "wake up" > >>> command. That's why your solution to hold tuning while sending the > >>> "wake up" command works, right? > >> > >> Yup. > >> > >>> --- > >>> > >>> OK, so assuming all the above is correct, I feel like we're actually > >>> solving two problems and in fact I believe we actually need both our > >>> approaches to solve everything correctly. With just your patch in > >>> place there's a problem because we will clobber any external retuning > >>> requests that happened while we were waking up the card. AKA, imagine > >>> this: > >>> > >>> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0 > >>> > >>> B) We call sdio_retune_hold_now() > >>> > >>> C) A retuning timer goes off or the SD Host controller tells us to retune > >>> > >>> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune > >>> needed" since need_retune was 0 at the start. > >>> > >>> ...so we dropped the retuning request from C), right? > >>> > >>> > >>> What we truly need is: > >>> > >>> 1. CRC errors shouldn't trigger a retuning request when we're in > >>> brcmf_sdio_kso_control() > >>> > >>> 2. A separate patch that holds any retuning requests while the SDIO > >>> card is off. This patch _shouldn't_ do any clearing of retuning > >>> requests, just defer them. > >>> > >>> > >>> Does that make sense to you? If so, I can try to code it up... > >> > >> FWIW it does make sense to me. However, I am still not sure if our sdio > >> hardware supports retuning. Have to track down an asic designer who can tell > >> or dive into vhdl myself. > > > > The card supports re-tuning if is handles CMD19, which it does. It is not > > the card that does any tuning, only the host. The card just helps by > > providing a known data pattern in response to CMD19. It can be that a card > > provides good enough signals that the host should not need to re-tune. I > > don't know if that can be affected by the board design though. > > Right. I know it supports initial tuning, but I'm not sure about subsequent > retuning initiated by the host controller. My evidence says that it supports subsequent tuning. In fact, without this series my logs would be filled with: dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ ...where the phase varied by a few degrees each time. AKA: it was retuning over and over again and getting sane results which implies that the tuning was working just fine. The whole point of this series is not that the retuning was actually broken or anything it was just pointless and blocking the bus while it happened. On rk3288 dw_mmc ports we also currently do pretty extensive tuning, trying _lots_ of phases. Thus the re-tuning was blocking the bus for a significant amount of time. -Doug
On June 7, 2019 8:06:30 PM Doug Anderson <dianders@chromium.org> wrote: > Hi, > > On Fri, Jun 7, 2019 at 6:32 AM Arend Van Spriel > <arend.vanspriel@broadcom.com> wrote: >> >> Right. I know it supports initial tuning, but I'm not sure about subsequent >> retuning initiated by the host controller. > > My evidence says that it supports subsequent tuning. In fact, without > this series my logs would be filled with: > > dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to XYZ > > ...where the phase varied by a few degrees each time. AKA: it was > retuning over and over again and getting sane results which implies > that the tuning was working just fine. Ok. Thanks for confirming this. Regards, Arend
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 4a750838d8cd..e0cfcc078a54 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -16,6 +16,7 @@ #include <linux/mmc/sdio_ids.h> #include <linux/mmc/sdio_func.h> #include <linux/mmc/card.h> +#include <linux/mmc/core.h> #include <linux/semaphore.h> #include <linux/firmware.h> #include <linux/module.h> @@ -697,6 +698,7 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) bmask = SBSDIO_FUNC1_SLEEPCSR_KSO_MASK; } + mmc_expect_errors_begin(bus->sdiodev->func1->card->host); do { /* reliable KSO bit set/clr: * the sdiod sleep write access is synced to PMU 32khz clk @@ -719,6 +721,7 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on) &err); } while (try_cnt++ < MAX_KSO_ATTEMPTS); + mmc_expect_errors_end(bus->sdiodev->func1->card->host); if (try_cnt > 2) brcmf_dbg(SDIO, "try_cnt=%d rd_val=0x%x err=%d\n", try_cnt,
There are certain cases, notably when transitioning between sleep and active state, when Broadcom SDIO WiFi cards will produce errors on the SDIO bus. This is evident from the source code where you can see that we try commands in a loop until we either get success or we've tried too many times. The comment in the code reinforces this by saying "just one write attempt may fail" Unfortunately these failures sometimes end up causing an "-EILSEQ" back to the core which triggers a retuning of the SDIO card and that blocks all traffic to the card until it's done. Let's disable retuning around the commands we expect might fail. Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v2: None drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 3 +++ 1 file changed, 3 insertions(+)