Message ID | 1413391385-4061-1-git-send-email-sbranden@broadcom.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/15/2014 10:43 AM, Scott Branden wrote: > Added quirk SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 present in controller. > Removed udelay in write ops by using shadow registers for 16 bit > accesses to 32-bit registers (where necessary). > Optimized 32-bit operations when doing 8/16 register accesses. That's 2 or 3 unrelated changes. They'd be better as separate patches, so that any issues that arise can be bisected down to the smaller changes. > diff --git a/drivers/mmc/host/sdhci-bcm2835.c b/drivers/mmc/host/sdhci-bcm2835.c > /* > * The Arasan has a bugette whereby it may lose the content of successive > + * writes to the same register that are within two SD-card clock cycles of > + * each other (a clock domain crossing problem). Problem does not happen with ^ The? See right >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ^ > + * data. Blank line to separate the paragraphs here, to be consistent with the other paragraph break below? > + * This wouldn't be a problem with the code except that we can only write the > + * controller with 32-bit writes. So two different 16-bit registers in the > + * written back to back creates the problem. > * > + * In reality, this only happens when a SDHCI_BLOCK_SIZE and SDHCI_BLOCK_COUNT > + * are written followed by SDHCI_TRANSFER_MODE and SDHCI_COMMAND. That seems like a rather risky assertion. Even if it's perfectly true with the MMC core code right now, does the MMC core document a guarantee that this will always be true? Even if we optimize the WAR for the issue as you've done, I think we should still have code that validates that the same register is never written back-to-back to detect this likely very hard-to-debug problem. > + * The BLOCK_SIZE and BLOCK_COUNT are meaningless until a command issued so > + * the work around can be further optimized. We can keep shadow values of > + * BLOCK_SIZE, BLOCK_COUNT, and TRANSFER_MODE until a COMMAND is issued. > + * Then, write the BLOCK_SIZE+BLOCK_COUNT in a single 32-bit write followed > + * by the TRANSFER+COMMAND in another 32-bit write. > */ After this patch, the entire WAR for this issue is contained within bcm2835_sdhci_writew(). It might be a good idea to move the comment next to that function so it's more at hand to explain the code that's there. Or at least add a comment to that function the to mention the location of the explanation for the complex code. > static inline u32 bcm2835_sdhci_readl(struct sdhci_host *host, int reg) > { > u32 val = readl(host->ioaddr + reg); > @@ -71,76 +57,83 @@ static inline u32 bcm2835_sdhci_readl(struct sdhci_host *host, int reg) > return val; > } > > -static void bcm2835_sdhci_writew(struct sdhci_host *host, u16 val, int reg) > -{ ... (entire function deleted) > -} This patch could be a lot smaller if it didn't re-order the functions at the same time. It makes the patch harder to understand. If you must re-order the functions, perhaps make that a separate patch that does nothing else, so that the actual code changes are easier to see? > static u16 bcm2835_sdhci_readw(struct sdhci_host *host, int reg) > { > - u32 val = bcm2835_sdhci_readl(host, (reg & ~3)); > - u32 word_num = (reg >> 1) & 1; > - u32 word_shift = word_num * 16; > - u32 word = (val >> word_shift) & 0xffff; > - > + u32 val = bcm2835_sdhci_readl(host->ioaddr, (reg & ~3)); The change from host to host->ioaddr ends up passing the wrong value to bcm2835_sdhci_readl(). This causes the kernel to crash during boot. The compiler doesn't warn about this because host->ioaddr is void, so can be automatically converted to struct sdhci_host *. > + u16 word = val >> (reg << 3 & 0x18) & 0xffff; > return word; > } To be honest, I think the existing code is a bit clearer, since it uses variables with names to explain all the intermediate values. Assuming the compiler is competent (which admittedly I haven't checked) I would expect the same code to be generated either way, or at least something pretty similar. Did you measure the benefit of the optimization? > +static void bcm2835_sdhci_writew(struct sdhci_host *host, u16 val, int reg) > { > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > + struct bcm2835_sdhci_host *bcm2835_host = pltfm_host->priv; > + u32 word_shift = reg << 3 & 0x18; > + u32 mask = 0xffff << word_shift; > + u32 oldval; > + u32 newval; > + > + if (reg == SDHCI_COMMAND) { > + if (bcm2835_host->shadow_blk != 0) { > + writel(bcm2835_host->shadow_blk, > + host->ioaddr + SDHCI_BLOCK_SIZE); > + bcm2835_host->shadow_blk = 0; > + } Is it absolutely guaranteed that there's never a need to write 0 to that register? I can see that no data transfer command is likely to transfer 0 blocks. I assume no other type of command uses that register as a parameter? -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Great review - thanks. On 14-10-17 07:37 PM, Stephen Warren wrote: > On 10/15/2014 10:43 AM, Scott Branden wrote: >> Added quirk SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 present in controller. >> Removed udelay in write ops by using shadow registers for 16 bit >> accesses to 32-bit registers (where necessary). >> Optimized 32-bit operations when doing 8/16 register accesses. > > That's 2 or 3 unrelated changes. They'd be better as separate patches, > so that any issues that arise can be bisected down to the smaller changes. OK - I will split into smaller patches to bisect and understand better. > >> diff --git a/drivers/mmc/host/sdhci-bcm2835.c b/drivers/mmc/host/sdhci-bcm2835.c > >> /* >> * The Arasan has a bugette whereby it may lose the content of successive >> + * writes to the same register that are within two SD-card clock cycles of >> + * each other (a clock domain crossing problem). Problem does not happen with > ^ The? > See right >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ^ > >> + * data. > > Blank line to separate the paragraphs here, to be consistent with the > other paragraph break below? I'll clean up the comment some more. > >> + * This wouldn't be a problem with the code except that we can only write the >> + * controller with 32-bit writes. So two different 16-bit registers in the >> + * written back to back creates the problem. >> * >> + * In reality, this only happens when a SDHCI_BLOCK_SIZE and SDHCI_BLOCK_COUNT >> + * are written followed by SDHCI_TRANSFER_MODE and SDHCI_COMMAND. > > That seems like a rather risky assertion. Even if it's perfectly true > with the MMC core code right now, does the MMC core document a guarantee > that this will always be true? Even if we optimize the WAR for the issue > as you've done, I think we should still have code that validates that > the same register is never written back-to-back to detect this likely > very hard-to-debug problem. You're right - nothing in life is guaranteed. We had test code for this. I'll add a config option (default on) that verifies back to back writes do not occur. > >> + * The BLOCK_SIZE and BLOCK_COUNT are meaningless until a command issued so >> + * the work around can be further optimized. We can keep shadow values of >> + * BLOCK_SIZE, BLOCK_COUNT, and TRANSFER_MODE until a COMMAND is issued. >> + * Then, write the BLOCK_SIZE+BLOCK_COUNT in a single 32-bit write followed >> + * by the TRANSFER+COMMAND in another 32-bit write. >> */ > > After this patch, the entire WAR for this issue is contained within > bcm2835_sdhci_writew(). It might be a good idea to move the comment next > to that function so it's more at hand to explain the code that's there. > Or at least add a comment to that function the to mention the location > of the explanation for the complex code. ok, I'll clean up the comment a little more too. > >> static inline u32 bcm2835_sdhci_readl(struct sdhci_host *host, int reg) >> { >> u32 val = readl(host->ioaddr + reg); >> @@ -71,76 +57,83 @@ static inline u32 bcm2835_sdhci_readl(struct sdhci_host *host, int reg) >> return val; >> } >> >> -static void bcm2835_sdhci_writew(struct sdhci_host *host, u16 val, int reg) >> -{ > ... (entire function deleted) >> -} > > This patch could be a lot smaller if it didn't re-order the functions at > the same time. It makes the patch harder to understand. If you must > re-order the functions, perhaps make that a separate patch that does > nothing else, so that the actual code changes are easier to see? ok > >> static u16 bcm2835_sdhci_readw(struct sdhci_host *host, int reg) >> { >> - u32 val = bcm2835_sdhci_readl(host, (reg & ~3)); >> - u32 word_num = (reg >> 1) & 1; >> - u32 word_shift = word_num * 16; >> - u32 word = (val >> word_shift) & 0xffff; >> - >> + u32 val = bcm2835_sdhci_readl(host->ioaddr, (reg & ~3)); > > The change from host to host->ioaddr ends up passing the wrong value to > bcm2835_sdhci_readl(). This causes the kernel to crash during boot. I see that now. Will fix - unfortunately I ported from an existing driver that doesn't need the bcm2835_shdci_readl function. > > The compiler doesn't warn about this because host->ioaddr is void, so > can be automatically converted to struct sdhci_host *. > >> + u16 word = val >> (reg << 3 & 0x18) & 0xffff; >> return word; >> } > > To be honest, I think the existing code is a bit clearer, since it uses > variables with names to explain all the intermediate values. Assuming > the compiler is competent (which admittedly I haven't checked) I would > expect the same code to be generated either way, or at least something > pretty similar. Did you measure the benefit of the optimization? By optimize I meant use the same bit calculation instead of doing different calculations for the same operation. I'll create a macro to make it clearer to see. > >> +static void bcm2835_sdhci_writew(struct sdhci_host *host, u16 val, int reg) >> { >> + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); >> + struct bcm2835_sdhci_host *bcm2835_host = pltfm_host->priv; >> + u32 word_shift = reg << 3 & 0x18; >> + u32 mask = 0xffff << word_shift; >> + u32 oldval; >> + u32 newval; >> + >> + if (reg == SDHCI_COMMAND) { >> + if (bcm2835_host->shadow_blk != 0) { >> + writel(bcm2835_host->shadow_blk, >> + host->ioaddr + SDHCI_BLOCK_SIZE); >> + bcm2835_host->shadow_blk = 0; >> + } > > Is it absolutely guaranteed that there's never a need to write 0 to that > register? I can see that no data transfer command is likely to transfer > 0 blocks. I assume no other type of command uses that register as a > parameter? Correct. > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/mmc/host/sdhci-bcm2835.c b/drivers/mmc/host/sdhci-bcm2835.c index 439d259..d967a4f 100644 --- a/drivers/mmc/host/sdhci-bcm2835.c +++ b/drivers/mmc/host/sdhci-bcm2835.c @@ -25,42 +25,28 @@ #include "sdhci-pltfm.h" /* - * 400KHz is max freq for card ID etc. Use that as min card clock. We need to - * know the min to enable static calculation of max BCM2835_SDHCI_WRITE_DELAY. - */ -#define MIN_FREQ 400000 - -/* * The Arasan has a bugette whereby it may lose the content of successive - * writes to registers that are within two SD-card clock cycles of each other - * (a clock domain crossing problem). It seems, however, that the data - * register does not have this problem, which is just as well - otherwise we'd - * have to nobble the DMA engine too. + * writes to the same register that are within two SD-card clock cycles of + * each other (a clock domain crossing problem). Problem does not happen with + * data. + * This wouldn't be a problem with the code except that we can only write the + * controller with 32-bit writes. So two different 16-bit registers in the + * written back to back creates the problem. * - * This should probably be dynamically calculated based on the actual card - * frequency. However, this is the longest we'll have to wait, and doesn't - * seem to slow access down too much, so the added complexity doesn't seem - * worth it for now. - * - * 1/MIN_FREQ is (max) time per tick of eMMC clock. - * 2/MIN_FREQ is time for two ticks. - * Multiply by 1000000 to get uS per two ticks. - * *1000000 for uSecs. - * +1 for hack rounding. + * In reality, this only happens when a SDHCI_BLOCK_SIZE and SDHCI_BLOCK_COUNT + * are written followed by SDHCI_TRANSFER_MODE and SDHCI_COMMAND. + * The BLOCK_SIZE and BLOCK_COUNT are meaningless until a command issued so + * the work around can be further optimized. We can keep shadow values of + * BLOCK_SIZE, BLOCK_COUNT, and TRANSFER_MODE until a COMMAND is issued. + * Then, write the BLOCK_SIZE+BLOCK_COUNT in a single 32-bit write followed + * by the TRANSFER+COMMAND in another 32-bit write. */ -#define BCM2835_SDHCI_WRITE_DELAY (((2 * 1000000) / MIN_FREQ) + 1) -struct bcm2835_sdhci { - u32 shadow; +struct bcm2835_sdhci_host { + u32 shadow_cmd; + u32 shadow_blk; }; -static void bcm2835_sdhci_writel(struct sdhci_host *host, u32 val, int reg) -{ - writel(val, host->ioaddr + reg); - - udelay(BCM2835_SDHCI_WRITE_DELAY); -} - static inline u32 bcm2835_sdhci_readl(struct sdhci_host *host, int reg) { u32 val = readl(host->ioaddr + reg); @@ -71,76 +57,83 @@ static inline u32 bcm2835_sdhci_readl(struct sdhci_host *host, int reg) return val; } -static void bcm2835_sdhci_writew(struct sdhci_host *host, u16 val, int reg) -{ - struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); - struct bcm2835_sdhci *bcm2835_host = pltfm_host->priv; - u32 oldval = (reg == SDHCI_COMMAND) ? bcm2835_host->shadow : - bcm2835_sdhci_readl(host, reg & ~3); - u32 word_num = (reg >> 1) & 1; - u32 word_shift = word_num * 16; - u32 mask = 0xffff << word_shift; - u32 newval = (oldval & ~mask) | (val << word_shift); - - if (reg == SDHCI_TRANSFER_MODE) - bcm2835_host->shadow = newval; - else - bcm2835_sdhci_writel(host, newval, reg & ~3); -} - static u16 bcm2835_sdhci_readw(struct sdhci_host *host, int reg) { - u32 val = bcm2835_sdhci_readl(host, (reg & ~3)); - u32 word_num = (reg >> 1) & 1; - u32 word_shift = word_num * 16; - u32 word = (val >> word_shift) & 0xffff; - + u32 val = bcm2835_sdhci_readl(host->ioaddr, (reg & ~3)); + u16 word = val >> (reg << 3 & 0x18) & 0xffff; return word; } -static void bcm2835_sdhci_writeb(struct sdhci_host *host, u8 val, int reg) +static u8 bcm2835_sdhci_readb(struct sdhci_host *host, int reg) { - u32 oldval = bcm2835_sdhci_readl(host, reg & ~3); - u32 byte_num = reg & 3; - u32 byte_shift = byte_num * 8; - u32 mask = 0xff << byte_shift; - u32 newval = (oldval & ~mask) | (val << byte_shift); + u32 val = bcm2835_sdhci_readl(host->ioaddr, (reg & ~3)); + u8 byte = val >> (reg << 3 & 0x18) & 0xff; + return byte; +} - bcm2835_sdhci_writel(host, newval, reg & ~3); +static void bcm2835_sdhci_writel(struct sdhci_host *host, u32 val, int reg) +{ + writel(val, host->ioaddr + reg); } -static u8 bcm2835_sdhci_readb(struct sdhci_host *host, int reg) +static void bcm2835_sdhci_writew(struct sdhci_host *host, u16 val, int reg) { - u32 val = bcm2835_sdhci_readl(host, (reg & ~3)); - u32 byte_num = reg & 3; - u32 byte_shift = byte_num * 8; - u32 byte = (val >> byte_shift) & 0xff; + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); + struct bcm2835_sdhci_host *bcm2835_host = pltfm_host->priv; + u32 word_shift = reg << 3 & 0x18; + u32 mask = 0xffff << word_shift; + u32 oldval; + u32 newval; + + if (reg == SDHCI_COMMAND) { + if (bcm2835_host->shadow_blk != 0) { + writel(bcm2835_host->shadow_blk, + host->ioaddr + SDHCI_BLOCK_SIZE); + bcm2835_host->shadow_blk = 0; + } + oldval = bcm2835_host->shadow_cmd; + } else if (reg == SDHCI_BLOCK_SIZE || reg == SDHCI_BLOCK_COUNT) { + oldval = bcm2835_host->shadow_blk; + } else { + oldval = readl(host->ioaddr + (reg & ~3)); + } + newval = (oldval & ~mask) | (val << word_shift); - return byte; + if (reg == SDHCI_TRANSFER_MODE) + bcm2835_host->shadow_cmd = newval; + else if (reg == SDHCI_BLOCK_SIZE || reg == SDHCI_BLOCK_COUNT) + bcm2835_host->shadow_blk = newval; + else + writel(newval, host->ioaddr + (reg & ~3)); } -static unsigned int bcm2835_sdhci_get_min_clock(struct sdhci_host *host) +static void bcm2835_sdhci_writeb(struct sdhci_host *host, u8 val, int reg) { - return MIN_FREQ; + u32 oldval = readl(host->ioaddr + (reg & ~3)); + u32 byte_shift = reg << 3 & 0x18; + u32 mask = 0xff << byte_shift; + u32 newval = (oldval & ~mask) | (val << byte_shift); + + writel(newval, host->ioaddr + (reg & ~3)); } static const struct sdhci_ops bcm2835_sdhci_ops = { - .write_l = bcm2835_sdhci_writel, - .write_w = bcm2835_sdhci_writew, - .write_b = bcm2835_sdhci_writeb, .read_l = bcm2835_sdhci_readl, .read_w = bcm2835_sdhci_readw, .read_b = bcm2835_sdhci_readb, + .write_l = bcm2835_sdhci_writel, + .write_w = bcm2835_sdhci_writew, + .write_b = bcm2835_sdhci_writeb, .set_clock = sdhci_set_clock, .get_max_clock = sdhci_pltfm_clk_get_max_clock, - .get_min_clock = bcm2835_sdhci_get_min_clock, .set_bus_width = sdhci_set_bus_width, .reset = sdhci_reset, .set_uhs_signaling = sdhci_set_uhs_signaling, }; static const struct sdhci_pltfm_data bcm2835_sdhci_pdata = { - .quirks = SDHCI_QUIRK_BROKEN_CARD_DETECTION | + .quirks = SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 | + SDHCI_QUIRK_BROKEN_CARD_DETECTION | SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK, .ops = &bcm2835_sdhci_ops, }; @@ -148,7 +141,7 @@ static const struct sdhci_pltfm_data bcm2835_sdhci_pdata = { static int bcm2835_sdhci_probe(struct platform_device *pdev) { struct sdhci_host *host; - struct bcm2835_sdhci *bcm2835_host; + struct bcm2835_sdhci_host *bcm2835_host; struct sdhci_pltfm_host *pltfm_host; int ret;
Added quirk SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 present in controller. Removed udelay in write ops by using shadow registers for 16 bit accesses to 32-bit registers (where necessary). Optimized 32-bit operations when doing 8/16 register accesses. Signed-off-by: Scott Branden <sbranden@broadcom.com> --- drivers/mmc/host/sdhci-bcm2835.c | 139 ++++++++++++++++++-------------------- 1 file changed, 66 insertions(+), 73 deletions(-)