Message ID | 20230106200809.330769-11-william.zhang@broadcom.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | spi: bcm63xx-hsspi: driver and doc updates | expand |
On Fri, Jan 06, 2023 at 12:08:02PM -0800, William Zhang wrote: > Due to the controller limitation to keep the chip select low during the > bus idle time between the transfer, a dummy cs workaround was used when > this driver was first upstreamed to the kernel. > > The workaround picks dummy_cs as !actual_cs and usually cs is 0 and > dummy cs is 1. But this does not work out in all the situations as > customer reported issues for their board design. Due to SPI block design > constrain, the controller works in different mode internally depending > on the clock. When clock is 30MHz or above, controller works in faster > mode and cs 1 signal is used internally as feedback from the pad. So cs > 1 pin must be brought out and configured as chip select function in > pinmux. When clock is below 30MHz, only cs 0 pin is used. In summary > when clock is below 30MHz, the workaround always works as only cs 0 is > involved. For clock faster than 30MHz, it require cs1 pin used in the > board design and configured as chip selection function. > In a typical usage of SPI flash on cs 0 that normally runs at 100MHz, > cs 1 pin must be used in the board design but this is not always the > case for many customer boards. > For voice daughtercard usage, the SoC alway uses internal cs 7 for > voice card SPI control. Then it requires cs 0 pin as the dummy cs but > board with voice design may not use cs 0 pin at all. So I think what this is trying to say is that operation over 30MHz with the existing workaround requires that the board be designed to bring out the chip select used as the dummy as a physical chip select (or potentially it has to be the actual chip select for the device?) but that likely won't have been done? And potentially this only works if CS 1 is the one brought out? I'm unclear if CS 1 is just the most common dummy chip select or if there's something else going on here. > The controller actually has a prepend feature that can combine multiple > SPI transfers in a SPI message into one single transfer when the > transfers meet certain requirements. So there is no need for keeping cs > low when the message only has one transfer. Most of the SPI devices > including SPI NOR, SPI NAND flash, Broadcom voice card and etc can use > this feature without the dummy cs workaround. > This patch makes the dummy cs workaround as an option based on the > dts flag brcm,use-cs-workaround. By default dummy cs workaround is > hard coded to enable. We will use the prepend feature and disable this > workaround as default in the next patch of this series unless this flag > is set in dts. ...and based on your other comments I gather it's difficult to disable the workaround per message? I'm also guessing that the overhead from always doing full duplex transfers is noticable so it's better to try the workaround. I wonder if we can't do this by selecting the workaround based on the configured device speed. If the device is configured to use more than 30MHz then disable the workaround, if it's set to lower then use it. In practice most devices don't change their speed after setup, and the driver could check for and handle that case I guess (eg, limit the speed configured if the workaround has been activated or rewrite the message to a single transfer if neded). That would be less error prone for users, there wouldn't be any possibility of them setting this flag incorrectly.
Hi Mark, Sorry for the late reply. Please see my comments below. On 01/12/2023 10:08 AM, Mark Brown wrote: > On Fri, Jan 06, 2023 at 12:08:02PM -0800, William Zhang wrote: > >> Due to the controller limitation to keep the chip select low during the >> bus idle time between the transfer, a dummy cs workaround was used when >> this driver was first upstreamed to the kernel. >> >> The workaround picks dummy_cs as !actual_cs and usually cs is 0 and >> dummy cs is 1. But this does not work out in all the situations as >> customer reported issues for their board design. Due to SPI block design >> constrain, the controller works in different mode internally depending >> on the clock. When clock is 30MHz or above, controller works in faster >> mode and cs 1 signal is used internally as feedback from the pad. So cs >> 1 pin must be brought out and configured as chip select function in >> pinmux. When clock is below 30MHz, only cs 0 pin is used. In summary >> when clock is below 30MHz, the workaround always works as only cs 0 is >> involved. For clock faster than 30MHz, it require cs1 pin used in the >> board design and configured as chip selection function. > >> In a typical usage of SPI flash on cs 0 that normally runs at 100MHz, >> cs 1 pin must be used in the board design but this is not always the >> case for many customer boards. > >> For voice daughtercard usage, the SoC alway uses internal cs 7 for >> voice card SPI control. Then it requires cs 0 pin as the dummy cs but >> board with voice design may not use cs 0 pin at all. > > So I think what this is trying to say is that operation over 30MHz with > the existing workaround requires that the board be designed to bring out > the chip select used as the dummy as a physical chip select (or > potentially it has to be the actual chip select for the device?) Yes as the physical chip select and its pinmux must set as spi slave chip select function. If that pin is configured as GPIO function, this workaround won't work. For the old SoCs, this dummy cs 1 pin are default to cs pinmux function so the issue was not revealed. For all new SoC, the cs 1 pin is default to GPIO function so it stops working. > but that likely won't have been done? Correct. This was never a requirement from Broadcom for any board design because the Broadcom in-house SPI bus driver in its initial software release was essentially based on prepend mode so no such hardware design requirement and it does not make sense to reserve the dummy cs1 pin while customer only have one spi device at cs0. So no customer board would do that unless they have two devices. We decided to switch to the upstream driver with prepend mode addition as default in our late software release couple years ago. It has been running pretty smooth in the field. > And potentially this only works if CS > 1 is the one brought out? I'm unclear if CS 1 is just the most common > dummy chip select or if there's something else going on here. > Yes with pinmux set correctly. It can be other cs but the code set dummy_cs = !actual_cs. So it can be either 1 or 0. In most common case, board only have one spi device on cs 0. So dummy is mostly cs1. >> The controller actually has a prepend feature that can combine multiple >> SPI transfers in a SPI message into one single transfer when the >> transfers meet certain requirements. So there is no need for keeping cs >> low when the message only has one transfer. Most of the SPI devices >> including SPI NOR, SPI NAND flash, Broadcom voice card and etc can use >> this feature without the dummy cs workaround. > >> This patch makes the dummy cs workaround as an option based on the >> dts flag brcm,use-cs-workaround. By default dummy cs workaround is >> hard coded to enable. We will use the prepend feature and disable this >> workaround as default in the next patch of this series unless this flag >> is set in dts. > > ...and based on your other comments I gather it's difficult to disable > the workaround per message? I'm also guessing that the overhead from > always doing full duplex transfers is noticable so it's better to try > the workaround. > The main issue the max_message/transfer_size setup during the driver probe function. prepend mode has a max size of 512 bytes while dummy cs does not. I don't think it is safe to update these function ptr at runtime and it will be too late for the driver to update. But I think we can always set the max size as in prepend mode. It does looses some performance for dummy cs (about 15% slower on throughput when test on 100MHz SPI NAND) because the size limit. The prepend buffer copy and other handling is insignificant. But since dummy cs does not work on 100MHz in general, it is still much better throughput when doing prepend mode at 100Mhz with size limit comparing dummy cs at 30MHz. > I wonder if we can't do this by selecting the workaround based on the > configured device speed. If the device is configured to use more than > 30MHz then disable the workaround, if it's set to lower then use it. In > practice most devices don't change their speed after setup, and the > driver could check for and handle that case I guess (eg, limit the speed > configured if the workaround has been activated or rewrite the message > to a single transfer if neded). That would be less error prone for > users, there wouldn't be any possibility of them setting this flag > incorrectly. > I agree. We can get rid of the cs workaround dts flag and make run time decision based on the device speed. And for the cut off frequency, we want to make it bit more conservative. According to our VSLI designer, the cut off freq depends on the chip floor plan - the distance from the SPI peripheral block to the SPI pins. So for some chips, the freq is 25Mhz, for some is 30Mhz. So we think 25MHz is good number to use. As our SDK release uses prepend mode as default and it has been field tested with different and multiple SPI devices simultaneously in our large customer bases, I don't feel like to change the default mode to dummy cs mode and I am more comfortable to stick with prepend mode. So my preferred choice is to use prepend mode as default. In the unlikely event that SPI transfers can not be merged for prepend mode, we switch to dummy cs workaround mode and force the transfer to run at 25MHz. This make sure everything works under the hood automatically and no dts setting is required. I can add sysfs knobs to override this behavior and force to use either mode just in case. Does this sounds good to you?
On Wed, Jan 18, 2023 at 03:09:39PM -0800, William Zhang wrote: > So my preferred choice is to use prepend mode as default. In the unlikely > event that SPI transfers can not be merged for prepend mode, we switch to > dummy cs workaround mode and force the transfer to run at 25MHz. This make > sure everything works under the hood automatically and no dts setting is > required. I can add sysfs knobs to override this behavior and force to use > either mode just in case. Does this sounds good to you? I think so.
diff --git a/drivers/spi/spi-bcm63xx-hsspi.c b/drivers/spi/spi-bcm63xx-hsspi.c index 398c412dcc3e..b5043251edec 100644 --- a/drivers/spi/spi-bcm63xx-hsspi.c +++ b/drivers/spi/spi-bcm63xx-hsspi.c @@ -111,6 +111,7 @@ struct bcm63xx_hsspi { u32 speed_hz; u8 cs_polarity; + bool use_cs_workaround; int irq; }; @@ -161,16 +162,18 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) { struct bcm63xx_hsspi *bs = spi_master_get_devdata(spi->master); unsigned int chip_select = spi->chip_select; + unsigned int hw_cs; u16 opcode = 0; int pending = t->len; int step_size = HSSPI_BUFFER_LEN; const u8 *tx = t->tx_buf; u8 *rx = t->rx_buf; - u32 val = 0; unsigned long limit; + u32 reg = 0; bcm63xx_hsspi_set_clk(bs, spi, t->speed_hz); - bcm63xx_hsspi_set_cs(bs, spi->chip_select, true); + if (bs->use_cs_workaround) + bcm63xx_hsspi_set_cs(bs, spi->chip_select, true); if (tx && rx) opcode = HSSPI_OP_READ_WRITE; @@ -187,14 +190,15 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) opcode |= HSSPI_OP_MULTIBIT; if (t->rx_nbits == SPI_NBITS_DUAL) - val |= 1 << MODE_CTRL_MULTIDATA_RD_SIZE_SHIFT; + reg |= 1 << MODE_CTRL_MULTIDATA_RD_SIZE_SHIFT; if (t->tx_nbits == SPI_NBITS_DUAL) - val |= 1 << MODE_CTRL_MULTIDATA_WR_SIZE_SHIFT; + reg |= 1 << MODE_CTRL_MULTIDATA_WR_SIZE_SHIFT; } - __raw_writel(val | 0xff, + __raw_writel(reg | 0xff, bs->regs + HSSPI_PROFILE_MODE_CTRL_REG(chip_select)); + hw_cs = bs->use_cs_workaround ? !chip_select : chip_select; while (pending > 0) { int curr_step = min_t(int, step_size, pending); @@ -211,11 +215,10 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) __raw_writel(HSSPI_PINGx_CMD_DONE(0), bs->regs + HSSPI_INT_MASK_REG); - /* start the transfer */ - __raw_writel(!chip_select << PINGPONG_CMD_SS_SHIFT | - chip_select << PINGPONG_CMD_PROFILE_SHIFT | - PINGPONG_COMMAND_START_NOW, - bs->regs + HSSPI_PINGPONG_COMMAND_REG(0)); + reg = hw_cs << PINGPONG_CMD_SS_SHIFT | + chip_select << PINGPONG_CMD_PROFILE_SHIFT | + PINGPONG_COMMAND_START_NOW; + __raw_writel(reg, bs->regs + HSSPI_PINGPONG_COMMAND_REG(0)); if (bs->irq > 0) { if (wait_for_completion_timeout(&bs->done, HZ) == 0) @@ -223,14 +226,15 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) } else { /* polling mode checks for status busy bit */ limit = jiffies + msecs_to_jiffies(HSSPI_POLL_STATUS_TIMEOUT_MS); + while (!time_after(jiffies, limit)) { - val = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0)); - if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY) + reg = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0)); + if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY) cpu_relax(); else break; } - if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY) + if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY) goto err_timeout; } @@ -311,8 +315,10 @@ static int bcm63xx_hsspi_transfer_one(struct spi_master *master, * e. At the end restore the polarities again to their default values. */ - dummy_cs = !spi->chip_select; - bcm63xx_hsspi_set_cs(bs, dummy_cs, true); + if (bs->use_cs_workaround) { + dummy_cs = !spi->chip_select; + bcm63xx_hsspi_set_cs(bs, dummy_cs, true); + } list_for_each_entry(t, &msg->transfers, transfer_list) { status = bcm63xx_hsspi_do_txrx(spi, t); @@ -343,9 +349,11 @@ static int bcm63xx_hsspi_transfer_one(struct spi_master *master, restore_polarity = !t->cs_change; } - bcm63xx_hsspi_set_cs(bs, dummy_cs, false); - if (restore_polarity) - bcm63xx_hsspi_set_cs(bs, spi->chip_select, false); + if (bs->use_cs_workaround) { + bcm63xx_hsspi_set_cs(bs, dummy_cs, false); + if (restore_polarity) + bcm63xx_hsspi_set_cs(bs, spi->chip_select, false); + } msg->status = status; spi_finalize_current_message(master); @@ -445,8 +453,17 @@ static int bcm63xx_hsspi_probe(struct platform_device *pdev) init_completion(&bs->done); master->dev.of_node = dev->of_node; - if (!dev->of_node) + if (!dev->of_node) { master->bus_num = HSSPI_BUS_NUM; + /* use dummy cs workaround on old mips dev with no dts support */ + bs->use_cs_workaround = true; + } else { + /* check if dummy cs workaround is needed from device tree */ + bs->use_cs_workaround = of_property_read_bool( + dev->of_node, "brcm,use-cs-workaround"); + } + /* tmp hack. hard code to use cs workaround before prepend mode is added */ + bs->use_cs_workaround = true; of_property_read_u32(dev->of_node, "num-cs", &num_cs); if (num_cs > 8) {
Due to the controller limitation to keep the chip select low during the bus idle time between the transfer, a dummy cs workaround was used when this driver was first upstreamed to the kernel. The workaround picks dummy_cs as !actual_cs and usually cs is 0 and dummy cs is 1. But this does not work out in all the situations as customer reported issues for their board design. Due to SPI block design constrain, the controller works in different mode internally depending on the clock. When clock is 30MHz or above, controller works in faster mode and cs 1 signal is used internally as feedback from the pad. So cs 1 pin must be brought out and configured as chip select function in pinmux. When clock is below 30MHz, only cs 0 pin is used. In summary when clock is below 30MHz, the workaround always works as only cs 0 is involved. For clock faster than 30MHz, it require cs1 pin used in the board design and configured as chip selection function. In a typical usage of SPI flash on cs 0 that normally runs at 100MHz, cs 1 pin must be used in the board design but this is not always the case for many customer boards. For voice daughtercard usage, the SoC alway uses internal cs 7 for voice card SPI control. Then it requires cs 0 pin as the dummy cs but board with voice design may not use cs 0 pin at all. The controller actually has a prepend feature that can combine multiple SPI transfers in a SPI message into one single transfer when the transfers meet certain requirements. So there is no need for keeping cs low when the message only has one transfer. Most of the SPI devices including SPI NOR, SPI NAND flash, Broadcom voice card and etc can use this feature without the dummy cs workaround. This patch makes the dummy cs workaround as an option based on the dts flag brcm,use-cs-workaround. By default dummy cs workaround is hard coded to enable. We will use the prepend feature and disable this workaround as default in the next patch of this series unless this flag is set in dts. Signed-off-by: William Zhang <william.zhang@broadcom.com> --- drivers/spi/spi-bcm63xx-hsspi.c | 55 +++++++++++++++++++++------------ 1 file changed, 36 insertions(+), 19 deletions(-)