Message ID | 20230106200809.330769-8-william.zhang@broadcom.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | spi: bcm63xx-hsspi: driver and doc updates | expand |
On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote: > Polling mode provides better throughput in general by avoiding the > interrupt overhead as the maximum data size one interrupt can handle is > only 512 bytes. > When interrupt is not defined in the HSSPI dts node, driver switches to > polling mode for controller SPI message processing. Also add driver > banner message when the driver is loaded successfully. This should not be something the user selects via the DT, if the polling mode is better then the driver should just use it regardless of there being an interrupt wired up. Generally there's some point at which the benefits of polling become minimal (and the costs more impactful) but if the DMA setup is as bad as it sounds then the driver should just ignore the interrupt.
On 01/06/2023 01:47 PM, Mark Brown wrote: > On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote: > >> Polling mode provides better throughput in general by avoiding the >> interrupt overhead as the maximum data size one interrupt can handle is >> only 512 bytes. > >> When interrupt is not defined in the HSSPI dts node, driver switches to >> polling mode for controller SPI message processing. Also add driver >> banner message when the driver is loaded successfully. > > This should not be something the user selects via the DT, if the polling > mode is better then the driver should just use it regardless of there > being an interrupt wired up. Generally there's some point at which the > benefits of polling become minimal (and the costs more impactful) but if > the DMA setup is as bad as it sounds then the driver should just ignore > the interrupt. > I agree but this is more for backward compatibility as the original driver uses interrupt to work with MIPS based SoC and I do not want to change that behavior. For newer devices I added, they work in polling mode by default with the dts I supplied. IMHO it is also good to have this fallback option to use interrupt in case user is sensitive to cpu usage.
On Fri, Jan 06, 2023 at 07:35:59PM -0800, William Zhang wrote: > On 01/06/2023 01:47 PM, Mark Brown wrote: > > On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote: > > > When interrupt is not defined in the HSSPI dts node, driver switches to > > > polling mode for controller SPI message processing. Also add driver > > > banner message when the driver is loaded successfully. > > This should not be something the user selects via the DT, if the polling > > mode is better then the driver should just use it regardless of there > > being an interrupt wired up. Generally there's some point at which the > > benefits of polling become minimal (and the costs more impactful) but if > > the DMA setup is as bad as it sounds then the driver should just ignore > > the interrupt. > I agree but this is more for backward compatibility as the original driver > uses interrupt to work with MIPS based SoC and I do not want to change that > behavior. For newer devices I added, they work in polling mode by default > with the dts I supplied. IMHO it is also good to have this fallback option > to use interrupt in case user is sensitive to cpu usage. You can put whatever logic is needed in the code - for something like this an architecture based define isn't ideal but is probably good enough if need be (though I'd not be surprised if it turned out that there was also some performance benefit for the MIPS systems too, at least for smaller transfers).
On 01/09/2023 11:06 AM, Mark Brown wrote: > On Fri, Jan 06, 2023 at 07:35:59PM -0800, William Zhang wrote: >> On 01/06/2023 01:47 PM, Mark Brown wrote: >>> On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote: > >>>> When interrupt is not defined in the HSSPI dts node, driver switches to >>>> polling mode for controller SPI message processing. Also add driver >>>> banner message when the driver is loaded successfully. > >>> This should not be something the user selects via the DT, if the polling >>> mode is better then the driver should just use it regardless of there >>> being an interrupt wired up. Generally there's some point at which the >>> benefits of polling become minimal (and the costs more impactful) but if >>> the DMA setup is as bad as it sounds then the driver should just ignore >>> the interrupt. > >> I agree but this is more for backward compatibility as the original driver >> uses interrupt to work with MIPS based SoC and I do not want to change that >> behavior. For newer devices I added, they work in polling mode by default >> with the dts I supplied. IMHO it is also good to have this fallback option >> to use interrupt in case user is sensitive to cpu usage. > > You can put whatever logic is needed in the code - for something like > this an architecture based define isn't ideal but is probably good > enough if need be (though I'd not be surprised if it turned out that > there was also some performance benefit for the MIPS systems too, at > least for smaller transfers). > I just don't know what other logic I can put in the driver to select interrupt or polling mode. Only the end user know if performance or cpu usage is more important to their application.
On Mon, Jan 09, 2023 at 12:10:30PM -0800, William Zhang wrote: > On 01/09/2023 11:06 AM, Mark Brown wrote: > > You can put whatever logic is needed in the code - for something like > > this an architecture based define isn't ideal but is probably good > > enough if need be (though I'd not be surprised if it turned out that > > there was also some performance benefit for the MIPS systems too, at > > least for smaller transfers). > I just don't know what other logic I can put in the driver to select > interrupt or polling mode. Only the end user know if performance or cpu > usage is more important to their application. Usually you can take a reasonable guess as to what would be a good point to start switching, typically for short enough transfers the overhead of setting up DMA, waiting for interrupts and tearing things down is very much larger than the cost of just doing PIO - a bunch of other drivers have pick a number logic of some kind, often things like FIFO sizes are a good key for where to look. A lot of the time this is good enough, and it means that users have much better facilities for making tradeoffs if they have a range of transfer sizes available - it's not an either/or thing but based on some features of the individual message/transfer. It is true that for people with heavy SPI traffic or otherwise very demanding requirements for a specific system and software stack additional tuning might produce better results, exposing some sysfs knobs to allow tuning of parameters at runtime would be helpful for them and I'd certainly be happy to see that added.
On 01/10/2023 02:49 PM, Mark Brown wrote: > On Mon, Jan 09, 2023 at 12:10:30PM -0800, William Zhang wrote: >> On 01/09/2023 11:06 AM, Mark Brown wrote: > >>> You can put whatever logic is needed in the code - for something like >>> this an architecture based define isn't ideal but is probably good >>> enough if need be (though I'd not be surprised if it turned out that >>> there was also some performance benefit for the MIPS systems too, at >>> least for smaller transfers). > >> I just don't know what other logic I can put in the driver to select >> interrupt or polling mode. Only the end user know if performance or cpu >> usage is more important to their application. > > Usually you can take a reasonable guess as to what would be a good point > to start switching, typically for short enough transfers the overhead of > setting up DMA, waiting for interrupts and tearing things down is very > much larger than the cost of just doing PIO - a bunch of other drivers > have pick a number logic of some kind, often things like FIFO sizes are > a good key for where to look. A lot of the time this is good enough, > and it means that users have much better facilities for making tradeoffs > if they have a range of transfer sizes available - it's not an either/or > thing but based on some features of the individual message/transfer. > > It is true that for people with heavy SPI traffic or otherwise very > demanding requirements for a specific system and software stack > additional tuning might produce better results, exposing some sysfs > knobs to allow tuning of parameters at runtime would be helpful for them > and I'd certainly be happy to see that added. > Thanks for the explanation. I saw the spi-uniphier.c and spi-bcm2835.c doing the trick you mentioned(thanks Kursad for pointing out). In our case, even the maximum fifo size usage(512bytes), the polling still have better performance than interrupt. The MTD test result included in this patch is based on maximum fifo usage. So there is no benefit to switch to interrupt based on transfer size. Does the spi framework has any requirement on how much time that the driver's transfer_one function can spend? I can see the polling function might take quite some time in busy loop if the clock is low, for example, at 100Hz(slowest clock this controller can go), it takes 512x8/100Hz ~= 41ms to complete. If this is something in concern, I can do the interrupt switch based on a time limit value if interrupt is available.
On Wed, Jan 11, 2023 at 12:13:43PM -0800, William Zhang wrote: > Thanks for the explanation. I saw the spi-uniphier.c and spi-bcm2835.c doing > the trick you mentioned(thanks Kursad for pointing out). In our case, even > the maximum fifo size usage(512bytes), the polling still have better > performance than interrupt. The MTD test result included in this patch is > based on maximum fifo usage. So there is no benefit to switch to interrupt > based on transfer size. > Does the spi framework has any requirement on how much time that the > driver's transfer_one function can spend? I can see the polling function > might take quite some time in busy loop if the clock is low, for example, at > 100Hz(slowest clock this controller can go), it takes 512x8/100Hz ~= 41ms to > complete. If this is something in concern, I can do the interrupt switch > based on a time limit value if interrupt is available. There's no fixed limit, some hardware just doesn't have DMA. Some doesn't even have interrupts which is even better. If there's always a clear benefit for using PIO then just do that, perhaps creating a sysfs hook to allow people to switch to DMA if they need it (rather than requiring people to update their DT, this is really a runtime performance tradeoff rather than a description of the hardware). If there's a point at which the performance is very similar then it's probably worth switching to DMA there to lower the CPU usage, but if it's always faster to use PIO then just defaulting to PIO seems like the best option.
On 01/11/2023 02:41 PM, Mark Brown wrote: > On Wed, Jan 11, 2023 at 12:13:43PM -0800, William Zhang wrote: > >> Thanks for the explanation. I saw the spi-uniphier.c and spi-bcm2835.c doing >> the trick you mentioned(thanks Kursad for pointing out). In our case, even >> the maximum fifo size usage(512bytes), the polling still have better >> performance than interrupt. The MTD test result included in this patch is >> based on maximum fifo usage. So there is no benefit to switch to interrupt >> based on transfer size. > >> Does the spi framework has any requirement on how much time that the >> driver's transfer_one function can spend? I can see the polling function >> might take quite some time in busy loop if the clock is low, for example, at >> 100Hz(slowest clock this controller can go), it takes 512x8/100Hz ~= 41ms to >> complete. If this is something in concern, I can do the interrupt switch >> based on a time limit value if interrupt is available. > > There's no fixed limit, some hardware just doesn't have DMA. Some > doesn't even have interrupts which is even better. If there's always a > clear benefit for using PIO then just do that, perhaps creating a sysfs > hook to allow people to switch to DMA if they need it (rather than > requiring people to update their DT, this is really a runtime > performance tradeoff rather than a description of the hardware). If > there's a point at which the performance is very similar then it's > probably worth switching to DMA there to lower the CPU usage, but if > it's always faster to use PIO then just defaulting to PIO seems like the > best option. > Sure. I will add the sysfs option. Interrupt node is now required in dts but by default it will still runs at polling mode until sysfs option is set to use interrupt.
diff --git a/drivers/spi/spi-bcm63xx-hsspi.c b/drivers/spi/spi-bcm63xx-hsspi.c index 63682c8ff955..2b4cdf7e7002 100644 --- a/drivers/spi/spi-bcm63xx-hsspi.c +++ b/drivers/spi/spi-bcm63xx-hsspi.c @@ -57,6 +57,7 @@ #define PINGPONG_CMD_SS_SHIFT 12 #define HSSPI_PINGPONG_STATUS_REG(x) (0x84 + (x) * 0x40) +#define HSSPI_PINGPONG_STATUS_SRC_BUSY BIT(1) #define HSSPI_PROFILE_CLK_CTRL_REG(x) (0x100 + (x) * 0x20) #define CLK_CTRL_FREQ_CTRL_MASK 0x0000ffff @@ -96,6 +97,7 @@ #define HSSPI_SPI_MAX_CS 8 #define HSSPI_BUS_NUM 1 /* 0 is legacy SPI */ +#define HSSPI_POLL_STATUS_TIMEOUT_MS 100 struct bcm63xx_hsspi { struct completion done; @@ -109,6 +111,7 @@ struct bcm63xx_hsspi { u32 speed_hz; u8 cs_polarity; + int irq; }; static void bcm63xx_hsspi_set_cs(struct bcm63xx_hsspi *bs, unsigned int cs, @@ -163,6 +166,8 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) int step_size = HSSPI_BUFFER_LEN; const u8 *tx = t->tx_buf; u8 *rx = t->rx_buf; + u32 val; + unsigned long limit; bcm63xx_hsspi_set_clk(bs, spi, t->speed_hz); bcm63xx_hsspi_set_cs(bs, spi->chip_select, true); @@ -197,8 +202,9 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) __raw_writew(cpu_to_be16(opcode | curr_step), bs->fifo); /* enable interrupt */ - __raw_writel(HSSPI_PINGx_CMD_DONE(0), - bs->regs + HSSPI_INT_MASK_REG); + if (bs->irq > 0) + __raw_writel(HSSPI_PINGx_CMD_DONE(0), + bs->regs + HSSPI_INT_MASK_REG); /* start the transfer */ __raw_writel(!chip_select << PINGPONG_CMD_SS_SHIFT | @@ -206,9 +212,21 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) PINGPONG_COMMAND_START_NOW, bs->regs + HSSPI_PINGPONG_COMMAND_REG(0)); - if (wait_for_completion_timeout(&bs->done, HZ) == 0) { - dev_err(&bs->pdev->dev, "transfer timed out!\n"); - return -ETIMEDOUT; + if (bs->irq > 0) { + if (wait_for_completion_timeout(&bs->done, HZ) == 0) + goto err_timeout; + } else { + /* polling mode checks for status busy bit */ + limit = jiffies + msecs_to_jiffies(HSSPI_POLL_STATUS_TIMEOUT_MS); + while (!time_after(jiffies, limit)) { + val = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0)); + if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY) + cpu_relax(); + else + break; + } + if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY) + goto err_timeout; } if (rx) { @@ -220,6 +238,10 @@ static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t) } return 0; + +err_timeout: + dev_err(&bs->pdev->dev, "transfer timed out!\n"); + return -ETIMEDOUT; } static int bcm63xx_hsspi_setup(struct spi_device *spi) @@ -338,8 +360,8 @@ static int bcm63xx_hsspi_probe(struct platform_device *pdev) u32 reg, rate, num_cs = HSSPI_SPI_MAX_CS; struct reset_control *reset; - irq = platform_get_irq(pdev, 0); - if (irq < 0) + irq = platform_get_irq_optional(pdev, 0); + if (irq < 0 && irq != -ENXIO) return irq; regs = devm_platform_ioremap_resource(pdev, 0); @@ -398,6 +420,7 @@ static int bcm63xx_hsspi_probe(struct platform_device *pdev) bs->regs = regs; bs->speed_hz = rate; bs->fifo = (u8 __iomem *)(bs->regs + HSSPI_FIFO_REG(0)); + bs->irq = irq; mutex_init(&bs->bus_mutex); init_completion(&bs->done); @@ -434,11 +457,13 @@ static int bcm63xx_hsspi_probe(struct platform_device *pdev) __raw_writel(reg | GLOBAL_CTRL_CLK_GATE_SSOFF, bs->regs + HSSPI_GLOBAL_CTRL_REG); - ret = devm_request_irq(dev, irq, bcm63xx_hsspi_interrupt, IRQF_SHARED, - pdev->name, bs); + if (bs->irq > 0) { + ret = devm_request_irq(dev, irq, bcm63xx_hsspi_interrupt, IRQF_SHARED, + pdev->name, bs); - if (ret) - goto out_put_master; + if (ret) + goto out_put_master; + } pm_runtime_enable(&pdev->dev); @@ -447,6 +472,8 @@ static int bcm63xx_hsspi_probe(struct platform_device *pdev) if (ret) goto out_pm_disable; + dev_info(dev, "Broadcom 63XX High Speed SPI Controller driver"); + return 0; out_pm_disable:
Polling mode provides better throughput in general by avoiding the interrupt overhead as the maximum data size one interrupt can handle is only 512 bytes. When interrupt is not defined in the HSSPI dts node, driver switches to polling mode for controller SPI message processing. Also add driver banner message when the driver is loaded successfully. When test on a Broadcom BCM47622(ARM A7 dual core) reference board with WINBOND W25N01GV SPI NAND chip at 100MHz SPI clock using the MTD speed test suite, it shows about 15% improvement on the write and 30% on the read: ** Interrupt mode ** mtd_speedtest: MTD device: 0 count: 16 mtd_speedtest: MTD device size 134217728, eraseblock size 131072, page size 2048, count of eraseblocks 1024, pages per eraseblock 64, OOB size 64 mtd_test: scanning for bad eraseblocks mtd_test: scanned 16 eraseblocks, 0 are bad mtd_speedtest: testing eraseblock write speed mtd_speedtest: eraseblock write speed is 3072 KiB/s mtd_speedtest: testing eraseblock read speed mtd_speedtest: eraseblock read speed is 6690 KiB/s mtd_speedtest: testing page write speed mtd_speedtest: page write speed is 3066 KiB/s mtd_speedtest: testing page read speed mtd_speedtest: page read speed is 6762 KiB/s mtd_speedtest: testing 2 page write speed mtd_speedtest: 2 page write speed is 3071 KiB/s mtd_speedtest: testing 2 page read speed mtd_speedtest: 2 page read speed is 6772 KiB/s ** Polling mode ** mtd_speedtest: MTD device: 0 count: 16 mtd_speedtest: MTD device size 134217728, eraseblock size 131072, page size 2048, count of eraseblocks 1024, pages per eraseblock 64, OOB size 64 mtd_test: scanning for bad eraseblocks mtd_test: scanned 16 eraseblocks, 0 are bad mtd_speedtest: testing eraseblock write speed mtd_speedtest: eraseblock write speed is 3542 KiB/s mtd_speedtest: testing eraseblock read speed mtd_speedtest: eraseblock read speed is 8825 KiB/s mtd_speedtest: testing page write speed mtd_speedtest: page write speed is 3563 KiB/s mtd_speedtest: testing page read speed mtd_speedtest: page read speed is 8787 KiB/s mtd_speedtest: testing 2 page write speed mtd_speedtest: 2 page write speed is 3572 KiB/s mtd_speedtest: testing 2 page read speed mtd_speedtest: 2 page read speed is 8806 KiB/s Signed-off-by: William Zhang <william.zhang@broadcom.com> --- drivers/spi/spi-bcm63xx-hsspi.c | 49 +++++++++++++++++++++++++-------- 1 file changed, 38 insertions(+), 11 deletions(-)