Message ID | 20230616141225.2790073-1-miquel.raynal@bootlin.com (mailing list archive) |
---|---|
State | New, archived |
Series | spi: atmel: Prevent false timeouts on long transfers |
On Fri, Jun 16, 2023 at 04:12:25PM +0200, Miquel Raynal wrote:

> -#define SPI_DMA_TIMEOUT (msecs_to_jiffies(1000))
> +#define SPI_DMA_MIN_TIMEOUT (msecs_to_jiffies(1000))
> +#define SPI_DMA_TIMEOUT_PER_10K (msecs_to_jiffies(4))

Given that we know the bus speed can't we just calculate this like other
drivers do (we should probably add a helper TBH)?

Hi Mark,

broonie@kernel.org wrote on Fri, 16 Jun 2023 15:20:27 +0100:

> On Fri, Jun 16, 2023 at 04:12:25PM +0200, Miquel Raynal wrote:
>
> > -#define SPI_DMA_TIMEOUT (msecs_to_jiffies(1000))
> > +#define SPI_DMA_MIN_TIMEOUT (msecs_to_jiffies(1000))
> > +#define SPI_DMA_TIMEOUT_PER_10K (msecs_to_jiffies(4))
>
> Given that we know the bus speed can't we just calculate this like other
> drivers do (we should probably add a helper TBH)?

I agree we should probably have some kind of easy-to-use helper to
derive a decent timeout value. How do sound the heuristics
proposed here to you ? That would be:

	timeout = 1s + 4ms/10k

Thanks,
Miquèl

On Fri, Jun 16, 2023 at 06:15:35PM +0200, Miquel Raynal wrote:
> broonie@kernel.org wrote on Fri, 16 Jun 2023 15:20:27 +0100:
> > On Fri, Jun 16, 2023 at 04:12:25PM +0200, Miquel Raynal wrote:

> > > -#define SPI_DMA_TIMEOUT (msecs_to_jiffies(1000))
> > > +#define SPI_DMA_MIN_TIMEOUT (msecs_to_jiffies(1000))
> > > +#define SPI_DMA_TIMEOUT_PER_10K (msecs_to_jiffies(4))

> > Given that we know the bus speed can't we just calculate this like other
> > drivers do (we should probably add a helper TBH)?

> I agree we should probably have some kind of easy-to-use helper to
> derive a decent timeout value. How do sound the heuristics
> proposed here to you ? That would be:

> timeout = 1s + 4ms/10k

Like I say we should know the transfer speed so we can do better than
4ms/10k - we know how long it takes to clock out each byte, we can just
multiply that by the size of the transfer then add some fudge factor for
setup/teardown overhead. 1s feels pretty generous too. The sun6i
driver for example does

	max(tfr->len * 8 * 2 / (tfr->speed_hz / 1000), 100U)

and just doubles the length based timeout with a minimum of 100ms which
seems reasonable.

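For illustration, the speed-based calculation described in this reply could be sketched as a small standalone C helper. This is only a sketch under stated assumptions, not an existing kernel API: the name `xfer_timeout_ms`, the `min_ms` parameter, the 64-bit arithmetic and the fallback for a sub-kHz clock are all hypothetical additions around the sun6i expression quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an existing kernel API): estimate a transfer
 * timeout in milliseconds from the transfer length and the bus clock.
 *
 *   wire time [ms] = len * 8 bits / (speed_hz / 1000)
 *
 * As in the sun6i expression, the wire time is doubled as a fudge factor
 * for setup/teardown overhead, then clamped to a minimum.
 */
static uint64_t xfer_timeout_ms(uint64_t len, uint32_t speed_hz,
				uint32_t min_ms)
{
	uint64_t khz = speed_hz / 1000;
	uint64_t ms;

	/* Assumption: an unknown or sub-kHz clock falls back to the minimum. */
	if (khz == 0)
		return min_ms;

	/* 64-bit maths so that len * 16 does not wrap for large transfers. */
	ms = len * 8 * 2 / khz;

	return ms > min_ms ? ms : min_ms;
}
```

Under these assumptions, a 4MiB transfer at 20MHz gets 3355ms, while a 16-byte transfer is clamped to whatever minimum the caller picks.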
Hi Mark,

broonie@kernel.org wrote on Fri, 16 Jun 2023 17:43:06 +0100:

> On Fri, Jun 16, 2023 at 06:15:35PM +0200, Miquel Raynal wrote:
> > broonie@kernel.org wrote on Fri, 16 Jun 2023 15:20:27 +0100:
> > > On Fri, Jun 16, 2023 at 04:12:25PM +0200, Miquel Raynal wrote:
>
> > > > -#define SPI_DMA_TIMEOUT (msecs_to_jiffies(1000))
> > > > +#define SPI_DMA_MIN_TIMEOUT (msecs_to_jiffies(1000))
> > > > +#define SPI_DMA_TIMEOUT_PER_10K (msecs_to_jiffies(4))
>
> > > Given that we know the bus speed can't we just calculate this like other
> > > drivers do (we should probably add a helper TBH)?
>
> > I agree we should probably have some kind of easy-to-use helper to
> > derive a decent timeout value. How do sound the heuristics
> > proposed here to you ? That would be:
>
> > timeout = 1s + 4ms/10k
>
> Like I say we should know the transfer speed so we can do better than
> 4ms/10k - we know how long it takes to clock out each byte, we can just
> multiply that by the size of the transfer then add some fudge factor for
> setup/teardown overhead. 1s feels pretty generous too. The sun6i
> driver for example does
>
> 	max(tfr->len * 8 * 2 / (tfr->speed_hz / 1000), 100U)
>
> and just doubles the length based timeout with a minimum of 100ms which
> seems reasonable.

I already had issues with ~0.1s timeouts on NAND controllers, just
because the machine was heavily loaded. I believe we should avoid too
small timeouts, it does not make sense and make things worse under load.

I'll have a look.

Thanks,
Miquèl

On Fri, Jun 16, 2023 at 06:59:06PM +0200, Miquel Raynal wrote:
> broonie@kernel.org wrote on Fri, 16 Jun 2023 17:43:06 +0100:
> > On Fri, Jun 16, 2023 at 06:15:35PM +0200, Miquel Raynal wrote:
> > > broonie@kernel.org wrote on Fri, 16 Jun 2023 15:20:27 +0100:

> > Like I say we should know the transfer speed so we can do better than
> > 4ms/10k - we know how long it takes to clock out each byte, we can just
> > multiply that by the size of the transfer then add some fudge factor for
> > setup/teardown overhead. 1s feels pretty generous too. The sun6i
> > driver for example does

> > 	max(tfr->len * 8 * 2 / (tfr->speed_hz / 1000), 100U)

> > and just doubles the length based timeout with a minimum of 100ms which
> > seems reasonable.

> I already had issues with ~0.1s timeouts on NAND controllers, just
> because the machine was heavily loaded. I believe we should avoid too
> small timeouts, it does not make sense and make things worse under load.

Well, we can raise that minimum if it's causing issues - 500ms say? 1s
does feel a bit extreme for short transfers (and note that we'll use
more than 100ms for long enough transfers).

Hi Mark,

broonie@kernel.org wrote on Fri, 16 Jun 2023 18:43:51 +0100:

> On Fri, Jun 16, 2023 at 06:59:06PM +0200, Miquel Raynal wrote:
> > broonie@kernel.org wrote on Fri, 16 Jun 2023 17:43:06 +0100:
> > > On Fri, Jun 16, 2023 at 06:15:35PM +0200, Miquel Raynal wrote:
> > > > broonie@kernel.org wrote on Fri, 16 Jun 2023 15:20:27 +0100:
>
> > > Like I say we should know the transfer speed so we can do better than
> > > 4ms/10k - we know how long it takes to clock out each byte, we can just
> > > multiply that by the size of the transfer then add some fudge factor for
> > > setup/teardown overhead. 1s feels pretty generous too. The sun6i
> > > driver for example does
>
> > > 	max(tfr->len * 8 * 2 / (tfr->speed_hz / 1000), 100U)
>
> > > and just doubles the length based timeout with a minimum of 100ms which
> > > seems reasonable.
>
> > I already had issues with ~0.1s timeouts on NAND controllers, just
> > because the machine was heavily loaded. I believe we should avoid too
> > small timeouts, it does not make sense and make things worse under load.
>
> Well, we can raise that minimum if it's causing issues - 500ms say? 1s
> does feel a bit extreme for short transfers (and note that we'll use
> more than 100ms for long enough transfers).

Sounds reasonable. I believe it's worth the try.

Cheers,
Miquèl

diff --git a/drivers/spi/spi-atmel.c b/drivers/spi/spi-atmel.c
index c4f22d50dba5..00f269f955ef 100644
--- a/drivers/spi/spi-atmel.c
+++ b/drivers/spi/spi-atmel.c
@@ -233,7 +233,8 @@
  */
 #define DMA_MIN_BYTES	16
 
-#define SPI_DMA_TIMEOUT		(msecs_to_jiffies(1000))
+#define SPI_DMA_MIN_TIMEOUT	(msecs_to_jiffies(1000))
+#define SPI_DMA_TIMEOUT_PER_10K	(msecs_to_jiffies(4))
 
 #define AUTOSUSPEND_TIMEOUT	2000
 
@@ -1280,6 +1281,7 @@ static int atmel_spi_one_transfer(struct spi_master *master,
 	int			timeout;
 	int			ret;
 	unsigned long		dma_timeout;
+	long			ret_timeout;
 
 	as = spi_master_get_devdata(master);
 
@@ -1308,6 +1310,11 @@ static int atmel_spi_one_transfer(struct spi_master *master,
 	as->current_remaining_bytes = xfer->len;
 	while (as->current_remaining_bytes) {
 		reinit_completion(&as->xfer_completion);
+		/* If transfer is bigger than 10kiB, enlarge the timeout */
+		dma_timeout = SPI_DMA_MIN_TIMEOUT;
+		if (as->current_remaining_bytes > 0x2800)
+			dma_timeout += (as->current_remaining_bytes / 0x2800) *
+				       SPI_DMA_TIMEOUT_PER_10K;
 
 		if (as->use_pdc) {
 			atmel_spi_lock(as);
@@ -1333,11 +1340,12 @@ static int atmel_spi_one_transfer(struct spi_master *master,
 			atmel_spi_unlock(as);
 		}
 
-		dma_timeout = wait_for_completion_timeout(&as->xfer_completion,
-							  SPI_DMA_TIMEOUT);
-		if (WARN_ON(dma_timeout == 0)) {
-			dev_err(&spi->dev, "spi transfer timeout\n");
-			as->done_status = -EIO;
+		ret_timeout = wait_for_completion_interruptible_timeout(&as->xfer_completion,
+									dma_timeout);
+		if (ret_timeout <= 0) {
+			dev_err(&spi->dev, "spi transfer %s\n",
+				!ret_timeout ? "timeout" : "canceled");
+			as->done_status = ret_timeout < 0 ? ret_timeout : -EIO;
 		}
 
 		if (as->done_status)

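The last hunk relies on the three-way return value of wait_for_completion_interruptible_timeout(): negative when the wait was interrupted by a signal, zero on timeout, positive when the completion fired. The userspace sketch below only models that mapping. The function name `dma_wait_to_status` is hypothetical, and `ERESTARTSYS` is defined locally because it is a kernel-internal errno that userspace `<errno.h>` does not provide. Unlike the driver, which leaves `done_status` untouched on success, this simplified model returns 0 in that case.

```c
#include <errno.h>

/*
 * Kernel-internal errno (value 512 in the kernel headers), defined here
 * only so that this sketch builds as a standalone userspace file.
 */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical model of the status selection done in the patch:
 *   ret_timeout <  0  -> wait interrupted, propagate the error code
 *   ret_timeout == 0  -> timeout elapsed, report -EIO
 *   ret_timeout >  0  -> completion fired in time, no error
 */
static int dma_wait_to_status(long ret_timeout)
{
	if (ret_timeout < 0)
		return (int)ret_timeout;
	if (ret_timeout == 0)
		return -EIO;
	return 0;
}
```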
A slow SPI bus clocks at ~20MHz, which means it would transfer about
2500 bytes per millisecond (roughly 2.5MB/s) with a single data line.
Big transfers, like when dealing with flashes, can easily reach a few
MiB. The current DMA timeout is set to 1 second, which means any working
transfer of about 4MiB will always be cancelled.

With the above derivations, on a slow bus, we can assume every byte will
take at most 0.4µs. Said otherwise, we could add 4ms to the 1-second
timeout delay every 10kiB. On a 4MiB transfer, it would bring the
timeout delay up to 2.6s which still seems rather acceptable for a
timeout.

The consequence of this is that long transfers might be allowed, which
in turn requires a way to interrupt the transfer if the user wants to.
We can hence switch to the _interruptible variant of
wait_for_completion. This leads to a little bit more handling to also
cover the interrupted case but looks really acceptable overall.

While at it, we drop the useless, noisy and redundant WARN_ON() call.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/spi/spi-atmel.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
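The arithmetic behind these figures can be checked with a small standalone C sketch. At 20MHz on a single data line, one byte takes 8 / 20MHz = 0.4µs, so 10KiB takes about 4ms. The helper names and the millisecond constants below are illustrative assumptions. The driver itself works in jiffies through msecs_to_jiffies(), so the real values depend on the kernel's HZ setting.

```c
#include <stdint.h>

/* Millisecond equivalents of the constants in the patch (illustrative). */
#define MIN_TIMEOUT_MS		1000u
#define TIMEOUT_PER_10K_MS	4u
#define CHUNK_10K		0x2800u	/* 10240 bytes */

/*
 * Time needed to clock out len bytes on a single data line, in ms.
 * The caller must pass a non-zero speed_hz.
 */
static uint64_t wire_time_ms(uint64_t len, uint32_t speed_hz)
{
	return len * 8 * 1000 / speed_hz;
}

/* Heuristic from the patch: 1s, plus 4ms for every full 10KiB remaining. */
static uint64_t heuristic_timeout_ms(uint64_t remaining)
{
	uint64_t ms = MIN_TIMEOUT_MS;

	if (remaining > CHUNK_10K)
		ms += (remaining / CHUNK_10K) * TIMEOUT_PER_10K_MS;

	return ms;
}
```

For a 4MiB transfer at 20MHz this gives a wire time of 1677ms, which already exceeds the fixed 1-second timeout, and a heuristic timeout of 1000 + 409 * 4 = 2636ms, matching the 2.6s quoted in the commit message.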