Message ID | 20200615040557.2011-5-mark.tomlinson@alliedtelesis.co.nz (mailing list archive) |
---|---|
State | New, archived |
Series | Improvements to spi-bcm-qspi |
On Mon, Jun 15, 2020 at 04:05:56PM +1200, Mark Tomlinson wrote:

> When needing to send/receive data in small chunks, make this interrupt
> driven rather than waiting for a completion event for each small section
> of data.

Again, was this done for a reason, and if so do we understand why doing
this from interrupt context is safe - how long can the interrupts be
when stuffing the FIFO from interrupt context?

> @@ -731,12 +733,14 @@ static inline u16 read_rxram_slot_u16(struct bcm_qspi *qspi, int slot)
>                  ((bcm_qspi_read(qspi, MSPI, msb_offset) & 0xff) << 8);
>  }
> 
> -static void read_from_hw(struct bcm_qspi *qspi, int slots)
> +static void read_from_hw(struct bcm_qspi *qspi)
>  {

Things might be clearer if this refactoring were split out into a
separate patch.

> @@ -960,24 +966,21 @@ static int bcm_qspi_transfer_one(struct spi_master *master,
>                                   struct spi_transfer *trans)
>  {
>          struct bcm_qspi *qspi = spi_master_get_devdata(master);
> -        int slots;
> -        unsigned long timeo = msecs_to_jiffies(100);
> +        unsigned long timeo = msecs_to_jiffies(1000);

That's a randomly chosen value - if we're now doing the entire transfer
then we should be trying to estimate the length of time the transfer
will take; for a very large transfer on a slow bus it's possible that
even a second won't be enough.

> -                complete(&qspi->mspi_done);
> +
> +        read_from_hw(qspi);
> +
> +        if (qspi->trans_pos.trans) {
> +                write_to_hw(qspi);
> +        } else {
> +                complete(&qspi->mspi_done);
> +                spi_finalize_current_transfer(qspi->master);
> +        }
> +

This is adding a spi_finalize_current_transfer() which we didn't have
before, while still leaving us doing cleanup work in the driver in another
thread. This is confused: the driver should only need to finalize the
transfer explicitly if it returned a timeout from transfer_one(), but
nothing has changed there.
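For reference, the length-based estimate being suggested above typically
looks something like the sketch below. This is purely illustrative and is
not code from the patch or from the driver: the helper name
bcm_qspi_xfer_timeout(), the 1 MHz fallback clock and the "2x plus 100ms"
margin are all invented for the example.

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/spi/spi.h>

/*
 * Illustrative only: scale the MSPI completion timeout with the amount
 * of data and the bus clock rather than using a fixed 100ms/1000ms.
 */
static unsigned long bcm_qspi_xfer_timeout(struct spi_transfer *trans)
{
        /* assume a 1 MHz floor if no per-transfer speed was set */
        u32 hz = trans->speed_hz ? trans->speed_hz : 1000000;
        unsigned long long bits = (unsigned long long)trans->len * 8;
        unsigned long ms = DIV_ROUND_UP_ULL(bits * 1000, hz);

        /* double the wire time and add a fixed margin for setup/IRQ latency */
        return msecs_to_jiffies(2 * ms + 100);
}

Used in place of the fixed msecs_to_jiffies(1000) in bcm_qspi_transfer_one(),
something like this would make the wait scale with trans->len, which is the
behaviour the review is asking for.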
On Mon, 2020-06-15 at 15:32 +0100, Mark Brown wrote:

> On Mon, Jun 15, 2020 at 04:05:56PM +1200, Mark Tomlinson wrote:
> 
> > When needing to send/receive data in small chunks, make this interrupt
> > driven rather than waiting for a completion event for each small section
> > of data.
> 
> Again was this done for a reason and if so do we understand why doing
> this from interrupt context is safe - how long can the interrupts be
> when stuffing the FIFO from interrupt context?

As I'm porting a Broadcom patch, I'm hoping someone else can add
something to this. From the history it appears there was a hard limit
(no small chunks), and this was changed to doing it in chunks with
patch 345309fa7c0c92, apparently to improve performance. I believe this
change further improves performance, but as the patch arrived without
any documentation, I'm not certain.

> > @@ -731,12 +733,14 @@ static inline u16 read_rxram_slot_u16(struct bcm_qspi *qspi, int slot)
> >                  ((bcm_qspi_read(qspi, MSPI, msb_offset) & 0xff) << 8);
> >  }
> > 
> > -static void read_from_hw(struct bcm_qspi *qspi, int slots)
> > +static void read_from_hw(struct bcm_qspi *qspi)
> >  {
> 
> Things might be clearer if this refactoring were split out into a
> separate patch.

Done.

> > @@ -960,24 +966,21 @@ static int bcm_qspi_transfer_one(struct spi_master *master,
> >                                   struct spi_transfer *trans)
> >  {
> >          struct bcm_qspi *qspi = spi_master_get_devdata(master);
> > -        int slots;
> > -        unsigned long timeo = msecs_to_jiffies(100);
> > +        unsigned long timeo = msecs_to_jiffies(1000);
> 
> That's a randomly chosen value - if we're now doing the entire transfer
> then we should be trying to estimate the length of time the transfer
> will take, for a very large transfer on a slow bus it's possible that
> even a second won't be enough.

Again, the value came from Broadcom. Using the data length as an
estimate sounds like a good idea.

> > -                complete(&qspi->mspi_done);
> > +
> > +        read_from_hw(qspi);
> > +
> > +        if (qspi->trans_pos.trans) {
> > +                write_to_hw(qspi);
> > +        } else {
> > +                complete(&qspi->mspi_done);
> > +                spi_finalize_current_transfer(qspi->master);
> > +        }
> > +
> 
> This is adding a spi_finalize_current_transfer() which we didn't have
> before, and still leaving us doing cleanup work in the driver in another
> thread. This is confused, the driver should only need to finalize the
> transfer explicitly if it returned a timeout from transfer_one() but
> nothing's changed there.

I can remove the call to spi_finalize_current_transfer() from this
patch. I'll try to check what does happen in the timeout case.
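As background for the finalize question: the SPI core's contract for
transfer_one() is, roughly, to return 0 when the transfer finished
synchronously, a positive value when it has been started and will complete
later, and a negative errno on failure. Only in the positive-return case
does the core wait for spi_finalize_current_transfer(), normally called
from the completion interrupt. A simplified, generic sketch of that
pattern follows (this is not spi-bcm-qspi code; the example_* names are
invented for illustration):

#include <linux/interrupt.h>
#include <linux/spi/spi.h>

struct example_priv {
        struct spi_master *master;
};

/* hypothetical hardware kick-off; stands in for filling the FIFO/CDRAM */
static void example_start_hw(struct example_priv *priv,
                             struct spi_transfer *trans)
{
}

static int example_transfer_one(struct spi_master *master,
                                struct spi_device *spi,
                                struct spi_transfer *trans)
{
        struct example_priv *priv = spi_master_get_devdata(master);

        example_start_hw(priv, trans);

        /* positive return: transfer in flight, core waits to be finalized */
        return 1;
}

static irqreturn_t example_done_irq(int irq, void *dev_id)
{
        struct example_priv *priv = dev_id;

        /* completes the wait the core started after the positive return */
        spi_finalize_current_transfer(priv->master);
        return IRQ_HANDLED;
}

Under that contract, a driver which blocks inside transfer_one(), as this
one does with wait_for_completion_timeout(), and then returns 0 or
-ETIMEDOUT should not also call spi_finalize_current_transfer() from its
interrupt handler, which is the mismatch being pointed out above.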
On Tue, Jun 16, 2020 at 03:07:17AM +0000, Mark Tomlinson wrote:

> On Mon, 2020-06-15 at 15:32 +0100, Mark Brown wrote:
> > Again was this done for a reason and if so do we understand why doing
> > this from interrupt context is safe - how long can the interrupts be
> > when stuffing the FIFO from interrupt context?

> As I'm porting a Broadcom patch, I'm hoping someone else can add
> something to this. From the history it appears there was a hard limit

If you didn't write this code then it should have at least one signoff
from the original source; I can't do anything with this without signoffs -
please see Documentation/process/submitting-patches.rst for what this is
and why it's important.

> (no small chunks), and this was changed to doing it in chunks with
> patch 345309fa7c0c92, apparently to improve performance. I believe this
> change further improves performance, but as the patch arrived without
> any documentation, I'm not certain.

Have you tested the impact on performance?
diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c
index ce30ebf27f06..0cc51bcda300 100644
--- a/drivers/spi/spi-bcm-qspi.c
+++ b/drivers/spi/spi-bcm-qspi.c
@@ -200,12 +200,14 @@ struct bcm_qspi_dev_id {
 struct qspi_trans {
         struct spi_transfer *trans;
         int byte;
+        int slots;
         bool mspi_last_trans;
 };
 
 struct bcm_qspi {
         struct platform_device *pdev;
         struct spi_master *master;
+        struct spi_device *spi_dev;
         struct clk *clk;
         u32 base_clk;
         u32 max_speed_hz;
@@ -731,12 +733,14 @@ static inline u16 read_rxram_slot_u16(struct bcm_qspi *qspi, int slot)
                 ((bcm_qspi_read(qspi, MSPI, msb_offset) & 0xff) << 8);
 }
 
-static void read_from_hw(struct bcm_qspi *qspi, int slots)
+static void read_from_hw(struct bcm_qspi *qspi)
 {
         struct qspi_trans tp;
-        int slot;
+        int slot, slots;
 
         bcm_qspi_disable_bspi(qspi);
+        tp = qspi->trans_pos;
+        slots = tp.slots;
 
         if (slots > MSPI_NUM_CDRAM) {
                 /* should never happen */
@@ -744,8 +748,6 @@ static void read_from_hw(struct bcm_qspi *qspi, int slots)
                 return;
         }
 
-        tp = qspi->trans_pos;
-
         for (slot = 0; slot < slots; slot++) {
                 if (tp.trans->rx_buf) {
                         if (tp.trans->bits_per_word <= 8) {
@@ -803,11 +805,12 @@ static inline void write_cdram_slot(struct bcm_qspi *qspi, int slot, u32 val)
 }
 
 /* Return number of slots written */
-static int write_to_hw(struct bcm_qspi *qspi, struct spi_device *spi)
+static int write_to_hw(struct bcm_qspi *qspi)
 {
         struct qspi_trans tp;
         int slot = 0, tstatus = 0;
         u32 mspi_cdram = 0;
+        struct spi_device *spi = qspi->spi_dev;
 
         bcm_qspi_disable_bspi(qspi);
         tp = qspi->trans_pos;
@@ -846,6 +849,9 @@ static int write_to_hw(struct bcm_qspi *qspi, struct spi_device *spi)
                 slot++;
         }
+        /* save slot number for read_from_hw() */
+        qspi->trans_pos.slots = slot;
+
         if (!slot) {
                 dev_err(&qspi->pdev->dev, "%s: no data to send?", __func__);
                 goto done;
         }
@@ -960,24 +966,21 @@ static int bcm_qspi_transfer_one(struct spi_master *master,
                                  struct spi_transfer *trans)
 {
         struct bcm_qspi *qspi = spi_master_get_devdata(master);
-        int slots;
-        unsigned long timeo = msecs_to_jiffies(100);
+        unsigned long timeo = msecs_to_jiffies(1000);
 
         if (!spi->cs_gpiod)
                 bcm_qspi_chip_select(qspi, spi->chip_select);
         qspi->trans_pos.trans = trans;
         qspi->trans_pos.byte = 0;
+        qspi->spi_dev = spi;
 
-        while (qspi->trans_pos.byte < trans->len) {
-                reinit_completion(&qspi->mspi_done);
+        reinit_completion(&qspi->mspi_done);
 
-                slots = write_to_hw(qspi, spi);
-                if (!wait_for_completion_timeout(&qspi->mspi_done, timeo)) {
-                        dev_err(&qspi->pdev->dev, "timeout waiting for MSPI\n");
-                        return -ETIMEDOUT;
-                }
+        write_to_hw(qspi);
 
-                read_from_hw(qspi, slots);
+        if (!wait_for_completion_timeout(&qspi->mspi_done, timeo)) {
+                dev_err(&qspi->pdev->dev, "timeout waiting for MSPI\n");
+                return -ETIMEDOUT;
         }
 
         bcm_qspi_enable_bspi(qspi);
@@ -1092,7 +1095,16 @@ static irqreturn_t bcm_qspi_mspi_l2_isr(int irq, void *dev_id)
                 bcm_qspi_write(qspi, MSPI, MSPI_MSPI_STATUS, status);
                 if (qspi->soc_intc)
                         soc_intc->bcm_qspi_int_ack(soc_intc, MSPI_DONE);
-                complete(&qspi->mspi_done);
+
+                read_from_hw(qspi);
+
+                if (qspi->trans_pos.trans) {
+                        write_to_hw(qspi);
+                } else {
+                        complete(&qspi->mspi_done);
+                        spi_finalize_current_transfer(qspi->master);
+                }
+
                 return IRQ_HANDLED;
         }
 