Message ID | 20200618080459.v4.5.Ib1e6855405fc9c99916ab7c7dee84d73a8bf3d68@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | spi: spi-geni-qcom: Fixes / perf improvements | expand |
Quoting Douglas Anderson (2020-06-18 08:06:26) > The variable "cur_mcmd" kept track of our current state (idle, xfer, > cs, cancel). We don't really need it, so get rid of it. Instead: > * Use separate condition variables for "chip select done", "cancel > done", and "abort done". This is important so that if a "done" > comes through (perhaps some previous interrupt finally came through) > it can't confuse the cancel/abort function. > * Use the "done" interrupt only for when a chip select or transfer is > done and we can tell the difference by looking at whether "cur_xfer" > is NULL. > > This is mostly a no-op change. However, it is possible it could fix > an issue where a super delayed interrupt for a cancel command could > have confused our waiting for an abort command. > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Quoting Douglas Anderson (2020-06-18 08:06:26) > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi, > struct geni_se *se = &mas->se; > > spin_lock_irq(&mas->lock); > - reinit_completion(&mas->xfer_done); > - mas->cur_mcmd = CMD_CANCEL; > - geni_se_cancel_m_cmd(se); > + reinit_completion(&mas->cancel_done); > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > + mas->cur_xfer = NULL; BTW, is this necessary? It's subtlely placed here without a comment why. > + mas->tx_rem_bytes = mas->rx_rem_bytes = 0; > + geni_se_cancel_m_cmd(se); > spin_unlock_irq(&mas->lock); > - time_left = wait_for_completion_timeout(&mas->xfer_done, HZ);
Hi, On Thu, Jun 18, 2020 at 11:05 AM Stephen Boyd <swboyd@chromium.org> wrote: > > Quoting Douglas Anderson (2020-06-18 08:06:26) > > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi, > > struct geni_se *se = &mas->se; > > > > spin_lock_irq(&mas->lock); > > - reinit_completion(&mas->xfer_done); > > - mas->cur_mcmd = CMD_CANCEL; > > - geni_se_cancel_m_cmd(se); > > + reinit_completion(&mas->cancel_done); > > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > > + mas->cur_xfer = NULL; > > BTW, is this necessary? It's subtlely placed here without a comment why. I believe so. Now that we don't have the "cur_mcmd" we rely on cur_xfer being NULL to tell the difference between a "done" for chip select vs. a "done" for transfer. * When we start a transfer we set "cur_xfer" to a non-NULL pointer. When the transfer finishes we set it to NULL again. * When we start a chip select transfer we _don't_ explicitly set it to NULL because it should already be NULL. * When we are aborting a transfer we need to NULL so we can handle the chip select that will come next. I suppose it's possible that we could get by without without NULLing it because I believe when the "abort" IRQ finally fires then it will include a "DONE" and that would presumably NULL it out. ...but I guess if both the cancel and abort timed out and no IRQ ever fired then nothing would have NULLed it and the next chip select would be confused. Prior to getting rid of "cur_mcmd" this all wasn't needed because "cur_xfer" was only ever looked at if "cur_mcmd" was set to "CMD_XFER". One part of my change that is technically not related to the removal of "cur_mcmd" is the part where I do "mas->tx_rem_bytes = mas->rx_rem_bytes = 0;". I can split that as a separate change if you want but it seemed fine to just clean up this extra bit of state here. -Doug
Quoting Doug Anderson (2020-06-18 13:09:47) > On Thu, Jun 18, 2020 at 11:05 AM Stephen Boyd <swboyd@chromium.org> wrote: > > > > Quoting Douglas Anderson (2020-06-18 08:06:26) > > > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi, > > > struct geni_se *se = &mas->se; > > > > > > spin_lock_irq(&mas->lock); > > > - reinit_completion(&mas->xfer_done); > > > - mas->cur_mcmd = CMD_CANCEL; > > > - geni_se_cancel_m_cmd(se); > > > + reinit_completion(&mas->cancel_done); > > > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > > > + mas->cur_xfer = NULL; > > > > BTW, is this necessary? It's subtlely placed here without a comment why. > > I believe so. Now that we don't have the "cur_mcmd" we rely on > cur_xfer being NULL to tell the difference between a "done" for chip > select vs. a "done" for transfer. > > * When we start a transfer we set "cur_xfer" to a non-NULL pointer. > When the transfer finishes we set it to NULL again. > > * When we start a chip select transfer we _don't_ explicitly set it to > NULL because it should already be NULL. > > * When we are aborting a transfer we need to NULL so we can handle the > chip select that will come next. > > I suppose it's possible that we could get by without without NULLing > it because I believe when the "abort" IRQ finally fires then it will > include a "DONE" and that would presumably NULL it out. ...but I > guess if both the cancel and abort timed out and no IRQ ever fired > then nothing would have NULLed it and the next chip select would be > confused. I was going to say that we should set it NULL when starting CS but that is not as important as clearing it out when a cancel/abort is processing so that a stale transfer isn't kept around. > > Prior to getting rid of "cur_mcmd" this all wasn't needed because > "cur_xfer" was only ever looked at if "cur_mcmd" was set to > "CMD_XFER". > > > One part of my change that is technically not related to the removal > of "cur_mcmd" is the part where I do "mas->tx_rem_bytes = > mas->rx_rem_bytes = 0;". I can split that as a separate change if you > want but it seemed fine to just clean up this extra bit of state here. > How about a comment like this? -----8<---- diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c index d8f03ffb8594..670f83793aa4 100644 --- a/drivers/spi/spi-geni-qcom.c +++ b/drivers/spi/spi-geni-qcom.c @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi, spin_lock_irq(&mas->lock); reinit_completion(&mas->cancel_done); writel(0, se->base + SE_GENI_TX_WATERMARK_REG); + /* + * Make sure we don't finalize a spi transfer that timed out but + * came in while cancelling. + */ mas->cur_xfer = NULL; mas->tx_rem_bytes = mas->rx_rem_bytes = 0; geni_se_cancel_m_cmd(se);
Hi, On Thu, Jun 18, 2020 at 2:52 PM Stephen Boyd <swboyd@chromium.org> wrote: > > Quoting Doug Anderson (2020-06-18 13:09:47) > > On Thu, Jun 18, 2020 at 11:05 AM Stephen Boyd <swboyd@chromium.org> wrote: > > > > > > Quoting Douglas Anderson (2020-06-18 08:06:26) > > > > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi, > > > > struct geni_se *se = &mas->se; > > > > > > > > spin_lock_irq(&mas->lock); > > > > - reinit_completion(&mas->xfer_done); > > > > - mas->cur_mcmd = CMD_CANCEL; > > > > - geni_se_cancel_m_cmd(se); > > > > + reinit_completion(&mas->cancel_done); > > > > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > > > > + mas->cur_xfer = NULL; > > > > > > BTW, is this necessary? It's subtlely placed here without a comment why. > > > > I believe so. Now that we don't have the "cur_mcmd" we rely on > > cur_xfer being NULL to tell the difference between a "done" for chip > > select vs. a "done" for transfer. > > > > * When we start a transfer we set "cur_xfer" to a non-NULL pointer. > > When the transfer finishes we set it to NULL again. > > > > * When we start a chip select transfer we _don't_ explicitly set it to > > NULL because it should already be NULL. > > > > * When we are aborting a transfer we need to NULL so we can handle the > > chip select that will come next. > > > > I suppose it's possible that we could get by without without NULLing > > it because I believe when the "abort" IRQ finally fires then it will > > include a "DONE" and that would presumably NULL it out. ...but I > > guess if both the cancel and abort timed out and no IRQ ever fired > > then nothing would have NULLed it and the next chip select would be > > confused. > > I was going to say that we should set it NULL when starting CS but that > is not as important as clearing it out when a cancel/abort is processing > so that a stale transfer isn't kept around. > > > > > Prior to getting rid of "cur_mcmd" this all wasn't needed because > > "cur_xfer" was only ever looked at if "cur_mcmd" was set to > > "CMD_XFER". > > > > > > One part of my change that is technically not related to the removal > > of "cur_mcmd" is the part where I do "mas->tx_rem_bytes = > > mas->rx_rem_bytes = 0;". I can split that as a separate change if you > > want but it seemed fine to just clean up this extra bit of state here. > > > > How about a comment like this? > > -----8<---- > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > index d8f03ffb8594..670f83793aa4 100644 > --- a/drivers/spi/spi-geni-qcom.c > +++ b/drivers/spi/spi-geni-qcom.c > @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi, > spin_lock_irq(&mas->lock); > reinit_completion(&mas->cancel_done); > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > + /* > + * Make sure we don't finalize a spi transfer that timed out but > + * came in while cancelling. > + */ > mas->cur_xfer = NULL; > mas->tx_rem_bytes = mas->rx_rem_bytes = 0; > geni_se_cancel_m_cmd(se); Sure. It gets the point across, though spi_finalize_current_transfer() is actually pretty harmless if you call it while cancelling. It just calls a completion. I'd rather say something like "If we're here because the SPI controller was calling handle_err() then the transfer is done and we shouldn't hold onto it anymore". -Doug
Quoting Doug Anderson (2020-06-18 15:00:10) > On Thu, Jun 18, 2020 at 2:52 PM Stephen Boyd <swboyd@chromium.org> wrote: > > > > -----8<---- > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > > index d8f03ffb8594..670f83793aa4 100644 > > --- a/drivers/spi/spi-geni-qcom.c > > +++ b/drivers/spi/spi-geni-qcom.c > > @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi, > > spin_lock_irq(&mas->lock); > > reinit_completion(&mas->cancel_done); > > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > > + /* > > + * Make sure we don't finalize a spi transfer that timed out but > > + * came in while cancelling. > > + */ > > mas->cur_xfer = NULL; > > mas->tx_rem_bytes = mas->rx_rem_bytes = 0; > > geni_se_cancel_m_cmd(se); > > Sure. It gets the point across, though > spi_finalize_current_transfer() is actually pretty harmless if you > call it while cancelling. It just calls a completion. I'd rather say > something like "If we're here because the SPI controller was calling > handle_err() then the transfer is done and we shouldn't hold onto it > anymore". > Agreed it's mostly harmless. I thought the concern was that 'cur_xfer' may reference a freed piece of memory so it's best to remove ownership of the pointer from here so that the irq handler doesn't try to finalize a transfer that may no longer exist. "Shouldn't hold onto it anymore" doesn't tell us why it shouldn't be held onto, leaving it to the reader to figure out why, which isn't good.
Hi, On Thu, Jun 18, 2020 at 4:37 PM Stephen Boyd <swboyd@chromium.org> wrote: > > Quoting Doug Anderson (2020-06-18 15:00:10) > > On Thu, Jun 18, 2020 at 2:52 PM Stephen Boyd <swboyd@chromium.org> wrote: > > > > > > -----8<---- > > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > > > index d8f03ffb8594..670f83793aa4 100644 > > > --- a/drivers/spi/spi-geni-qcom.c > > > +++ b/drivers/spi/spi-geni-qcom.c > > > @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi, > > > spin_lock_irq(&mas->lock); > > > reinit_completion(&mas->cancel_done); > > > writel(0, se->base + SE_GENI_TX_WATERMARK_REG); > > > + /* > > > + * Make sure we don't finalize a spi transfer that timed out but > > > + * came in while cancelling. > > > + */ > > > mas->cur_xfer = NULL; > > > mas->tx_rem_bytes = mas->rx_rem_bytes = 0; > > > geni_se_cancel_m_cmd(se); > > > > Sure. It gets the point across, though > > spi_finalize_current_transfer() is actually pretty harmless if you > > call it while cancelling. It just calls a completion. I'd rather say > > something like "If we're here because the SPI controller was calling > > handle_err() then the transfer is done and we shouldn't hold onto it > > anymore". > > > > Agreed it's mostly harmless. I thought the concern was that 'cur_xfer' > may reference a freed piece of memory so it's best to remove ownership > of the pointer from here so that the irq handler doesn't try to finalize > a transfer that may no longer exist. "Shouldn't hold onto it anymore" > doesn't tell us why it shouldn't be held onto, leaving it to the reader > to figure out why, which isn't good. Right. The point is that 'cur_xfer' isn't valid anymore after handle_err() finishes so we shouldn't hold the pointer. I'm OK with your wording and am happy if Mark squashes it when he applies or I can send out a new version soon. -Doug
diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c index 5b8ca8b93b06..0c534d151370 100644 --- a/drivers/spi/spi-geni-qcom.c +++ b/drivers/spi/spi-geni-qcom.c @@ -63,13 +63,6 @@ #define TIMESTAMP_AFTER BIT(3) #define POST_CMD_DELAY BIT(4) -enum spi_m_cmd_opcode { - CMD_NONE, - CMD_XFER, - CMD_CS, - CMD_CANCEL, -}; - struct spi_geni_master { struct geni_se se; struct device *dev; @@ -81,10 +74,11 @@ struct spi_geni_master { unsigned int tx_rem_bytes; unsigned int rx_rem_bytes; const struct spi_transfer *cur_xfer; - struct completion xfer_done; + struct completion cs_done; + struct completion cancel_done; + struct completion abort_done; unsigned int oversampling; spinlock_t lock; - enum spi_m_cmd_opcode cur_mcmd; int irq; }; @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi, struct geni_se *se = &mas->se; spin_lock_irq(&mas->lock); - reinit_completion(&mas->xfer_done); - mas->cur_mcmd = CMD_CANCEL; - geni_se_cancel_m_cmd(se); + reinit_completion(&mas->cancel_done); writel(0, se->base + SE_GENI_TX_WATERMARK_REG); + mas->cur_xfer = NULL; + mas->tx_rem_bytes = mas->rx_rem_bytes = 0; + geni_se_cancel_m_cmd(se); spin_unlock_irq(&mas->lock); - time_left = wait_for_completion_timeout(&mas->xfer_done, HZ); + + time_left = wait_for_completion_timeout(&mas->cancel_done, HZ); if (time_left) return; spin_lock_irq(&mas->lock); - reinit_completion(&mas->xfer_done); + reinit_completion(&mas->abort_done); geni_se_abort_m_cmd(se); spin_unlock_irq(&mas->lock); - time_left = wait_for_completion_timeout(&mas->xfer_done, HZ); + + time_left = wait_for_completion_timeout(&mas->abort_done, HZ); if (!time_left) dev_err(mas->dev, "Failed to cancel/abort m_cmd\n"); } @@ -156,15 +153,14 @@ static void spi_geni_set_cs(struct spi_device *slv, bool set_flag) set_flag = !set_flag; spin_lock_irq(&mas->lock); - reinit_completion(&mas->xfer_done); - mas->cur_mcmd = CMD_CS; + reinit_completion(&mas->cs_done); if (set_flag) geni_se_setup_m_cmd(se, SPI_CS_ASSERT, 0); else geni_se_setup_m_cmd(se, SPI_CS_DEASSERT, 0); spin_unlock_irq(&mas->lock); - time_left = wait_for_completion_timeout(&mas->xfer_done, HZ); + time_left = wait_for_completion_timeout(&mas->cs_done, HZ); if (!time_left) handle_fifo_timeout(spi, NULL); @@ -383,7 +379,6 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, mas->rx_rem_bytes = xfer->len; } writel(spi_tx_cfg, se->base + SE_SPI_TRANS_CFG); - mas->cur_mcmd = CMD_XFER; /* * Lock around right before we start the transfer since our @@ -520,11 +515,13 @@ static irqreturn_t geni_spi_isr(int irq, void *data) geni_spi_handle_tx(mas); if (m_irq & M_CMD_DONE_EN) { - if (mas->cur_mcmd == CMD_XFER) + if (mas->cur_xfer) { spi_finalize_current_transfer(spi); - else if (mas->cur_mcmd == CMD_CS) - complete(&mas->xfer_done); - mas->cur_mcmd = CMD_NONE; + mas->cur_xfer = NULL; + } else { + complete(&mas->cs_done); + } + /* * If this happens, then a CMD_DONE came before all the Tx * buffer bytes were sent out. This is unusual, log this @@ -546,10 +543,10 @@ static irqreturn_t geni_spi_isr(int irq, void *data) mas->rx_rem_bytes, mas->cur_bits_per_word); } - if ((m_irq & M_CMD_CANCEL_EN) || (m_irq & M_CMD_ABORT_EN)) { - mas->cur_mcmd = CMD_NONE; - complete(&mas->xfer_done); - } + if (m_irq & M_CMD_CANCEL_EN) + complete(&mas->cancel_done); + if (m_irq & M_CMD_ABORT_EN) + complete(&mas->abort_done); /* * It's safe or a good idea to Ack all of our our interrupts at the @@ -617,7 +614,9 @@ static int spi_geni_probe(struct platform_device *pdev) spi->handle_err = handle_fifo_timeout; spi->set_cs = spi_geni_set_cs; - init_completion(&mas->xfer_done); + init_completion(&mas->cs_done); + init_completion(&mas->cancel_done); + init_completion(&mas->abort_done); spin_lock_init(&mas->lock); pm_runtime_enable(dev);
The variable "cur_mcmd" kept track of our current state (idle, xfer, cs, cancel). We don't really need it, so get rid of it. Instead: * Use separate condition variables for "chip select done", "cancel done", and "abort done". This is important so that if a "done" comes through (perhaps some previous interrupt finally came through) it can't confuse the cancel/abort function. * Use the "done" interrupt only for when a chip select or transfer is done and we can tell the difference by looking at whether "cur_xfer" is NULL. This is mostly a no-op change. However, it is possible it could fix an issue where a super delayed interrupt for a cancel command could have confused our waiting for an abort command. Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v4: None Changes in v3: - ("spi: spi-geni-qcom: Don't keep a local state variable") new in v3. Changes in v2: None drivers/spi/spi-geni-qcom.c | 55 ++++++++++++++++++------------------- 1 file changed, 27 insertions(+), 28 deletions(-)