Message ID | 20200522000806.7381-1-Sergey.Semin@baikalelectronics.ru (mailing list archive)
---|---
Series | spi: dw: Add generic DW DMA controller support
On Fri, May 22, 2020 at 03:07:50AM +0300, Serge Semin wrote:
> Since DMA transfers are performed asynchronously with the actual SPI
> transaction, the completion of the DMA transfers doesn't mean that all
> data has actually been pushed to the SPI bus. Some data might still be
> in the controller FIFO. This is specifically true for Tx-only transfers.
> In this case, if the next SPI transfer is recharged while a tail of the
> previous one is still in the FIFO, we'll lose that tail data. In order
> to fix this, let's add a procedure that waits for the Tx/Rx SPI
> transfers to complete after the corresponding DMA transactions are
> finished.

...

> Fixes: 7063c0d942a1 ("spi/dw_spi: add DMA support")

Usually we put this before any other tags.

> Cc: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>
> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Paul Burton <paulburton@kernel.org>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Rob Herring <robh+dt@kernel.org>

Are you sure Rob needs to see this? You really need to shrink the Cc
lists of the patches you send, on a common-sense basis.

> Cc: linux-mips@vger.kernel.org
> Cc: devicetree@vger.kernel.org

Ditto.

...

> Changelog v4:
> - Get back the ndelay() method to wait for an SPI transfer completion.
>   spi_delay_exec() isn't suitable for the atomic context.

OTOH we may teach spi_delay_exec() to perform atomic sleeps.

...

> +	while (dw_spi_dma_tx_busy(dws) && retry--)
> +		ndelay(ns);

I might be mistaken, but I think I already said that this misses keeping
power management in mind.

Have you read Documentation/process/volatile-considered-harmful.rst ?

...

> +	while (dw_spi_dma_rx_busy(dws) && retry--)
> +		ndelay(ns);

Ditto.
On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 02:13:40PM +0300, Andy Shevchenko wrote:

> > > Changelog v4:
> > > - Get back the ndelay() method to wait for an SPI transfer completion.
> > >   spi_delay_exec() isn't suitable for the atomic context.

> > OTOH we may teach spi_delay_exec() to perform atomic sleeps.

> Please, see its implementation. It does an atomic delay when the delay
> value is less than 10us, but selectively gets to usleep_range() if the
> value is greater than that.

Yes, I hadn't realised this was in atomic context - spi_delay_exec() is
just not safe to use there, it'll switch to a sleeping delay if the time
is long enough.

> > > +	while (dw_spi_dma_tx_busy(dws) && retry--)
> > > +		ndelay(ns);

> > I might be mistaken, but I think I already said that this misses
> > keeping power management in mind.

> Here we are already in a nearly atomic context due to the callback being
> executed in the tasklet. What power management could there be during a
> tasklet execution? Again, we can't call sleeping methods in here. What
> do you suggest instead?

You'd typically have a cpu_relax() in there as well as the ndelay().
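A minimal sketch of the kind of loop Mark is describing here, with
cpu_relax() sitting next to the ndelay(). The helper name
dw_spi_dma_tx_busy() follows the patch under review, while
SPI_WAIT_RETRIES, current_freq and n_bytes are assumptions of this
illustration, not the final code:

	#define SPI_WAIT_RETRIES	5	/* arbitrary retry budget for this sketch */

	/* Sketch only: busy-wait for the Tx FIFO to drain after DMA completes. */
	static int dw_spi_dma_wait_tx_done(struct dw_spi *dws)
	{
		int retry = SPI_WAIT_RETRIES;
		unsigned long ns;

		/* Roughly one FIFO entry time at the current bus clock rate */
		ns = (NSEC_PER_SEC / dws->current_freq) * dws->n_bytes * BITS_PER_BYTE;

		while (dw_spi_dma_tx_busy(dws) && retry--) {
			cpu_relax();	/* keep the busy loop friendly to PM/SMT siblings */
			ndelay(ns);
		}

		if (retry < 0) {
			dev_err(&dws->master->dev, "Tx FIFO did not drain\n");
			return -EIO;
		}

		return 0;
	}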
On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 02:13:40PM +0300, Andy Shevchenko wrote:
> > On Fri, May 22, 2020 at 03:07:50AM +0300, Serge Semin wrote:
> > > Since DMA transfers are performed asynchronously with the actual SPI
> > > transaction, the completion of the DMA transfers doesn't mean that
> > > all data has actually been pushed to the SPI bus. Some data might
> > > still be in the controller FIFO. This is specifically true for
> > > Tx-only transfers. In this case, if the next SPI transfer is
> > > recharged while a tail of the previous one is still in the FIFO,
> > > we'll lose that tail data. In order to fix this, let's add a
> > > procedure that waits for the Tx/Rx SPI transfers to complete after
> > > the corresponding DMA transactions are finished.

...

> > > Changelog v4:
> > > - Get back the ndelay() method to wait for an SPI transfer completion.
> > >   spi_delay_exec() isn't suitable for the atomic context.
> >
> > OTOH we may teach spi_delay_exec() to perform atomic sleeps.
>
> Please, see its implementation. It does an atomic delay when the delay
> value is less than 10us, but selectively gets to usleep_range() if the
> value is greater than that.

Oh, then it means we may do a very long busy loop here, which is not good
at all. If we have a 10Hz clock, it might take seconds of doing nothing!

...

> > > +	while (dw_spi_dma_tx_busy(dws) && retry--)
> > > +		ndelay(ns);
> >
> > I might be mistaken, but I think I already said that this misses
> > keeping power management in mind.
>
> Here we are already in a nearly atomic context due to the callback being
> executed in the tasklet. What power management could there be during a
> tasklet execution? Again, we can't call sleeping methods in here. What
> do you suggest instead?
>
> > Have you read Documentation/process/volatile-considered-harmful.rst ?
>
> That mentoring tone is redundant. Please stop it.

I simply gave you pointers to where you may read about power management
in busy loops. Yes, I admit that the documentation title and its relation
to busy loops is not obvious.
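For reference, the behaviour Serge describes boils down to roughly the
following. This is a simplified paraphrase of the SPI core's delay
helper, not the actual drivers/spi/spi.c code; the ~10us threshold is the
one mentioned above, and the function name is made up for illustration:

	/* Simplified paraphrase: pick the delay primitive based on the length. */
	static void example_transfer_delay_ns(u32 ns)
	{
		u32 us = DIV_ROUND_UP(ns, NSEC_PER_USEC);

		if (us <= 10)
			ndelay(ns);			/* short: atomic-safe busy delay */
		else
			usleep_range(us, us + 100);	/* long: sleeps, not atomic-safe */
	}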
On Fri, May 22, 2020 at 03:12:21PM +0300, Andy Shevchenko wrote:
> On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:

> > Please, see its implementation. It does an atomic delay when the delay
> > value is less than 10us, but selectively gets to usleep_range() if the
> > value is greater than that.

> Oh, then it means we may do a very long busy loop here, which is not
> good at all. If we have a 10Hz clock, it might take seconds of doing
> nothing!

Realistically it seems unlikely that the clock will be even as slow as
double digit kHz though, and if we do I'd not be surprised to see other
problems kicking in. It's definitely good to handle such things if we
can, but so long as everything is OK for realistic use cases I'm not sure
it should be a blocker.
On Fri, May 22, 2020 at 01:18:20PM +0100, Mark Brown wrote:
> On Fri, May 22, 2020 at 03:12:21PM +0300, Andy Shevchenko wrote:
> > On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:

> > > Please, see its implementation. It does an atomic delay when the
> > > delay value is less than 10us, but selectively gets to usleep_range()
> > > if the value is greater than that.

> > Oh, then it means we may do a very long busy loop here, which is not
> > good at all. If we have a 10Hz clock, it might take seconds of doing
> > nothing!

> Realistically it seems unlikely that the clock will be even as slow as
> double digit kHz though, and if we do I'd not be surprised to see other
> problems kicking in. It's definitely good to handle such things if we
> can, but so long as everything is OK for realistic use cases I'm not
> sure it should be a blocker.

Perhaps some kind of warning? Funny that using spi_delay_exec() will
issue such a warning as a side effect of its implementation.
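One possible shape of the warning Andy is suggesting, assuming the driver
pre-computes the total expected busy-wait time in nanoseconds. The
threshold, helper name and message here are purely illustrative:

	/* Illustrative only: warn once if the FIFO-drain busy-wait gets unreasonably long. */
	#define DW_SPI_BUSY_WAIT_WARN_NS	(10 * NSEC_PER_MSEC)	/* arbitrary threshold */

	static void dw_spi_check_wait_len(struct dw_spi *dws, unsigned long total_ns)
	{
		if (total_ns > DW_SPI_BUSY_WAIT_WARN_NS)
			dev_warn_once(&dws->master->dev,
				      "busy-waiting %luns for the FIFO to drain\n",
				      total_ns);
	}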
On Fri, May 22, 2020 at 05:00:25PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 04:27:43PM +0300, Serge Semin wrote:
> > On Fri, May 22, 2020 at 02:10:13PM +0100, Mark Brown wrote:
> > > On Fri, May 22, 2020 at 03:44:06PM +0300, Serge Semin wrote:
> > > > On Fri, May 22, 2020 at 03:34:27PM +0300, Andy Shevchenko wrote:

...

> > > > > > Realistically it seems unlikely that the clock will be even as
> > > > > > slow as double digit kHz though, and if we do I'd not be
> > > > > > surprised to see other problems kicking in. It's definitely
> > > > > > good to handle such things if we can, but so long as everything
> > > > > > is OK for realistic use cases I'm not sure it should be a
> > > > > > blocker.

> > > > As I see it, the only way to fix the problem for any use-case is to
> > > > move the busy-wait loop out from the tasklet's callback, add a
> > > > completion variable to the DW SPI data and wait for all the DMA
> > > > transfers to complete in the dw_spi_dma_transfer() method. Then
> > > > execute both busy-wait loops (there we can use spi_delay_exec()
> > > > since it's a work-thread) and call spi_finalize_current_transfer()
> > > > after it. What do you think?

> > > I'm concerned that this will add latency for the common case to
> > > handle a potential issue for unrealistically slow buses but yeah, if
> > > it's an issue kicking up to task context is how you'd handle it.

> > I am not that worried about the latency (most likely it'll be the same
> > as before), but I am mostly concerned about the likely need to
> > re-implement a local version of spi_transfer_wait(). We can't afford to
> > wait for the completion indefinitely here, so
> > wait_for_completion_timeout() should be used, for which I would have to
> > calculate a decent timeout based on the transfer capabilities, etc. So
> > basically it would mean partly copying spi_transfer_wait() into this
> > module.(

> I'd also wait for Andy's suggestion regarding this, since he's been
> worried about the delay length in the first place. So he may come up
> with a better solution in this regard.

The completion approach sounds quite heavy to me. Since we haven't got
any reports of such an issue, I prefer the simplest possible approach.

If we add might_sleep(), wouldn't it basically be a reimplementation of
spi_delay_exec() again? And a second question: do you experience this
warning on your system?

My point is: let's warn and see if anybody comes up with a bug report. We
will solve the issue when it appears.
On Fri, May 22, 2020 at 05:45:42PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 05:36:39PM +0300, Andy Shevchenko wrote:

> > My point is: let's warn and see if anybody comes up with a bug report.
> > We will solve the issue when it appears.

> In my environment the stack trace happened (strictly speaking it was a
> BUG() invoked due to usleep_range() being called within the tasklet)
> when the SPI bus had been set up to work with an 8MHz clock. That's a
> quite normal bus speed, so we'll get the bug report pretty soon.)

Right, that definitely needs to be fixed then - 8MHz is indeed a totally
normal clock rate for SPI so people will hit it. I guess if there's a
noticeable performance hit from deferring to thread then we could
implement both and look at how long the delay is going to be to decide
which to use; that's annoyingly complicated though, so if the overhead is
small enough we could just not bother.
On Fri, May 22, 2020 at 04:22:41PM +0100, Mark Brown wrote:
> On Fri, May 22, 2020 at 05:45:42PM +0300, Serge Semin wrote:
> > On Fri, May 22, 2020 at 05:36:39PM +0300, Andy Shevchenko wrote:
> >
> > > My point is: let's warn and see if anybody comes up with a bug
> > > report. We will solve the issue when it appears.
> >
> > In my environment the stack trace happened (strictly speaking it was a
> > BUG() invoked due to usleep_range() being called within the tasklet)
> > when the SPI bus had been set up to work with an 8MHz clock. That's a
> > quite normal bus speed, so we'll get the bug report pretty soon.)
>
> Right, that definitely needs to be fixed then - 8MHz is indeed a totally
> normal clock rate for SPI so people will hit it. I guess if there's a
> noticeable performance hit from deferring to thread then we could
> implement both and look at how long the delay is going to be to decide
> which to use; that's annoyingly complicated though, so if the overhead
> is small enough we could just not bother.

As I suggested before, we can implement a solution without a performance
drop. Just wait for the DMA completion locally in the
dw_spi_dma_transfer() method and return 0 instead of 1 from the
transfer_one() callback. In that function we'll wait while DMA finishes
its business; after that we can check the Tx/Rx FIFO emptiness and wait
for the data to be completely transferred with delays or sleeps or
whatever.

There are several drawbacks to the solution:

1) We need to alter the dw_spi_transfer_one() method so that it returns 0
   instead of 1 (for DMA), so the generic spi_transfer_one_message()
   method would be aware that the transfer has been finished and doesn't
   need to wait by calling the spi_transfer_wait() method.

2) Locally in dw_spi_dma_transfer() I have to implement a method similar
   to spi_transfer_wait(). It won't be that similar though. We can
   predict a completion timeout better in here due to using a more exact
   SPI bus frequency. Anyway, in the rest of the aspects the functions
   will be nearly the same.

3) Not using spi_transfer_wait() means we also have to locally add the
   SPI timeout statistics increment.

So to speak, the local wait method will be like this:

+static int dw_spi_dma_wait(struct dw_spi *dws, struct spi_transfer *xfer)
+{
+	struct spi_statistics *statm = &dws->master->statistics;
+	struct spi_statistics *stats = &dws->master->cur_msg->spi->statistics;
+	unsigned long ms = 1;
+
+	ms = xfer->len * MSEC_PER_SEC * BITS_PER_BYTE;
+	ms /= xfer->effective_speed_hz;
+	ms += ms + 200;
+
+	ms = wait_for_completion_timeout(&dws->xfer_completion,
+					 msecs_to_jiffies(ms));
+
+	if (ms == 0) {
+		SPI_STATISTICS_INCREMENT_FIELD(statm, timedout);
+		SPI_STATISTICS_INCREMENT_FIELD(stats, timedout);
+		dev_err(&dws->master->cur_msg->spi->dev,
+			"SPI transfer timed out\n");
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}

NOTE: Currently the DW APB SSI driver doesn't set
xfer->effective_speed_hz, though as far as I can see that field exists
there to be initialized by the SPI controller driver, right? If so, it's
strange it isn't done in any SPI drivers...

Then we can use that method to wait for the DMA transfers completion:

+static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer)
+{
+	...
+	/* DMA channels/buffers preparation and the transfers execution */
+	...
+
+	ret = dw_spi_dma_wait(dws, xfer);
+	if (ret)
+		return ret;
+
+	ret = dw_spi_dma_wait_tx_done(dws);
+	if (ret)
+		return ret;
+
+	ret = dw_spi_dma_wait_rx_done(dws);
+	if (ret)
+		return ret;
+
+	return 0;
+}

What do you think about this?
If you don't mind, I'll send this fixup separately from the patchset we
are discussing here, since it's going to be a series of patches. What
would be better for you: implementing it based on the current DW APB SSI
driver, or on top of this patchset "spi: dw: Add generic DW DMA controller
support" (the one under review in this email thread)? Anyway, if the
fixup is getting to be that complicated, will it have to be backported to
the stable kernels?

-Sergey
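One detail the dw_spi_dma_wait() sketch above leaves implicit is who
completes dws->xfer_completion. A possible illustration, assuming the DMA
engine completion callbacks and a counter of outstanding channels; both
the counter and the field names are assumptions of this sketch, not part
of the actual fixup:

	/* Sketch only: last DMA channel to finish wakes up dw_spi_dma_wait(). */
	static void dw_spi_dma_done(void *arg)
	{
		struct dw_spi *dws = arg;

		if (atomic_dec_and_test(&dws->dma_chan_busy))
			complete(&dws->xfer_completion);
	}

	/* Called before submitting the Tx/Rx descriptors. */
	static void dw_spi_dma_prep_wait(struct dw_spi *dws, int nr_chans)
	{
		atomic_set(&dws->dma_chan_busy, nr_chans);
		reinit_completion(&dws->xfer_completion);
	}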
On Sat, May 23, 2020 at 11:34:10AM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 04:22:41PM +0100, Mark Brown wrote:

> > Right, that definitely needs to be fixed then - 8MHz is indeed a
> > totally normal clock rate for SPI so people will hit it. I guess if
> > there's a noticeable performance hit from deferring to thread then we
> > could implement both and look at how long the delay is going to be to
> > decide which to use; that's annoyingly complicated though, so if the
> > overhead is small enough we could just not bother.

> As I suggested before, we can implement a solution without a performance
> drop. Just wait for the DMA completion locally in the
> dw_spi_dma_transfer() method and return 0 instead of 1 from the
> transfer_one() callback. In that function we'll wait while DMA finishes
> its business; after that we can check the Tx/Rx FIFO emptiness and wait
> for the data to be completely transferred with delays or sleeps or
> whatever.

No extra context switches there at least, that's the main issue.

> NOTE: Currently the DW APB SSI driver doesn't set
> xfer->effective_speed_hz, though as far as I can see that field exists
> there to be initialized by the SPI controller driver, right? If so, it's
> strange it isn't done in any SPI drivers...

Yes. Not that many people are concerned about the exact timing, it turns
out; the work that was being used for never fully made it upstream.

> What do you think about this?

Sure.

> patchset "spi: dw: Add generic DW DMA controller support" (the one under
> review in this email thread)? Anyway, if the fixup is getting to be that
> complicated, will it have to be backported to the stable kernels?

No, if it's too invasive it shouldn't be (though the stable people might
decide they want it anyway these days :/ ).
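As for the xfer->effective_speed_hz field discussed above, a controller
driver can fill it in once the clock divider is known, and the timeout
calculation then uses the rate actually achieved. A hedged sketch of how
the DW driver might do that; the even-divider rounding mirrors the
driver's existing formula, but the helper itself is hypothetical:

	/* Sketch only: record the bus rate actually achieved for this transfer. */
	static void dw_spi_set_effective_speed(struct dw_spi *dws,
					       struct spi_transfer *xfer)
	{
		u32 clk_div;

		/* Divider is rounded up to an even value, so the real rate may be lower */
		clk_div = (DIV_ROUND_UP(dws->max_freq, xfer->speed_hz) + 1) & 0xfffe;

		xfer->effective_speed_hz = dws->max_freq / clk_div;
	}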
On Mon, May 25, 2020 at 12:41:32PM +0100, Mark Brown wrote:
> On Sat, May 23, 2020 at 11:34:10AM +0300, Serge Semin wrote:
> > On Fri, May 22, 2020 at 04:22:41PM +0100, Mark Brown wrote:
> >
> > > Right, that definitely needs to be fixed then - 8MHz is indeed a
> > > totally normal clock rate for SPI so people will hit it. I guess if
> > > there's a noticeable performance hit from deferring to thread then we
> > > could implement both and look at how long the delay is going to be to
> > > decide which to use; that's annoyingly complicated though, so if the
> > > overhead is small enough we could just not bother.
> >
> > As I suggested before, we can implement a solution without a
> > performance drop. Just wait for the DMA completion locally in the
> > dw_spi_dma_transfer() method and return 0 instead of 1 from the
> > transfer_one() callback. In that function we'll wait while DMA finishes
> > its business; after that we can check the Tx/Rx FIFO emptiness and wait
> > for the data to be completely transferred with delays or sleeps or
> > whatever.
>
> No extra context switches there at least, that's the main issue.

Right. There won't be an extra context switch.

> > NOTE: Currently the DW APB SSI driver doesn't set
> > xfer->effective_speed_hz, though as far as I can see that field exists
> > there to be initialized by the SPI controller driver, right? If so,
> > it's strange it isn't done in any SPI drivers...
>
> Yes. Not that many people are concerned about the exact timing, it turns
> out; the work that was being used for never fully made it upstream.
>
> > What do you think about this?
>
> Sure.

Great. I'll send a new patchset soon. It'll fix the Tx/Rx non-empty issue
in accordance with the proposed design.

-Sergey

> > patchset "spi: dw: Add generic DW DMA controller support" (the one
> > under review in this email thread)? Anyway, if the fixup is getting to
> > be that complicated, will it have to be backported to the stable
> > kernels?
>
> No, if it's too invasive it shouldn't be (though the stable people might
> decide they want it anyway these days :/ ).