[v4,00/16] spi: dw: Add generic DW DMA controller support

Message ID 20200522000806.7381-1-Sergey.Semin@baikalelectronics.ru

Message

Serge Semin May 22, 2020, 12:07 a.m. UTC
Baikal-T1 SoC provides a DW DMA controller to perform Mem-to-Dev and Dev-to-Mem
transactions for its low-speed peripherals. This is also applicable to the DW
APB SSI devices embedded into the SoC. Currently DMA-based transfers are
supported by the DW APB SPI driver only as a middle-layer code for Intel
MID/Elkhart PCI devices. Since the same code can be used for a normal platform
DMAC device, we introduce within this series a set of patches to make that
possible.

First of all we need to add the Tx and Rx DMA channels support into the DW
APB SSI binding. Then there are several fixes and cleanups provided as an
initial preparation for the Generic DMA support integration: add Tx/Rx
finish wait methods, clear the DMAC register when done or stopped, fix the
native CS being unset, enable interrupts in accordance with the DMA transfer
mode, discard the static DW DMA slave structures, discard the unused void priv
pointer and the dma_width member of the dw_spi structure, provide the DMA
Tx/Rx burst length parametrisation and make sure it's optionally set in
accordance with the DMA max-burst capability.
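
For illustration, the max-burst capping boils down to something like the sketch
below; the dws->rxburst field, the default value and the fallback are
assumptions made for this example rather than the exact patch code:

/* Illustrative sketch: cap a default Rx burst by the DMA channel capability. */
static void dw_spi_dma_maxburst_init(struct dw_spi *dws)
{
	struct dma_slave_caps caps;
	u32 max_burst, def_burst = 16;	/* assumed default */

	if (!dma_get_slave_caps(dws->rxchan, &caps) && caps.max_burst)
		max_burst = caps.max_burst;
	else
		max_burst = 256;	/* assumed fallback */

	dws->rxburst = min(def_burst, max_burst);	/* assumed field */
}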

In order to have the DW APB SSI MMIO driver working with DMA we need to
initialize the paddr field with the physical base address of the DW APB SSI
registers space. Then we unpin the Intel MID specific code from the generic
DMA one and place it into the spi-dw-pci.c driver, which is a better place for
it anyway. After that the naming cleanups are performed since the code is going
to be used for a generic DMAC device. Finally the Generic DMA initialization
can be added to the generic version of the DW APB SSI driver.

Last but not least we traditionally convert the legacy plain-text dt-binding
file to a YAML-based one and, as a cherry on the cake, replace the manually
written DebugFS registers read method with the ready-to-use regset32 DebugFS
interface serving the same purpose.
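
For reference, the regset32 DebugFS interface usage is roughly the following;
the register names and the regset/debugfs fields of the dw_spi structure are
illustrative assumptions here, not the exact patch code:

/* Illustrative sketch of debugfs_create_regset32() from <linux/debugfs.h>. */
static const struct debugfs_reg32 dw_spi_dbgfs_regs[] = {
	{ .name = "CTRL0", .offset = DW_SPI_CTRL0 },
	{ .name = "SR",    .offset = DW_SPI_SR },
	{ .name = "DMACR", .offset = DW_SPI_DMACR },
};

static void dw_spi_debugfs_init(struct dw_spi *dws)
{
	dws->regset.regs = dw_spi_dbgfs_regs;	/* regset field assumed */
	dws->regset.nregs = ARRAY_SIZE(dw_spi_dbgfs_regs);
	dws->regset.base = dws->regs;

	debugfs_create_regset32("registers", 0400, dws->debugfs, &dws->regset);
}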

This patchset is rebased and tested on the spi/for-next (5.7-rc5):
base-commit: fe9fce6b2cf3 ("Merge remote-tracking branch 'spi/for-5.8' into spi-next")

Link: https://lore.kernel.org/linux-spi/20200508132943.9826-1-Sergey.Semin@baikalelectronics.ru/
Changelog v2:
- Rebase on top of the spi repository for-next branch.
- Move bindings conversion patch to the tail of the series.
- Move fixes to the head of the series.
- Apply as many changes as possible before the Generic DMA functionality
  support is added and the spi-dw-mid is moved to the spi-dw-dma driver.
- Discard patch "spi: dw: Fix dma_slave_config used partly uninitialized"
  since the problem has already been fixed.
- Add new patch "spi: dw: Discard unused void priv pointer".
- Add new patch "spi: dw: Discard dma_width member of the dw_spi structure".
  n_bytes member of the DW SPI data can be used instead.
- Build the DMA functionality into the DW APB SSI core if required instead
  of creating a separate kernel module.
- Use conditional statement instead of the ternary operator in the ref
  clock getter.

Link: https://lore.kernel.org/linux-spi/20200515104758.6934-1-Sergey.Semin@baikalelectronics.ru/
Changelog v3:
- Use spi_delay_exec() method to wait for the DMA operation completion.
- Explicitly initialize the dw_dma_slave members on stack.
- Discard the dws->fifo_len utilization in the Tx FIFO DMA threshold
  setting from the patch where we just add the default burst length
  constants.
- Use min() method to calculate the optimal burst values.
- Add new patch which moves the spi-dw.c source file to spi-dw-core.c in
  order to preserve the DW APB SSI core driver name.
- Add commas in the debugfs_reg32 structure initializer and after the last
  entry of the dw_spi_dbgfs_regs array.

Link: https://lore.kernel.org/linux-spi/20200521012206.14472-1-Sergey.Semin@baikalelectronics.ru
Changelog v4:
- Get back ndelay() method to wait for an SPI transfer completion.
  spi_delay_exec() isn't suitable for the atomic context.

Co-developed-by: Georgy Vlasov <Georgy.Vlasov@baikalelectronics.ru>
Signed-off-by: Georgy Vlasov <Georgy.Vlasov@baikalelectronics.ru>
Co-developed-by: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>
Signed-off-by: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Maxim Kaurkin <Maxim.Kaurkin@baikalelectronics.ru>
Cc: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru>
Cc: Ekaterina Skachko <Ekaterina.Skachko@baikalelectronics.ru>
Cc: Vadim Vlasov <V.Vlasov@baikalelectronics.ru>
Cc: Alexey Kolotnikov <Alexey.Kolotnikov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-spi@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Serge Semin (16):
  spi: dw: Add Tx/Rx finish wait methods to the MID DMA
  spi: dw: Enable interrupts in accordance with DMA xfer mode
  spi: dw: Discard static DW DMA slave structures
  spi: dw: Discard unused void priv pointer
  spi: dw: Discard dma_width member of the dw_spi structure
  spi: dw: Parameterize the DMA Rx/Tx burst length
  spi: dw: Use DMA max burst to set the request thresholds
  spi: dw: Fix Rx-only DMA transfers
  spi: dw: Add core suffix to the DW APB SSI core source file
  spi: dw: Move Non-DMA code to the DW PCIe-SPI driver
  spi: dw: Remove DW DMA code dependency from DW_DMAC_PCI
  spi: dw: Add DW SPI DMA/PCI/MMIO dependency on the DW SPI core
  spi: dw: Cleanup generic DW DMA code namings
  spi: dw: Add DMA support to the DW SPI MMIO driver
  spi: dw: Use regset32 DebugFS method to create regdump file
  dt-bindings: spi: Convert DW SPI binding to DT schema

 .../bindings/spi/snps,dw-apb-ssi.txt          |  44 ---
 .../bindings/spi/snps,dw-apb-ssi.yaml         | 127 +++++++++
 .../devicetree/bindings/spi/spi-dw.txt        |  24 --
 drivers/spi/Kconfig                           |  15 +-
 drivers/spi/Makefile                          |   5 +-
 drivers/spi/{spi-dw.c => spi-dw-core.c}       |  88 ++----
 drivers/spi/{spi-dw-mid.c => spi-dw-dma.c}    | 261 ++++++++++--------
 drivers/spi/spi-dw-mmio.c                     |   4 +
 drivers/spi/spi-dw-pci.c                      |  50 +++-
 drivers/spi/spi-dw.h                          |  33 ++-
 10 files changed, 392 insertions(+), 259 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt
 create mode 100644 Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml
 delete mode 100644 Documentation/devicetree/bindings/spi/spi-dw.txt
 rename drivers/spi/{spi-dw.c => spi-dw-core.c} (82%)
 rename drivers/spi/{spi-dw-mid.c => spi-dw-dma.c} (55%)

Comments

Andy Shevchenko May 22, 2020, 11:13 a.m. UTC | #1
On Fri, May 22, 2020 at 03:07:50AM +0300, Serge Semin wrote:
> Since DMA transfers are performed asynchronously with actual SPI
> transaction, then even if DMA transfers are finished it doesn't mean
> all data is actually pushed to the SPI bus. Some data might still be
> in the controller FIFO. This is specifically true for Tx-only
> transfers. In this case if the next SPI transfer is recharged while
> a tail of the previous one is still in FIFO, we'll lose that tail
> data. In order to fix this let's add the wait procedure of the Tx/Rx
> SPI transfers completion after the corresponding DMA transactions
> are finished.

...

> Fixes: 7063c0d942a1 ("spi/dw_spi: add DMA support")

Usually we put this before any other tags.

> Cc: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>
> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Paul Burton <paulburton@kernel.org>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Cc: Rob Herring <robh+dt@kernel.org>

Are you sure Rob needs to see this?
You really need to shrink the Cc lists of the patches you send, on a common-sense basis.

> Cc: linux-mips@vger.kernel.org

> Cc: devicetree@vger.kernel.org

Ditto.

...

> Changelog v4:
> - Get back ndelay() method to wait for an SPI transfer completion.
>   spi_delay_exec() isn't suitable for the atomic context.

OTOH we may teach spi_delay_exec() to perform atomic sleeps.

...

> +	while (dw_spi_dma_tx_busy(dws) && retry--)
> +		ndelay(ns);

I might be mistaken, but I think I already said that this one fails to keep
power management in mind.

Have you read Documentation/process/volatile-considered-harmful.rst?

...

> +	while (dw_spi_dma_rx_busy(dws) && retry--)
> +		ndelay(ns);

Ditto.
Mark Brown May 22, 2020, 12:10 p.m. UTC | #2
On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 02:13:40PM +0300, Andy Shevchenko wrote:

> > > Changelog v4:
> > > - Get back ndelay() method to wait for an SPI transfer completion.
> > >   spi_delay_exec() isn't suitable for the atomic context.

> > OTOH we may teach spi_delay_exec() to perform atomic sleeps.

> Please, see its implementation. It does an atomic delay when the delay value
> is less than 10us, but falls back to usleep_range() if the value is
> greater than that.

Yes, I hadn't realised this was in atomic context - _delay_exec() is
just not safe to use there, it'll switch to a sleeping delay if the time
is long enough.

> > > +	while (dw_spi_dma_tx_busy(dws) && retry--)
> > > +		ndelay(ns);

> > I might be mistaken, but I think I already said that this one fails to keep
> > power management in mind.

> Here we are already in a nearly atomic context due to the callback being
> executed in the tasklet. What power management could there be during a
> tasklet execution? Again, we can't call sleeping methods in here. What do
> you suggest instead?

You'd typically have a cpu_relax() in there as well as the ndelay().
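
For illustration, such a bounded busy-wait with cpu_relax() might look like the
following sketch; the helper name, retry count and per-entry delay derivation
are assumptions for this example, not the actual patch code:

/*
 * Purely illustrative sketch of a bounded busy-wait on the Tx FIFO with
 * cpu_relax() in the loop body.
 */
static int dw_spi_dma_wait_tx_done(struct dw_spi *dws)
{
	int retry = SPI_WAIT_RETRIES;	/* assumed, e.g. 5 */
	unsigned long ns;

	/* Rough time of one FIFO entry on the bus: n_bytes * 8 bit-times */
	ns = (NSEC_PER_SEC / dws->current_freq) * dws->n_bytes * BITS_PER_BYTE;

	while (dw_spi_dma_tx_busy(dws) && retry--) {
		cpu_relax();
		ndelay(ns);
	}

	if (retry < 0) {
		dev_err(&dws->master->dev, "Tx hanged up\n");
		return -EIO;
	}

	return 0;
}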
Andy Shevchenko May 22, 2020, 12:12 p.m. UTC | #3
On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 02:13:40PM +0300, Andy Shevchenko wrote:
> > On Fri, May 22, 2020 at 03:07:50AM +0300, Serge Semin wrote:
> > > Since DMA transfers are performed asynchronously with actual SPI
> > > transaction, then even if DMA transfers are finished it doesn't mean
> > > all data is actually pushed to the SPI bus. Some data might still be
> > > in the controller FIFO. This is specifically true for Tx-only
> > > transfers. In this case if the next SPI transfer is recharged while
> > > a tail of the previous one is still in FIFO, we'll lose that tail
> > > data. In order to fix this let's add the wait procedure of the Tx/Rx
> > > SPI transfers completion after the corresponding DMA transactions
> > > are finished.

...

> > > Changelog v4:
> > > - Get back ndelay() method to wait for an SPI transfer completion.
> > >   spi_delay_exec() isn't suitable for the atomic context.
> > 
> > OTOH we may teach spi_delay_exec() to perform atomic sleeps.
> 
> Please, see its implementation. It does an atomic delay when the delay value
> is less than 10us, but falls back to usleep_range() if the value is
> greater than that.

Oh, then it means we may do a very long busy loop here, which is not good at
all. If we have a 10Hz clock, it might take seconds of doing nothing!

...

> > > +	while (dw_spi_dma_tx_busy(dws) && retry--)
> > > +		ndelay(ns);
> > 
> > I might be mistaken, but I think I already said that this one fails to keep
> > power management in mind.
> 
> Here we are already in a nearly atomic context due to the callback being
> executed in the tasklet. What power management could there be during a
> tasklet execution? Again, we can't call sleeping methods in here. What do
> you suggest instead?
> 
> > Have you read Documentation/process/volatile-considered-harmful.rst?
> 
> That mentoring tone is redundant. Please, stop it.

I simply gave you pointers to where you may read about power management in busy
loops. Yes, I admit that the documentation title and its relation to busy loops
is not obvious.
Mark Brown May 22, 2020, 12:18 p.m. UTC | #4
On Fri, May 22, 2020 at 03:12:21PM +0300, Andy Shevchenko wrote:
> On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:

> > Please, see its implementation. It does an atomic delay when the delay value
> > is less than 10us, but falls back to usleep_range() if the value is
> > greater than that.

> Oh, then it means we may do a very long busy loop here, which is not good at
> all. If we have a 10Hz clock, it might take seconds of doing nothing!

Realistically it seems unlikely that the clock will be even as slow as
double digit kHz though, and if we do I'd not be surprised to see other
problems kicking in.  It's definitely good to handle such things if we
can but so long as everything is OK for realistic use cases I'm not sure
it should be a blocker.
Andy Shevchenko May 22, 2020, 12:34 p.m. UTC | #5
On Fri, May 22, 2020 at 01:18:20PM +0100, Mark Brown wrote:
> On Fri, May 22, 2020 at 03:12:21PM +0300, Andy Shevchenko wrote:
> > On Fri, May 22, 2020 at 02:52:35PM +0300, Serge Semin wrote:
> 
> > > Please, see its implementation. It does an atomic delay when the delay value
> > > is less than 10us, but falls back to usleep_range() if the value is
> > > greater than that.
> 
> > Oh, then it means we may do a very long busy loop here, which is not good at
> > all. If we have a 10Hz clock, it might take seconds of doing nothing!
> 
> Realistically it seems unlikely that the clock will be even as slow as
> double digit kHz though, and if we do I'd not be surprised to see other
> problems kicking in.  It's definitely good to handle such things if we
> can but so long as everything is OK for realistic use cases I'm not sure
> it should be a blocker.

Perhaps some kind of warning? Funny that using spi_delay_exec() will issue such
a warning as a side effect of its implementation.
Andy Shevchenko May 22, 2020, 2:36 p.m. UTC | #6
On Fri, May 22, 2020 at 05:00:25PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 04:27:43PM +0300, Serge Semin wrote:
> > On Fri, May 22, 2020 at 02:10:13PM +0100, Mark Brown wrote:
> > > On Fri, May 22, 2020 at 03:44:06PM +0300, Serge Semin wrote:
> > > > On Fri, May 22, 2020 at 03:34:27PM +0300, Andy Shevchenko wrote:

...

> > > > > > Realistically it seems unlikely that the clock will be even as slow as
> > > > > > double digit kHz though, and if we do I'd not be surprised to see other
> > > > > > problems kicking in.  It's definitely good to handle such things if we
> > > > > > can but so long as everything is OK for realistic use cases I'm not sure
> > > > > > it should be a blocker.
> > > 
> > > > As I see it the only way to fix the problem for any use-case is to move the
> > > > busy-wait loop out from the tasklet's callback, add a completion variable to the
> > > > DW SPI data and wait for all the DMA transfers completion in the
> > > > dw_spi_dma_transfer() method. Then execute both busy-wait loops (there we can
> > > > use spi_delay_exec() since it's a work-thread) and call
> > > > spi_finalize_current_transfer() after it. What do you think?
> > > 
> > > I'm concerned that this will add latency for the common case to handle a
> > > potential issue for unrealistically slow buses but yeah, if it's an
> > > issue kicking up to task context is how you'd handle it.
> > 
> I am not that worried about the latency (most likely it'll be the same as
> before), but I am mostly concerned about the likely need to re-implement
> a local version of spi_transfer_wait(). We can't afford to wait for the
> completion indefinitely here, so wait_for_completion_timeout() should be
> used, for which I would have to calculate a decent timeout based on the
> transfer capabilities, etc. So basically it would mean partly copying
> spi_transfer_wait() into this module.(
> 
> I'd also wait for Andy's suggestion regarding this, since he's been worried
> about the delay length in the first place. So he may come up with a better
> solution in this regard.

The completion approach sounds quite heavy to me.

Since we haven't got any report of such an issue, I prefer the simplest
possible approach.

If we add might_sleep(), wouldn't it basically be a reimplementation of
spi_delay_exec() again?

And a second question: do you experience this warning on your system?

My point is: let's warn and see if anybody comes with a bug report. We will
solve an issue when it appears.
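
For illustration, the "just warn" approach could be as small as the sketch
below; the helper name and the 10us threshold (mirroring the spi_delay_exec()
atomic/sleeping boundary mentioned above) are assumptions, not a proposal from
the thread:

/*
 * Illustrative sketch only: flag suspiciously long atomic busy-waits once
 * instead of reworking the wait path.
 */
static void dw_spi_dma_warn_wait(struct dw_spi *dws, unsigned long ns)
{
	if (ns > 10 * NSEC_PER_USEC)
		dev_warn_once(&dws->master->dev,
			      "atomic busy-wait of %lu ns per FIFO entry\n", ns);
}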
Mark Brown May 22, 2020, 3:22 p.m. UTC | #7
On Fri, May 22, 2020 at 05:45:42PM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 05:36:39PM +0300, Andy Shevchenko wrote:

> > My point is: let's warn and see if anybody comes with a bug report. We will
> > solve an issue when it appears.

> In my environment the stack trace happened (strictly speaking it has been a
> BUG() invoked due to the sleep_range() called within the tasklet) when SPI bus
> had been enabled to work with !8MHz! clock. It's quite normal bus speed.
> So we'll get the bug report pretty soon.)

Right, that definitely needs to be fixed then - 8MHz is indeed a totally
normal clock rate for SPI so people will hit it.  I guess if there's a
noticeable performance hit to defer to thread then we could implement
both and look at how long the delay is going to be to decide which to
use, that's annoyingly complicated though so if the overhead is small
enough we could just not bother.
Serge Semin May 23, 2020, 8:34 a.m. UTC | #8
On Fri, May 22, 2020 at 04:22:41PM +0100, Mark Brown wrote:
> On Fri, May 22, 2020 at 05:45:42PM +0300, Serge Semin wrote:
> > On Fri, May 22, 2020 at 05:36:39PM +0300, Andy Shevchenko wrote:
> 
> > > My point is: let's warn and see if anybody comes with a bug report. We will
> > > solve an issue when it appears.
> 
> > In my environment the stack trace happened (strictly speaking it has been a
> > BUG() invoked due to the sleep_range() called within the tasklet) when SPI bus
> > had been enabled to work with !8MHz! clock. It's quite normal bus speed.
> > So we'll get the bug report pretty soon.)
> 
> Right, that definitely needs to be fixed then - 8MHz is indeed a totally
> normal clock rate for SPI so people will hit it.  I guess if there's a
> > noticeable performance hit to defer to thread then we could implement
> both and look at how long the delay is going to be to decide which to
> use, that's annoyingly complicated though so if the overhead is small
> enough we could just not bother.

As I suggested before we can implement a solution without performance drop.
Just wait for the DMA completion locally in the dw_spi_dma_transfer() method and
return 0 instead of 1 from the transfer_one() callback. In that function we'll
wait while DMA finishes its business, after that we can check the Tx/Rx FIFO
emptiness and wait for the data to be completely transferred with delays or
sleeps or whatever.

There are several drawbacks to the solution:
1) We need to alter the dw_spi_transfer_one() method so that it returns
0 instead of 1 (for DMA), making the generic spi_transfer_one_message() method
aware that the transfer has been finished and that it doesn't need to wait by
calling the spi_transfer_wait() method.
2) Locally in dw_spi_dma_transfer() I have to implement a method similar
to spi_transfer_wait(). It won't be that similar though. We can predict the
completion timeout better here since we know a more exact SPI bus frequency.
Anyway, in all other aspects the functions will be nearly the same.
3) Not using spi_transfer_wait() means we also have to locally increment the
SPI timeout statistics.

Roughly speaking, the local wait method will look like this:

+static int dw_spi_dma_wait(struct dw_spi *dws, struct spi_transfer *xfer)
+{
+ 	struct spi_statistics *statm = &dws->master->statistics;
+	struct spi_statistics *stats = &dws->master->cur_msg->spi->statistics;
+	unsigned long ms = 1;
+
+	ms = xfer->len * MSEC_PER_SEC * BITS_PER_BYTE;
+	ms /= xfer->effective_speed_hz;
+	ms += ms + 200;
+
+	ms = wait_for_completion_timeout(&dws->xfer_completion,
+					msecs_to_jiffies(ms));
+
+	if (ms == 0) {
+		SPI_STATISTICS_INCREMENT_FIELD(statm, timedout);
+		SPI_STATISTICS_INCREMENT_FIELD(stats, timedout);
+		dev_err(&dws->master->cur_msg->spi->dev,
+			"SPI transfer timed out\n");
+			return -ETIMEDOUT;
+	}
+
+	return 0;
+}

NOTE Currently the DW APB SSI driver doesn't set xfer->effective_speed_hz, though as
far as I can see that field exists there to be initialized by the SPI controller
driver, right? If so, strange it isn't done in any SPI drivers...
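
For illustration, initializing that field in the driver could look roughly like
the sketch below; it assumes the input clock rate lives in dws->max_freq and
the programmed divider is passed in as clk_div, which may not match the actual
driver layout:

/* Hypothetical sketch: report the actually programmed bus rate to the core. */
static void dw_spi_set_effective_speed(struct dw_spi *dws,
				       struct spi_transfer *xfer, u32 clk_div)
{
	if (clk_div)
		xfer->effective_speed_hz = dws->max_freq / clk_div;
}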

Then we can use that method to wait for the DMA transfers completion:

+static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer)
+{
+	...
+	/* DMA channels/buffers preparation and the transfers execution */
+	...
+
+	ret = dw_spi_dma_wait(dws, xfer);
+	if (ret)
+		return ret;
+
+	ret = dw_spi_dma_wait_tx_done(dws);
+	if (ret)
+		return ret;
+
+	ret = dw_spi_dma_wait_rx_done(dws);
+	if (ret)
+		return ret;
+
+	return 0;
+}

What do you think about this?

If you don't mind I'll send this fixup separately from the patchset we're
discussing here, since it's going to be a series of patches. What would be
better for you: implement it based on the current DW APB SSI driver, or on top
of this patchset "spi: dw: Add generic DW DMA controller support" (it's under
review in this email thread)? Anyway, if the fixup is getting to be that
complicated, will it have to be backported to other stable kernels?

-Sergey
Mark Brown May 25, 2020, 11:41 a.m. UTC | #9
On Sat, May 23, 2020 at 11:34:10AM +0300, Serge Semin wrote:
> On Fri, May 22, 2020 at 04:22:41PM +0100, Mark Brown wrote:

> > Right, that definitely needs to be fixed then - 8MHz is indeed a totally
> > normal clock rate for SPI so people will hit it.  I guess if there's a
> > noticeable performance hit to defer to thread then we could implement
> > both and look at how long the delay is going to be to decide which to
> > use, that's annoyingly complicated though so if the overhead is small
> > enough we could just not bother.

> As I suggested before we can implement a solution without performance drop.
> Just wait for the DMA completion locally in the dw_spi_dma_transfer() method and
> return 0 instead of 1 from the transfer_one() callback. In that function we'll
> wait while DMA finishes its business, after that we can check the Tx/Rx FIFO
> emptiness and wait for the data to be completely transferred with delays or
> sleeps or whatever.

No extra context switches there at least, that's the main issue.

> NOTE Currently the DW APB SSI driver doesn't set xfer->effective_speed_hz, though as
> far as I can see that field exists there to be initialized by the SPI controller
> driver, right? If so, strange it isn't done in any SPI drivers...

Yes.  Not that many people are concerned about the exact timing, it turns
out; the work it was being used for never fully made it upstream.

> What do you think about this?

Sure.

> patchset "spi: dw: Add generic DW DMA controller support" (it's being under
> review in this email thread) ? Anyway, if the fixup is getting to be that
> complicated, will it have to be backported to another stable kernels?

No, if it's too invasive it shouldn't be (though the stable people might
decide they want it anyway these days :/ ).
Serge Semin May 25, 2020, 9:36 p.m. UTC | #10
On Mon, May 25, 2020 at 12:41:32PM +0100, Mark Brown wrote:
> On Sat, May 23, 2020 at 11:34:10AM +0300, Serge Semin wrote:
> > On Fri, May 22, 2020 at 04:22:41PM +0100, Mark Brown wrote:
> 
> > > Right, that definitely needs to be fixed then - 8MHz is indeed a totally
> > > normal clock rate for SPI so people will hit it.  I guess if there's a
> > > noticeable performance hit to defer to thread then we could implement
> > > both and look at how long the delay is going to be to decide which to
> > > use, that's annoyingly complicated though so if the overhead is small
> > > enough we could just not bother.
> 
> > As I suggested before we can implement a solution without performance drop.
> > Just wait for the DMA completion locally in the dw_spi_dma_transfer() method and
> > return 0 instead of 1 from the transfer_one() callback. In that function we'll
> > wait while DMA finishes its business, after that we can check the Tx/Rx FIFO
> > emptiness and wait for the data to be completely transferred with delays or
> > sleeps or whatever.
> 
> No extra context switches there at least, that's the main issue.

Right. There won't be an extra context switch.

> 
> > NOTE Currently the DW APB SSI driver doesn't set xfer->effective_speed_hz, though as
> > far as I can see that field exists there to be initialized by the SPI controller
> > driver, right? If so, strange it isn't done in any SPI drivers...
> 
> Yes.  Not that many people are concerned about the exact timing, it turns
> out; the work it was being used for never fully made it upstream.
> 
> > What do you think about this?
> 
> Sure.

Great. I'll send a new patchset soon. It'll fix the Tx/Rx non-empty issue in
accordance with the proposed design.

-Sergey

> 
> > patchset "spi: dw: Add generic DW DMA controller support" (it's being under
> > review in this email thread) ? Anyway, if the fixup is getting to be that
> > complicated, will it have to be backported to another stable kernels?
> 
> No, if it's too invasive it shouldn't be (though the stable people might
> decide they want it anyway these days :/ ).