diff mbox

SPI: bcm2835: move to the transfer_one driver model

Message ID 30057D30-6A2D-4517-B374-76FF2448E455@martin.sperl.org (mailing list archive)
State Accepted
Commit e34ff011c70e5f4ef219141711142d5111ae6ebb
Headers show

Commit Message

Martin Sperl March 26, 2015, 10:08 a.m. UTC
This also allows for GPIO-CS to get used removing the limitation of
2/3 SPI devises on the SPI bus.

Fixes: spi-cs-high with native CS with multiple devices on the spi-bus
resetting the chip selects to "normal" polarity after a finished
transfer.

No other functionality/improvements added.

Tested with the following 4 devices on the spi-bus:
* mcp2515 with native CS
* mcp2515 with gpio CS
* fb_st7735r with native CS
    (plus spi-cs-high via transistor inverting polarity)
* enc28j60 with gpio-CS
Tested-by: Martin Sperl <kernel@martin.sperl.org>

Signed-off-by: Martin Sperl <kernel@martin.sperl.org>

---

Note that there is quite a bit of complexity involved to make the native 
CS work correctly.
Also a few future optimizations in the pipeline will only work reliably 
with gpio CS. 

So the question is if we should depreciate native chip-selects for this
driver with one of those future improvements listed below.

Here a list of planned future improvements:
* initially fill the FIFO without triggering an interrupt to do that work
* use polling for "short" transfers (<20us)
* implement DMA for longer transfers (>48 bytes) if tx_dma/rx_dma are set.
* implement can_dma et.al. for dma_mapping of transfers in the framework
* multiple spi_transfers handled in interrupt alone without waking up the
  worker-thread (for some transfers) to reduce context switching
  overheads and the corresponding latencies.

As for testing: I have also tried to test with mmc_spi, but I have not
been able to make that driver work reliably in any recent kernel
versions.
Most of the time I see timeouts - and with lots of different SD-cards...

IIRC the last time I tested it successfully was with 3.12.

So if someone has experience how to make it work on the RPI with a 
recent kernel, please let me know, so that I can extend my setup to add
this device/driver also to my test-scenario to increase diversity of the
test.

Martin

 drivers/spi/spi-bcm2835.c |  212 ++++++++++++++++++++++++++-------------------
 1 file changed, 124 insertions(+), 88 deletions(-)

Comments

Mark Brown March 26, 2015, 5:35 p.m. UTC | #1
On Thu, Mar 26, 2015 at 11:08:36AM +0100, Martin Sperl wrote:
> This also allows for GPIO-CS to get used removing the limitation of
> 2/3 SPI devises on the SPI bus.

This doesn't apply against current code, please check and resend.  As
previously mentioned please use subject lines matching the style for the
subsystem.  A few other things below but it's basically all good.

> So the question is if we should depreciate native chip-selects for this
> driver with one of those future improvements listed below.

Given that this driver is only going to be used with a single SoC (or
perhaps a very limited set of SoCs, I don't know if it's the same driver
in the Pi 2) even if the DT specifies a hardware chip select we should
be able to look up which pin that's brought out to and put it into GPIO
mode if that's the most sensible thing for the driver.

> * multiple spi_transfers handled in interrupt alone without waking up the
>   worker-thread (for some transfers) to reduce context switching
>   overheads and the corresponding latencies.

This in particular is something the framework should have: it's
generally useful and we should be able to do it with either a new
transfer_irq_safe() (or something) operation or a refactoring of the
existing one.

> +	/* error in the case of native CS requested with CS-id > 2 */
> +	dev_err(&spi->dev,
> +		"setup: only three native chip-selects are supported\n"
> +		);

The indentation here is weird - at least the last ); should be with the
string.
Stephen Warren March 26, 2015, 5:47 p.m. UTC | #2
On 03/26/2015 11:35 AM, Mark Brown wrote:
> On Thu, Mar 26, 2015 at 11:08:36AM +0100, Martin Sperl wrote:
>> This also allows for GPIO-CS to get used removing the limitation of
>> 2/3 SPI devises on the SPI bus.
>
> This doesn't apply against current code, please check and resend.  As
> previously mentioned please use subject lines matching the style for the
> subsystem.  A few other things below but it's basically all good.
>
>> So the question is if we should depreciate native chip-selects for this
>> driver with one of those future improvements listed below.
>
> Given that this driver is only going to be used with a single SoC (or
> perhaps a very limited set of SoCs, I don't know if it's the same driver
> in the Pi 2) even if the DT specifies a hardware chip select we should
> be able to look up which pin that's brought out to and put it into GPIO
> mode if that's the most sensible thing for the driver.

The Pi and Pi2 are essentially identical with the exception of the CPU, 
so yes this driver should run there too (and the GPIO and pinmux are 
identical too AFAIK, so any interaction there ought to work the same).
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Sperl March 26, 2015, 7:15 p.m. UTC | #3
> On 26.03.2015, at 18:35, Mark Brown <broonie@kernel.org> wrote:
> 
> On Thu, Mar 26, 2015 at 11:08:36AM +0100, Martin Sperl wrote:
>> This also allows for GPIO-CS to get used removing the limitation of
>> 2/3 SPI devises on the SPI bus.
> 
> This doesn't apply against current code, please check and resend.  As
> previously mentioned please use subject lines matching the style for the
> subsystem.  A few other things below but it's basically all good.

I have applied it against "for-next" (8befd715f1c1684b3).

The reason I chose for-next is because there all the other patches
I have submitted so far have been added there already...

Against which one do you want to apply it instead?

You want me to merge all those patches into a single patch instead?

>> So the question is if we should depreciate native chip-selects for this
>> driver with one of those future improvements listed below.
> 
> Given that this driver is only going to be used with a single SoC (or
> perhaps a very limited set of SoCs, I don't know if it's the same driver
> in the Pi 2) even if the DT specifies a hardware chip select we should
> be able to look up which pin that's brought out to and put it into GPIO
> mode if that's the most sensible thing for the driver.
The point here is that we need to change the alternate function for that
pin to output make it work.

That is why I was recommending the "simple" approach to "depreciate" 
that function if acceptable - less code to maintain...

Any yes: RPI2 uses the same SPI block and the same driver - the main 
difference is the arms

> 
>> * multiple spi_transfers handled in interrupt alone without waking up the
>>  worker-thread (for some transfers) to reduce context switching
>>  overheads and the corresponding latencies.
> 
> This in particular is something the framework should have: it's
> generally useful and we should be able to do it with either a new
> transfer_irq_safe() (or something) operation or a refactoring of the
> existing one.

Let me see what I can do with the current setup and then we can generalize
from that.

For most parts I see that I would use transfer_one and return 0 immediately
if we can "chain" it in interrupts - only the last transfer would trigger
a complete in the interrupt handler and return. We could even run the 
callback in the irq, if it is permitted...

But as I see there would be a few cases that would also need a complete.
These would be "delays" and maybe a CS-change - there mostly because of
the hard-coded "udelay(10)", but then it might be acceptable to sleep
10us in the irq, because the overhead of the irq handler releasing the CPU
is arround 5us and then you got another 3us inside the scheduler before
the worker-thread wakes up. If you take all that then you may as well sleep
in the interrupt handler - those at least are my measurements for
a RPI.

> 
>> +	/* error in the case of native CS requested with CS-id > 2 */
>> +	dev_err(&spi->dev,
>> +		"setup: only three native chip-selects are supported\n"
>> +		);
> 
> The indentation here is weird - at least the last ); should be with the
> string.

Will fix that when you tell me which branch you want it to get patch to 
apply against.

Martin

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Brown March 27, 2015, 1:32 a.m. UTC | #4
On Thu, Mar 26, 2015 at 08:15:26PM +0100, Martin Sperl wrote:
> > On 26.03.2015, at 18:35, Mark Brown <broonie@kernel.org> wrote:

> > On Thu, Mar 26, 2015 at 11:08:36AM +0100, Martin Sperl wrote:
> >> This also allows for GPIO-CS to get used removing the limitation of
> >> 2/3 SPI devises on the SPI bus.

> > This doesn't apply against current code, please check and resend.  As
> > previously mentioned please use subject lines matching the style for the
> > subsystem.  A few other things below but it's basically all good.

> I have applied it against "for-next" (8befd715f1c1684b3).

> The reason I chose for-next is because there all the other patches
> I have submitted so far have been added there already...

> Against which one do you want to apply it instead?

I would expect to apply patches against the relevant topic branch if one
exists, if they don't apply there then mention the dependencies.  I've
managed to figure that out given the above and applied the patch, though
in addition to the issue with the subject line that I mentioned earlier
please (as I think also mentioned several times now) do not send patches
in reply to existing threads.

> You want me to merge all those patches into a single patch instead?

Please never do this, neither resend already applied patches nor combine
unrelated patches into a single patch.  Resending already applied
patches wastes people's time and combining unrelated patches makes
review harder and goes against the good practice covered in
SubmittingPatches.

> >> So the question is if we should depreciate native chip-selects for this
> >> driver with one of those future improvements listed below.

> > Given that this driver is only going to be used with a single SoC (or
> > perhaps a very limited set of SoCs, I don't know if it's the same driver
> > in the Pi 2) even if the DT specifies a hardware chip select we should
> > be able to look up which pin that's brought out to and put it into GPIO
> > mode if that's the most sensible thing for the driver.

> The point here is that we need to change the alternate function for that
> pin to output make it work.

Yes, I understand that.  To repeat we should (given that this only needs
to cope with a very limited range of systems) be able to figure this out
without DT.

> > This in particular is something the framework should have: it's
> > generally useful and we should be able to do it with either a new
> > transfer_irq_safe() (or something) operation or a refactoring of the
> > existing one.

> Let me see what I can do with the current setup and then we can generalize
> from that.

Pushing the next transfer immediately from the completion seems pretty
general?

> For most parts I see that I would use transfer_one and return 0 immediately
> if we can "chain" it in interrupts - only the last transfer would trigger
> a complete in the interrupt handler and return. We could even run the 
> callback in the irq, if it is permitted...

You can't rely on an existing transfer_one() being interrupt safe.

> But as I see there would be a few cases that would also need a complete.
> These would be "delays" and maybe a CS-change - there mostly because of
> the hard-coded "udelay(10)", but then it might be acceptable to sleep
> 10us in the irq, because the overhead of the irq handler releasing the CPU
> is arround 5us and then you got another 3us inside the scheduler before
> the worker-thread wakes up. If you take all that then you may as well sleep
> in the interrupt handler - those at least are my measurements for
> a RPI.

Yes, delays and /CS bouncing need to be punted to threads.

> >> +	/* error in the case of native CS requested with CS-id > 2 */
> >> +	dev_err(&spi->dev,
> >> +		"setup: only three native chip-selects are supported\n"
> >> +		);

> > The indentation here is weird - at least the last ); should be with the
> > string.

> Will fix that when you tell me which branch you want it to get patch to 
> apply against.

Please send an incremental patch.
Stephen Warren March 28, 2015, 4:09 a.m. UTC | #5
On 03/26/2015 04:08 AM, Martin Sperl wrote:
...
> ---
> 
> Note that there is quite a bit of complexity involved to make the native 
> CS work correctly.
> Also a few future optimizations in the pipeline will only work reliably 
> with gpio CS.

Can you expand on that a bit more?

Are you planning on implementing code in the driver so it always uses
GPIO CS even when GPIOs aren't specified in the DT, or disabling those
optimizations when native CS is in use?

> So the question is if we should depreciate native chip-selects for this
> driver with one of those future improvements listed below.

Only if you can make the driver transparently use GPIO CS mode even when
no GPIOs are specified in the DT. DT is an ABI, and old DTs need to
continue to work on newer kernels.

I haven't had a chance to look at the code in this patch yet.

> As for testing: I have also tried to test with mmc_spi, but I have not
> been able to make that driver work reliably in any recent kernel
> versions.
> Most of the time I see timeouts - and with lots of different SD-cards...
> 
> IIRC the last time I tested it successfully was with 3.12.

It'd be great if you could use "git bisect" to track down the change
that broke this.
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Sperl March 28, 2015, 6:11 p.m. UTC | #6
> On 28.03.2015, at 05:09, Stephen Warren <swarren@wwwdotorg.org> wrote:
> Can you expand on that a bit more?
> 
> Are you planning on implementing code in the driver so it always uses
> GPIO CS even when GPIOs aren't specified in the DT, or disabling those
> optimizations when native CS is in use?

>> So the question is if we should depreciate native chip-selects for this
>> driver with one of those future improvements listed below.
> 
> Only if you can make the driver transparently use GPIO CS mode even when
> no GPIOs are specified in the DT. DT is an ABI, and old DTs need to
> continue to work on newer kernels.
No I think I will just have a case where some optimization only get used
when using gpio - even though it adds a bit of complexity/code to maintain.

The code to alter the pin-mode from ALT1 to OUT is probably more complex
than just adding a "dev_warn" during probe to indicate that it is 
depreciated/no longer recommended and an if statement to discriminate the
situation.

If you know how to change the Mode of a pin on the fly, then we can add
that at a some point.

> 
>> As for testing: I have also tried to test with mmc_spi, but I have not
>> been able to make that driver work reliably in any recent kernel
>> versions.
>> Most of the time I see timeouts - and with lots of different SD-cards...
>> 
>> IIRC the last time I tested it successfully was with 3.12.
> 
> It'd be great if you could use "git bisect" to track down the change
> that broke this.

The problem is that it is a lot of kernel-versions I would have to test
to figure out why and with the rpi used for compiling the kernel it 
becomes a prohibitive long exercise...

Also I may have a slightly different setup now compared to when I did
the test initially, so this may be the trigger as well. 
Hence I was asking if someone had similar issues/has a working setup 
with an up to date kernel. (Also I think the last time I tried it was
still based on the foundation kernels before an upstream kernel existed)

I will give it a try running an old kernel, but it may take some time...



--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Warren March 29, 2015, 3:24 a.m. UTC | #7
On 03/28/2015 12:11 PM, Martin Sperl wrote:
>> On 28.03.2015, at 05:09, Stephen Warren <swarren@wwwdotorg.org> wrote:
...
>>> As for testing: I have also tried to test with mmc_spi, but I have not
>>> been able to make that driver work reliably in any recent kernel
>>> versions.
>>> Most of the time I see timeouts - and with lots of different SD-cards...
>>>
>>> IIRC the last time I tested it successfully was with 3.12.
>>
>> It'd be great if you could use "git bisect" to track down the change
>> that broke this.
> 
> The problem is that it is a lot of kernel-versions I would have to test
> to figure out why and with the rpi used for compiling the kernel it 
> becomes a prohibitive long exercise...

Why not just cross-compile from a faster machine? It probably only takes
a few minutes per commit to test (and git bisect is pretty optimal
w.r.t. the number of commits you need to test).

> Also I may have a slightly different setup now compared to when I did
> the test initially, so this may be the trigger as well. 

That's also pretty easy to test; just go back and retest the old kernel
version that you believe was working. If it's still working but the
latest kernel is broken, there's been a regression. If the old kernel
also doesn't work, then something has changed in your setup.
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Sperl March 29, 2015, 10:19 a.m. UTC | #8
> On 29.03.2015, at 05:24, Stephen Warren <swarren@wwwdotorg.org> wrote:
> Why not just cross-compile from a faster machine? It probably only takes
> a few minutes per commit to test (and git bisect is pretty optimal
> w.r.t. the number of commits you need to test).
This never was high on my priority and I never felt the need...

> That's also pretty easy to test; just go back and retest the old kernel
> version that you believe was working. If it's still working but the
> latest kernel is broken, there's been a regression. If the old kernel
> also doesn't work, then something has changed in your setup.
all obviously an option but time consuming and an effort...
Let us see if I find some time to test...

Martin

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Sperl March 29, 2015, 9:44 p.m. UTC | #9
Hi Stephen!

> all obviously an option but time consuming and an effort...
> Let us see if I find some time to test...

I started with the foundation kernels (as I know I was working
with those at the time):
3.8.13  - d996a1b91b2bf3dc06f4f4f822a56f4496457aa1 - WORKS
3.9.11  - d5572370289f698b101f3d0198b1c99f17f0d278 - WORKS
3.10.38 - 1b49b450222df26e4abf7abb6d9302f72b2ed386 - FAIL
3.12.36 - ee9b8c7d46f2b1787b1e64604acafc70f70191cf - FAIL

I will now try the same with the mainline kernel comparing 3.9
and 3.10 and maybe I can come up with something...
(unfortunately this is still quite time-consuming - 
even cross-compiling)

Note that some comments in commit messages of rpi-linux talk 
about some backport of code from 3.13, so this may be the real
reason...

Here the commit inside the foundation kernel that I refer to:

commit cff74f34502d46a6160dba6572db5f936cfa80ae
Author: ghollingworth <gordon@raspberrypi.org>
Date:   Fri Mar 21 12:14:41 2014 +0000

    Support eMMC 5.1 cards

    Already upstream and in 3.13


From the diff of drivers/mmc it also looks as if there is more 
error-handling code in the generic mmc code which may trigger
now...

One other issue with the upstream kernel is that for 3.9 there is
no spi-bcm2835 driver, so testing against does not work - it only
got included with 3.10, so testing is futile...

I will try to see if the commit 0385850e67359838f7d63 which is
right before cff74f34502d46a6160db is working or not and then
check if that commit introduced the issue.

Martin



--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 3f93718..05be082 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -3,6 +3,7 @@ 
  *
  * Copyright (C) 2012 Chris Boot
  * Copyright (C) 2013 Stephen Warren
+ * Copyright (C) 2015 Martin Sperl
  *
  * This driver is inspired by:
  * spi-ath79.c, Copyright (C) 2009-2011 Gabor Juhos <juhosg@openwrt.org>
@@ -29,6 +30,7 @@ 
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/of_irq.h>
+#include <linux/of_gpio.h>
 #include <linux/of_device.h>
 #include <linux/spi/spi.h>
 
@@ -76,10 +78,10 @@  struct bcm2835_spi {
 	void __iomem *regs;
 	struct clk *clk;
 	int irq;
-	struct completion done;
 	const u8 *tx_buf;
 	u8 *rx_buf;
-	int len;
+	int tx_len;
+	int rx_len;
 };
 
 static inline u32 bcm2835_rd(struct bcm2835_spi *bs, unsigned reg)
@@ -96,10 +98,12 @@  static inline void bcm2835_rd_fifo(struct bcm2835_spi *bs)
 {
 	u8 byte;
 
-	while (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_RXD) {
+	while ((bs->rx_len) &&
+	       (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_RXD)) {
 		byte = bcm2835_rd(bs, BCM2835_SPI_FIFO);
 		if (bs->rx_buf)
 			*bs->rx_buf++ = byte;
+		bs->rx_len--;
 	}
 }
 
@@ -107,47 +111,60 @@  static inline void bcm2835_wr_fifo(struct bcm2835_spi *bs)
 {
 	u8 byte;
 
-	while ((bs->len) &&
+	while ((bs->tx_len) &&
 	       (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_TXD)) {
 		byte = bs->tx_buf ? *bs->tx_buf++ : 0;
 		bcm2835_wr(bs, BCM2835_SPI_FIFO, byte);
-		bs->len--;
+		bs->tx_len--;
 	}
 }
 
+static void bcm2835_spi_reset_hw(struct spi_master *master)
+{
+	struct bcm2835_spi *bs = spi_master_get_devdata(master);
+	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
+
+	/* Disable SPI interrupts and transfer */
+	cs &= ~(BCM2835_SPI_CS_INTR |
+		BCM2835_SPI_CS_INTD |
+		BCM2835_SPI_CS_TA);
+	/* and reset RX/TX FIFOS */
+	cs |= BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX;
+
+	/* and reset the SPI_HW */
+	bcm2835_wr(bs, BCM2835_SPI_CS, cs);
+}
+
 static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 {
 	struct spi_master *master = dev_id;
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
-	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
 
 	/* Read as many bytes as possible from FIFO */
 	bcm2835_rd_fifo(bs);
-
-	if (bs->len) { /* there is more data to transmit */
-		bcm2835_wr_fifo(bs);
-	} else { /* Transfer complete */
-		/* Disable SPI interrupts */
-		cs &= ~(BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD);
-		bcm2835_wr(bs, BCM2835_SPI_CS, cs);
-
-		/*
-		 * Wake up bcm2835_spi_transfer_one(), which will call
-		 * bcm2835_spi_finish_transfer(), to drain the RX FIFO.
-		 */
-		complete(&bs->done);
+	/* Write as many bytes as possible to FIFO */
+	bcm2835_wr_fifo(bs);
+
+	/* based on flags decide if we can finish the transfer */
+	if (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_DONE) {
+		/* Transfer complete - reset SPI HW */
+		bcm2835_spi_reset_hw(master);
+		/* wake up the framework */
+		complete(&master->xfer_completion);
 	}
 
 	return IRQ_HANDLED;
 }
 
-static int bcm2835_spi_start_transfer(struct spi_device *spi,
-				      struct spi_transfer *tfr)
+static int bcm2835_spi_transfer_one(struct spi_master *master,
+				    struct spi_device *spi,
+				    struct spi_transfer *tfr)
 {
-	struct bcm2835_spi *bs = spi_master_get_devdata(spi->master);
+	struct bcm2835_spi *bs = spi_master_get_devdata(master);
 	unsigned long spi_hz, clk_hz, cdiv;
-	u32 cs = BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD | BCM2835_SPI_CS_TA;
+	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
 
+	/* set clock */
 	spi_hz = tfr->speed_hz;
 	clk_hz = clk_get_rate(bs->clk);
 
@@ -163,100 +180,118 @@  static int bcm2835_spi_start_transfer(struct spi_device *spi,
 	} else {
 		cdiv = 0; /* 0 is the slowest we can go */
 	}
+	bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
 
+	/* handle all the modes */
 	if ((spi->mode & SPI_3WIRE) && (tfr->rx_buf))
 		cs |= BCM2835_SPI_CS_REN;
-
 	if (spi->mode & SPI_CPOL)
 		cs |= BCM2835_SPI_CS_CPOL;
 	if (spi->mode & SPI_CPHA)
 		cs |= BCM2835_SPI_CS_CPHA;
 
-	if (!(spi->mode & SPI_NO_CS)) {
-		if (spi->mode & SPI_CS_HIGH) {
-			cs |= BCM2835_SPI_CS_CSPOL;
-			cs |= BCM2835_SPI_CS_CSPOL0 << spi->chip_select;
-		}
-
-		cs |= spi->chip_select;
-	}
+	/* for gpio_cs set dummy CS so that no HW-CS get changed
+	 * we can not run this in bcm2835_spi_set_cs, as it does
+	 * not get called for cs_gpio cases, so we need to do it here
+	 */
+	if (gpio_is_valid(spi->cs_gpio) || (spi->mode & SPI_NO_CS))
+		cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
 
-	reinit_completion(&bs->done);
+	/* set transmit buffers and length */
 	bs->tx_buf = tfr->tx_buf;
 	bs->rx_buf = tfr->rx_buf;
-	bs->len = tfr->len;
+	bs->tx_len = tfr->len;
+	bs->rx_len = tfr->len;
 
-	bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
 	/*
 	 * Enable the HW block. This will immediately trigger a DONE (TX
 	 * empty) interrupt, upon which we will fill the TX FIFO with the
 	 * first TX bytes. Pre-filling the TX FIFO here to avoid the
 	 * interrupt doesn't work:-(
 	 */
+	cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD | BCM2835_SPI_CS_TA;
 	bcm2835_wr(bs, BCM2835_SPI_CS, cs);
 
-	return 0;
+	/* signal that we need to wait for completion */
+	return 1;
 }
 
-static int bcm2835_spi_finish_transfer(struct spi_device *spi,
-				       struct spi_transfer *tfr,
-				       bool cs_change)
+static void bcm2835_spi_handle_err(struct spi_master *master,
+				   struct spi_message *msg)
 {
-	struct bcm2835_spi *bs = spi_master_get_devdata(spi->master);
-	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
-
-	if (tfr->delay_usecs)
-		udelay(tfr->delay_usecs);
-
-	if (cs_change)
-		/* Clear TA flag */
-		bcm2835_wr(bs, BCM2835_SPI_CS, cs & ~BCM2835_SPI_CS_TA);
-
-	return 0;
+	bcm2835_spi_reset_hw(master);
 }
 
-static int bcm2835_spi_transfer_one(struct spi_master *master,
-				    struct spi_message *mesg)
+static void bcm2835_spi_set_cs(struct spi_device *spi, bool gpio_level)
 {
+	/*
+	 * we can assume that we are "native" as per spi_set_cs
+	 *   calling us ONLY when cs_gpio is not set
+	 * we can also assume that we are CS < 3 as per bcm2835_spi_setup
+	 *   we would not get called because of error handling there.
+	 * the level passed is the electrical level not enabled/disabled
+	 *   so it has to get translated back to enable/disable
+	 *   see spi_set_cs in spi.c for the implementation
+	 */
+
+	struct spi_master *master = spi->master;
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
-	struct spi_transfer *tfr;
-	struct spi_device *spi = mesg->spi;
-	int err = 0;
-	unsigned int timeout;
-	bool cs_change;
-
-	list_for_each_entry(tfr, &mesg->transfers, transfer_list) {
-		err = bcm2835_spi_start_transfer(spi, tfr);
-		if (err)
-			goto out;
-
-		timeout = wait_for_completion_timeout(
-			&bs->done,
-			msecs_to_jiffies(BCM2835_SPI_TIMEOUT_MS)
-			);
-		if (!timeout) {
-			err = -ETIMEDOUT;
-			goto out;
-		}
+	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
+	bool enable;
 
-		cs_change = tfr->cs_change ||
-			list_is_last(&tfr->transfer_list, &mesg->transfers);
+	/* calculate the enable flag from the passed gpio_level */
+	enable = (spi->mode & SPI_CS_HIGH) ? gpio_level : !gpio_level;
 
-		err = bcm2835_spi_finish_transfer(spi, tfr, cs_change);
-		if (err)
-			goto out;
+	/* set flags for "reverse" polarity in the registers */
+	if (spi->mode & SPI_CS_HIGH) {
+		/* set the correct CS-bits */
+		cs |= BCM2835_SPI_CS_CSPOL;
+		cs |= BCM2835_SPI_CS_CSPOL0 << spi->chip_select;
+	} else {
+		/* clean the CS-bits */
+		cs &= ~BCM2835_SPI_CS_CSPOL;
+		cs &= ~(BCM2835_SPI_CS_CSPOL0 << spi->chip_select);
+	}
 
-		mesg->actual_length += (tfr->len - bs->len);
+	/* select the correct chip_select depending on disabled/enabled */
+	if (enable) {
+		/* set cs correctly */
+		if (spi->mode & SPI_NO_CS) {
+			/* use the "undefined" chip-select */
+			cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
+		} else {
+			/* set the chip select */
+			cs &= ~(BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01);
+			cs |= spi->chip_select;
+		}
+	} else {
+		/* disable CSPOL which puts HW-CS into deselected state */
+		cs &= ~BCM2835_SPI_CS_CSPOL;
+		/* use the "undefined" chip-select as precaution */
+		cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
 	}
 
-out:
-	/* Clear FIFOs, and disable the HW block */
-	bcm2835_wr(bs, BCM2835_SPI_CS,
-		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
-	mesg->status = err;
-	spi_finalize_current_message(master);
+	/* finally set the calculated flags in SPI_CS */
+	bcm2835_wr(bs, BCM2835_SPI_CS, cs);
+}
 
-	return 0;
+static int bcm2835_spi_setup(struct spi_device *spi)
+{
+	/*
+	 * sanity checking the native-chipselects
+	 */
+	if (spi->mode & SPI_NO_CS)
+		return 0;
+	if (gpio_is_valid(spi->cs_gpio))
+		return 0;
+	if (spi->chip_select < 3)
+		return 0;
+
+	/* error in the case of native CS requested with CS-id > 2 */
+	dev_err(&spi->dev,
+		"setup: only three native chip-selects are supported\n"
+		);
+	return -EINVAL;
 }
 
 static int bcm2835_spi_probe(struct platform_device *pdev)
@@ -277,13 +312,14 @@  static int bcm2835_spi_probe(struct platform_device *pdev)
 	master->mode_bits = BCM2835_SPI_MODE_BITS;
 	master->bits_per_word_mask = SPI_BPW_MASK(8);
 	master->num_chipselect = 3;
-	master->transfer_one_message = bcm2835_spi_transfer_one;
+	master->setup = bcm2835_spi_setup;
+	master->set_cs = bcm2835_spi_set_cs;
+	master->transfer_one = bcm2835_spi_transfer_one;
+	master->handle_err = bcm2835_spi_handle_err;
 	master->dev.of_node = pdev->dev.of_node;
 
 	bs = spi_master_get_devdata(master);
 
-	init_completion(&bs->done);
-
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	bs->regs = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(bs->regs)) {
@@ -314,7 +350,7 @@  static int bcm2835_spi_probe(struct platform_device *pdev)
 		goto out_clk_disable;
 	}
 
-	/* initialise the hardware */
+	/* initialise the hardware with the default polarities */
 	bcm2835_wr(bs, BCM2835_SPI_CS,
 		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);