diff mbox

[1/3] spi: xilinx: Detect stall with Unknown commands

Message ID 20171121090904.6901-1-ricardo.ribalda@gmail.com (mailing list archive)
State Accepted
Commit 5a1314fa697fc65cefaba64cd4699bfc3e6882a6
Headers show

Commit Message

Ricardo Ribalda Delgado Nov. 21, 2017, 9:09 a.m. UTC
When the core is configured in C_SPI_MODE > 0, it integrates a
lookup table that automatically configures the core in dual or quad mode
based on the command (first byte on the tx fifo).

Unfortunately, that list mode_?_memoy_*.mif does not contain all the
supported commands by the flash.

Since 4.14 spi-nor automatically tries to probe the flash using SFDP
(command 0x5a), and that command is not part of the list_mode table.

Whit the right combination of C_SPI_MODE and C_SPI_MEMORY this leads
into a stall that can only be recovered with a soft rest.

This patch detects this kind of stall and returns -EIO to the caller on
those commands. spi-nor can handle this error properly:

m25p80 spi0.0: Detected stall. Check C_SPI_MODE and C_SPI_MEMORY. 0x21 0x2404
m25p80 spi0.0: SPI transfer failed: -5
spi_master spi0: failed to transfer one message from queue
m25p80 spi0.0: s25sl064p (8192 Kbytes)

Cc: Mark Brown <broonie@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
---
 drivers/spi/spi-xilinx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Mark Brown Nov. 22, 2017, 11:12 a.m. UTC | #1
On Tue, Nov 21, 2017 at 10:09:02AM +0100, Ricardo Ribalda Delgado wrote:

> +		stalled = 10;
>  		while (rx_words) {
> +			if (rx_words == n_words && !(stalled--) &&
> +			    !(sr & XSPI_SR_TX_EMPTY_MASK) &&
> +			    (sr & XSPI_SR_RX_EMPTY_MASK)) {

10 seems like a small number for what is essentially just a busy spin -
are we sure that we won't reasonably hit a case where the CPU is
sufficiently fast and the bus sufficiently slow we falsely detect a
stall?  Where did this number come from?
Ricardo Ribalda Delgado Nov. 22, 2017, 7:25 p.m. UTC | #2
Hi Mark


On Wed, Nov 22, 2017 at 12:12 PM, Mark Brown <broonie@kernel.org> wrote:
> On Tue, Nov 21, 2017 at 10:09:02AM +0100, Ricardo Ribalda Delgado wrote:
>
>> +             stalled = 10;
>>               while (rx_words) {
>> +                     if (rx_words == n_words && !(stalled--) &&
>> +                         !(sr & XSPI_SR_TX_EMPTY_MASK) &&
>> +                         (sr & XSPI_SR_RX_EMPTY_MASK)) {
>
> 10 seems like a small number for what is essentially just a busy spin -
> are we sure that we won't reasonably hit a case where the CPU is
> sufficiently fast and the bus sufficiently slow we falsely detect a
> stall?  Where did this number come from?

This code is executed after all the data has been pushed to the out
fifo. Since we are not inhibitng tx, the moment the empty mask is
evaluated,  at least one byte should be out.

After 1 day of use I have been able to locate one event when stalled
was two. So 10 is very conservative number.

In other words: 10 is one order of magnitude bigger than the worst
measured case.

Thanks for your review :)
Ricardo Ribalda Delgado Nov. 22, 2017, 7:27 p.m. UTC | #3
Hi again

On Wed, Nov 22, 2017 at 8:25 PM, Ricardo Ribalda Delgado
<ricardo.ribalda@gmail.com> wrote:
> Hi Mark
>
>
> On Wed, Nov 22, 2017 at 12:12 PM, Mark Brown <broonie@kernel.org> wrote:
>> On Tue, Nov 21, 2017 at 10:09:02AM +0100, Ricardo Ribalda Delgado wrote:
>>
>>> +             stalled = 10;
>>>               while (rx_words) {
>>> +                     if (rx_words == n_words && !(stalled--) &&
>>> +                         !(sr & XSPI_SR_TX_EMPTY_MASK) &&
>>> +                         (sr & XSPI_SR_RX_EMPTY_MASK)) {
>>
>> 10 seems like a small number for what is essentially just a busy spin -
>> are we sure that we won't reasonably hit a case where the CPU is
>> sufficiently fast and the bus sufficiently slow we falsely detect a
>> stall?  Where did this number come from?
>
> This code is executed after all the data has been pushed to the out
> fifo. Since we are not inhibitng tx, the moment the empty mask is
> evaluated,  at least one byte should be out.
>
> After 1 day of use I have been able to locate one event when stalled
> was two. So 10 is very conservative number.
>
> In other words: 10 is one order of magnitude bigger than the worst
> measured case.


If it makes you more comfortable I could add something like

if (n_tries < 8)
   msleep(10);

Regards!

>
> Thanks for your review :)
>
>
> --
> Ricardo Ribalda
diff mbox

Patch

diff --git a/drivers/spi/spi-xilinx.c b/drivers/spi/spi-xilinx.c
index bc7100b93dfc..e0b9fe1d0e37 100644
--- a/drivers/spi/spi-xilinx.c
+++ b/drivers/spi/spi-xilinx.c
@@ -271,6 +271,7 @@  static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t)
 	while (remaining_words) {
 		int n_words, tx_words, rx_words;
 		u32 sr;
+		int stalled;
 
 		n_words = min(remaining_words, xspi->buffer_size);
 
@@ -299,7 +300,17 @@  static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t)
 
 		/* Read out all the data from the Rx FIFO */
 		rx_words = n_words;
+		stalled = 10;
 		while (rx_words) {
+			if (rx_words == n_words && !(stalled--) &&
+			    !(sr & XSPI_SR_TX_EMPTY_MASK) &&
+			    (sr & XSPI_SR_RX_EMPTY_MASK)) {
+				dev_err(&spi->dev,
+					"Detected stall. Check C_SPI_MODE and C_SPI_MEMORY\n");
+				xspi_init_hw(xspi);
+				return -EIO;
+			}
+
 			if ((sr & XSPI_SR_TX_EMPTY_MASK) && (rx_words > 1)) {
 				xilinx_spi_rx(xspi);
 				rx_words--;