spidev: fix hang when transfer_one_message fails

Message ID	1388965166-27334-1-git-send-email-daniel.santos@pobox.com (mailing list archive)
State	Accepted
Commit	e120cc0dcf2880a4c5c0a6cb27b655600a1cfa1d
Headers	show Return-Path: <linux-spi-owner@kernel.org> From: danielfsantos@att.net To: Mark Brown <broonie@kernel.org>, linux-spi <linux-spi@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org> Cc: Daniel Santos <daniel.santos@pobox.com> Subject: spidev: fix hang when transfer_one_message fails Date: Sun, 5 Jan 2014 17:39:26 -0600 Message-Id: <1388965166-27334-1-git-send-email-daniel.santos@pobox.com> Reply-To: Daniel Santos <daniel.santos@pobox.com> Sender: linux-spi-owner@vger.kernel.org Precedence: bulk

Daniel Santos Jan. 5, 2014, 11:39 p.m. UTC

This corrects a problem in spi_pump_messages() that leads to an spi
message hanging forever when a call to transfer_one_message() fails.
This failure occurs in my MCP2210 driver when the cs_change bit is set
on the last transfer in a message, an operation which the hardware does
not support.

Rationale
Since the transfer_one_message() returns an int, we must presume that it
may fail.  If transfer_one_message() should never fail, it should return
void.  Thus, calls to transfer_one_message() should properly manage a
failure.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 drivers/spi/spi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Daniel Santos Jan. 5, 2014, 11:52 p.m. UTC | #1

Sorry, the "[PATCH]" prefix didn't get in the subject.
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Brown Jan. 6, 2014, 12:53 p.m. UTC | #2

On Sun, Jan 05, 2014 at 05:39:26PM -0600, danielfsantos@att.net wrote:
> This corrects a problem in spi_pump_messages() that leads to an spi
> message hanging forever when a call to transfer_one_message() fails.
> This failure occurs in my MCP2210 driver when the cs_change bit is set
> on the last transfer in a message, an operation which the hardware does
> not support.

Applied, thanks.

Geert Uytterhoeven Jan. 23, 2014, 4:47 p.m. UTC | #3

On Mon, Jan 6, 2014 at 12:39 AM,  <danielfsantos@att.net> wrote:
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -735,7 +735,9 @@ static void spi_pump_messages(struct kthread_work *work)
>         ret = master->transfer_one_message(master, master->cur_msg);
>         if (ret) {
>                 dev_err(&master->dev,
> -                       "failed to transfer one message from queue\n");
> +                       "failed to transfer one message from queue: %d\n", ret);
> +               master->cur_msg->status = ret;

This crashes with drivers using the generic spi_transfer_one_message(),
which always calls spi_finalize_current_message(), which zeroes
master->cur_msg:

spi_master spi0: failed to transfer one message from queue: -110
spi_pump_messages:748 master = ef3d8c00
spi_pump_messages:749 master->cur_msg =   (null)
Unable to handle kernel NULL pointer dereference at virtual address 00000020
pgd = c0004000
[00000020] *pgd=00000000
Internal error: Oops: 817 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 30 Comm: spi0 Not tainted
3.13.0-koelsch-00403-gecb6e4e65dea-dirty #274
task: ef250bc0 ti: ef3f0000 task.ti: ef3f0000
PC is at spi_pump_messages+0x22c/0x288
LR is at irq_work_queue+0x6c/0xcc

Probably your transfer_one_message() forgot to call
spi_finalize_current_message()? Is this allowed in case of failure?

 * @transfer_one_message: the subsystem calls the driver to transfer a single
 *      message while queuing transfers that arrive in the meantime. When the
 *      driver is finished with this message, it must call
 *      spi_finalize_current_message() so the subsystem can issue the next
 *      transfer

Alternatively, we need a check for master->cur_msg here.

> +               spi_finalize_current_message(master);
>                 return;
>         }
>  }

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Brown Jan. 23, 2014, 6:17 p.m. UTC | #4

On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote:

> Probably your transfer_one_message() forgot to call
> spi_finalize_current_message()? Is this allowed in case of failure?

Probably not, or at least we should be consistent about requiring it or
not.  Do you want to send a revert for this with a suitable changelog?

Daniel Santos Jan. 24, 2014, 2:21 a.m. UTC | #5

On 01/23/2014 12:17 PM, Mark Brown wrote:
> On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote:
>
>> Probably your transfer_one_message() forgot to call
>> spi_finalize_current_message()? Is this allowed in case of failure?
> Probably not, or at least we should be consistent about requiring it or
> not.

Hmm, well it sounds like the core problem is a lack of specificity about 
the interface.

1. When a message is being rejected, who is responsible for finalizing 
it, the spi subsystem or the master driver?
2. What does a non-zero return value from transfer() or 
transfer_one_message() mean? transfer() is supposed to just queue the 
message and not sleep, so it would seem appropriate that it would mean 
that the message was rejected due to something being invalid or 
unsupported (or an OOM), etc. , but transfer_one_message() is also where 
*most but not all* drivers transmit the message, so should it mean the 
message was rejected outright for being invalid/unsupported/OOM or 
should it mean a failure, such as EIO while xmitting or both?
3. Is there ever a reason to set the message's status to anything other 
than the return value of transfer()/transfer_one_message()? From a 
cursory review of mainline spi drivers, this appears to vary. Some 
drivers are always returning zero while setting the status upon error, 
some return the status and others still will set the status to one 
value, but return a different error code.

So if a non-zero return value from transfer() or transfer_one_message() 
should also be the status, I'm thinking we can have a small reduction in 
the code footprint if it's done in the spi core. However, I suppose that 
I can't properly discuss this without delving into an almost unrelated 
issue, which may render the point moot.

The only reason I'm using transfer_one_message() at all is because 
transfer() is being deprecated. My driver (currently out-of-tree) 
supports both but will prefer transfer() as long as it hasn't been 
removed or become broken ( which I'm managing via a #if 
LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: 
https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143). 
The reason is for this is that the mcp2210 driver has an internal 
command queue that manages (per its requirements) spi messages as well 
as other types of commands to the remote (via USB) device (which is both 
an spi master and gpio chip). From a cursory review of other spi drivers 
in the mainline, I can see that at least two of them do this as well: 
spi-pxa2xx and spi-bfin-v3. So perhaps we need a non-deprecated 
mechanism to do our own queuing and avoid the overhead of the spi core 
providing a thread & queue which we'll just ignore. Then, the core can 
take care of setting status and finalizing when calls to transfer() fail 
(since there should be no ambiguity about this here), but leave that up 
to the driver when calling transfer_one_message()?

Either way, I think that we need to decide and spell it out in the 
kerneldocs.

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Brown Jan. 24, 2014, 1:01 p.m. UTC | #6

On Thu, Jan 23, 2014 at 08:21:39PM -0600, Daniel Santos wrote:
> On 01/23/2014 12:17 PM, Mark Brown wrote:
> >On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote:

Please don't write enormous walls of text, it really doesn't make it
easy to read your messages or encourage doing so.  Use blank lines
between paragraphs (including within lists) and try to either split or
condense your ideas so that what you're trying to say comes over more
clearly.

> The only reason I'm using transfer_one_message() at all is because
> transfer() is being deprecated. My driver (currently out-of-tree)
> supports both but will prefer transfer() as long as it hasn't been
> removed or become broken ( which I'm managing via a #if
> LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143).

No, don't do that - it's not sensible.  If there's something you need
work upstream to get it implemented or understand how to use the
framework better.  Don't code around the frameworks, talk to people
instead.

> of other spi drivers in the mainline, I can see that at least two of
> them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need
> a non-deprecated mechanism to do our own queuing and avoid the

No, that's not what those drivers are doing (nor the others doing
similar things) - they have done some optimisation on the code that
pushes messages to hardware so they don't defer to task context when
they don't have to.  There's very little hardware specific about what
they're doing, it's all about how we work with the scheduler to minimise
the idle time for the hardware.  A major goal of factoring out the loops
that traverse the messages from the drivers is to allow us to move that
code out of the drivers and into the framework where it belongs.

> overhead of the spi core providing a thread & queue which we'll just
> ignore. Then, the core can take care of setting status and
> finalizing when calls to transfer() fail (since there should be no
> ambiguity about this here), but leave that up to the driver when
> calling transfer_one_message()?

When the core refactoring is finished popping up into the thread will be
mostly optional.  Things like PIO, clock reprogramming and delays will
need to be pushed up into task context as do some of the DMA operations
and the completions - you don't want to be doing anything slow in
interrupt context.

Daniel Santos Jan. 27, 2014, 11:16 p.m. UTC | #7

On 01/24/2014 07:01 AM, Mark Brown wrote:
> Please don't write enormous walls of text, it really doesn't make it
> easy to read your messages or encourage doing so.  Use blank lines
> between paragraphs (including within lists) and try to either split or
> condense your ideas so that what you're trying to say comes over more
> clearly.

Indeed, that was pretty ugly. :) Sorry about that.

>
>> The only reason I'm using transfer_one_message() at all is because
>> transfer() is being deprecated. My driver (currently out-of-tree)
>> supports both but will prefer transfer() as long as it hasn't been
>> removed or become broken ( which I'm managing via a #if
>> LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143).
> No, don't do that - it's not sensible.  If there's something you need
> work upstream to get it implemented or understand how to use the
> framework better.  Don't code around the frameworks, talk to people
> instead.

I suppose that at the time I worked on this, I had some time pressures 
and I did plan to come back to it and discuss this with linux-spi to 
figure out how to better manage this or if I should just simply use the 
spi's queue and leave it be. I've faced a lot of challenges thus far 
because:

a.) It's my first device driver, and

b.) I must dynamically create/destroy gpio_chips, irq_chips, spi_masters 
and their children since this is a USB "bridge" device that can be added 
& removed at any point in time.

I originally thought that it was a first in its class, but I've since 
discovered another out-of-tree project that is doing very similar 
things, USB to i2c/spi (https://github.com/groeck/diolan)

>
>> of other spi drivers in the mainline, I can see that at least two of
>> them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need
>> a non-deprecated mechanism to do our own queuing and avoid the
> No, that's not what those drivers are doing (nor the others doing
> similar things) - they have done some optimisation on the code that
> pushes messages to hardware so they don't defer to task context when
> they don't have to.  There's very little hardware specific about what
> they're doing, it's all about how we work with the scheduler to minimise
> the idle time for the hardware.  A major goal of factoring out the loops
> that traverse the messages from the drivers is to allow us to move that
> code out of the drivers and into the framework where it belongs.

Oh, that's cool! :)  Thanks for the clarification.

>> overhead of the spi core providing a thread & queue which we'll just
>> ignore. Then, the core can take care of setting status and
>> finalizing when calls to transfer() fail (since there should be no
>> ambiguity about this here), but leave that up to the driver when
>> calling transfer_one_message()?
> When the core refactoring is finished popping up into the thread will be
> mostly optional.  Things like PIO, clock reprogramming and delays will
> need to be pushed up into task context as do some of the DMA operations
> and the completions - you don't want to be doing anything slow in
> interrupt context.

I suppose I need to read up more on the refactoring work happening in 
this subsystem. Yes, we definitely don't want to spend much time in 
interrupt context and my driver currently spends a lot of time there (at 
least to me).  My strategy has been that when I get an spi message from 
transfer(), I create and submit an mcp2210-specific command for that 
message. If no command is currently in-process, I also submit 64-byte 
interrupt URB for that command prior to returning (the mcp2210 has a 
tiny buffer).  I suppose I've been trying to follow the "first make it 
correct, then make it fast" credo.

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

spidev: fix hang when transfer_one_message fails

Commit Message

Comments

Patch