Message ID | 1388965166-27334-1-git-send-email-daniel.santos@pobox.com (mailing list archive) |
---|---|
State | Accepted |
Commit | e120cc0dcf2880a4c5c0a6cb27b655600a1cfa1d |
Headers | show |
Sorry, the "[PATCH]" prefix didn't get in the subject. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jan 05, 2014 at 05:39:26PM -0600, danielfsantos@att.net wrote: > This corrects a problem in spi_pump_messages() that leads to an spi > message hanging forever when a call to transfer_one_message() fails. > This failure occurs in my MCP2210 driver when the cs_change bit is set > on the last transfer in a message, an operation which the hardware does > not support. Applied, thanks.
On Mon, Jan 6, 2014 at 12:39 AM, <danielfsantos@att.net> wrote: > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -735,7 +735,9 @@ static void spi_pump_messages(struct kthread_work *work) > ret = master->transfer_one_message(master, master->cur_msg); > if (ret) { > dev_err(&master->dev, > - "failed to transfer one message from queue\n"); > + "failed to transfer one message from queue: %d\n", ret); > + master->cur_msg->status = ret; This crashes with drivers using the generic spi_transfer_one_message(), which always calls spi_finalize_current_message(), which zeroes master->cur_msg: spi_master spi0: failed to transfer one message from queue: -110 spi_pump_messages:748 master = ef3d8c00 spi_pump_messages:749 master->cur_msg = (null) Unable to handle kernel NULL pointer dereference at virtual address 00000020 pgd = c0004000 [00000020] *pgd=00000000 Internal error: Oops: 817 [#1] SMP ARM Modules linked in: CPU: 1 PID: 30 Comm: spi0 Not tainted 3.13.0-koelsch-00403-gecb6e4e65dea-dirty #274 task: ef250bc0 ti: ef3f0000 task.ti: ef3f0000 PC is at spi_pump_messages+0x22c/0x288 LR is at irq_work_queue+0x6c/0xcc Probably your transfer_one_message() forgot to call spi_finalize_current_message()? Is this allowed in case of failure? * @transfer_one_message: the subsystem calls the driver to transfer a single * message while queuing transfers that arrive in the meantime. When the * driver is finished with this message, it must call * spi_finalize_current_message() so the subsystem can issue the next * transfer Alternatively, we need a check for master->cur_msg here. > + spi_finalize_current_message(master); > return; > } > } Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote: > Probably your transfer_one_message() forgot to call > spi_finalize_current_message()? Is this allowed in case of failure? Probably not, or at least we should be consistent about requiring it or not. Do you want to send a revert for this with a suitable changelog?
On 01/23/2014 12:17 PM, Mark Brown wrote: > On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote: > >> Probably your transfer_one_message() forgot to call >> spi_finalize_current_message()? Is this allowed in case of failure? > Probably not, or at least we should be consistent about requiring it or > not. Hmm, well it sounds like the core problem is a lack of specificity about the interface. 1. When a message is being rejected, who is responsible for finalizing it, the spi subsystem or the master driver? 2. What does a non-zero return value from transfer() or transfer_one_message() mean? transfer() is supposed to just queue the message and not sleep, so it would seem appropriate that it would mean that the message was rejected due to something being invalid or unsupported (or an OOM), etc. , but transfer_one_message() is also where *most but not all* drivers transmit the message, so should it mean the message was rejected outright for being invalid/unsupported/OOM or should it mean a failure, such as EIO while xmitting or both? 3. Is there ever a reason to set the message's status to anything other than the return value of transfer()/transfer_one_message()? From a cursory review of mainline spi drivers, this appears to vary. Some drivers are always returning zero while setting the status upon error, some return the status and others still will set the status to one value, but return a different error code. So if a non-zero return value from transfer() or transfer_one_message() should also be the status, I'm thinking we can have a small reduction in the code footprint if it's done in the spi core. However, I suppose that I can't properly discuss this without delving into an almost unrelated issue, which may render the point moot. The only reason I'm using transfer_one_message() at all is because transfer() is being deprecated. My driver (currently out-of-tree) supports both but will prefer transfer() as long as it hasn't been removed or become broken ( which I'm managing via a #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143). The reason is for this is that the mcp2210 driver has an internal command queue that manages (per its requirements) spi messages as well as other types of commands to the remote (via USB) device (which is both an spi master and gpio chip). From a cursory review of other spi drivers in the mainline, I can see that at least two of them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need a non-deprecated mechanism to do our own queuing and avoid the overhead of the spi core providing a thread & queue which we'll just ignore. Then, the core can take care of setting status and finalizing when calls to transfer() fail (since there should be no ambiguity about this here), but leave that up to the driver when calling transfer_one_message()? Either way, I think that we need to decide and spell it out in the kerneldocs. Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 23, 2014 at 08:21:39PM -0600, Daniel Santos wrote: > On 01/23/2014 12:17 PM, Mark Brown wrote: > >On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote: Please don't write enormous walls of text, it really doesn't make it easy to read your messages or encourage doing so. Use blank lines between paragraphs (including within lists) and try to either split or condense your ideas so that what you're trying to say comes over more clearly. > The only reason I'm using transfer_one_message() at all is because > transfer() is being deprecated. My driver (currently out-of-tree) > supports both but will prefer transfer() as long as it hasn't been > removed or become broken ( which I'm managing via a #if > LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143). No, don't do that - it's not sensible. If there's something you need work upstream to get it implemented or understand how to use the framework better. Don't code around the frameworks, talk to people instead. > of other spi drivers in the mainline, I can see that at least two of > them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need > a non-deprecated mechanism to do our own queuing and avoid the No, that's not what those drivers are doing (nor the others doing similar things) - they have done some optimisation on the code that pushes messages to hardware so they don't defer to task context when they don't have to. There's very little hardware specific about what they're doing, it's all about how we work with the scheduler to minimise the idle time for the hardware. A major goal of factoring out the loops that traverse the messages from the drivers is to allow us to move that code out of the drivers and into the framework where it belongs. > overhead of the spi core providing a thread & queue which we'll just > ignore. Then, the core can take care of setting status and > finalizing when calls to transfer() fail (since there should be no > ambiguity about this here), but leave that up to the driver when > calling transfer_one_message()? When the core refactoring is finished popping up into the thread will be mostly optional. Things like PIO, clock reprogramming and delays will need to be pushed up into task context as do some of the DMA operations and the completions - you don't want to be doing anything slow in interrupt context.
On 01/24/2014 07:01 AM, Mark Brown wrote: > Please don't write enormous walls of text, it really doesn't make it > easy to read your messages or encourage doing so. Use blank lines > between paragraphs (including within lists) and try to either split or > condense your ideas so that what you're trying to say comes over more > clearly. Indeed, that was pretty ugly. :) Sorry about that. > >> The only reason I'm using transfer_one_message() at all is because >> transfer() is being deprecated. My driver (currently out-of-tree) >> supports both but will prefer transfer() as long as it hasn't been >> removed or become broken ( which I'm managing via a #if >> LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143). > No, don't do that - it's not sensible. If there's something you need > work upstream to get it implemented or understand how to use the > framework better. Don't code around the frameworks, talk to people > instead. I suppose that at the time I worked on this, I had some time pressures and I did plan to come back to it and discuss this with linux-spi to figure out how to better manage this or if I should just simply use the spi's queue and leave it be. I've faced a lot of challenges thus far because: a.) It's my first device driver, and b.) I must dynamically create/destroy gpio_chips, irq_chips, spi_masters and their children since this is a USB "bridge" device that can be added & removed at any point in time. I originally thought that it was a first in its class, but I've since discovered another out-of-tree project that is doing very similar things, USB to i2c/spi (https://github.com/groeck/diolan) > >> of other spi drivers in the mainline, I can see that at least two of >> them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need >> a non-deprecated mechanism to do our own queuing and avoid the > No, that's not what those drivers are doing (nor the others doing > similar things) - they have done some optimisation on the code that > pushes messages to hardware so they don't defer to task context when > they don't have to. There's very little hardware specific about what > they're doing, it's all about how we work with the scheduler to minimise > the idle time for the hardware. A major goal of factoring out the loops > that traverse the messages from the drivers is to allow us to move that > code out of the drivers and into the framework where it belongs. Oh, that's cool! :) Thanks for the clarification. >> overhead of the spi core providing a thread & queue which we'll just >> ignore. Then, the core can take care of setting status and >> finalizing when calls to transfer() fail (since there should be no >> ambiguity about this here), but leave that up to the driver when >> calling transfer_one_message()? > When the core refactoring is finished popping up into the thread will be > mostly optional. Things like PIO, clock reprogramming and delays will > need to be pushed up into task context as do some of the DMA operations > and the completions - you don't want to be doing anything slow in > interrupt context. I suppose I need to read up more on the refactoring work happening in this subsystem. Yes, we definitely don't want to spend much time in interrupt context and my driver currently spends a lot of time there (at least to me). My strategy has been that when I get an spi message from transfer(), I create and submit an mcp2210-specific command for that message. If no command is currently in-process, I also submit 64-byte interrupt URB for that command prior to returning (the mcp2210 has a tiny buffer). I suppose I've been trying to follow the "first make it correct, then make it fast" credo. Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 98f4b77..907122e 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -735,7 +735,9 @@ static void spi_pump_messages(struct kthread_work *work) ret = master->transfer_one_message(master, master->cur_msg); if (ret) { dev_err(&master->dev, - "failed to transfer one message from queue\n"); + "failed to transfer one message from queue: %d\n", ret); + master->cur_msg->status = ret; + spi_finalize_current_message(master); return; } }
This corrects a problem in spi_pump_messages() that leads to an spi message hanging forever when a call to transfer_one_message() fails. This failure occurs in my MCP2210 driver when the cs_change bit is set on the last transfer in a message, an operation which the hardware does not support. Rationale Since the transfer_one_message() returns an int, we must presume that it may fail. If transfer_one_message() should never fail, it should return void. Thus, calls to transfer_one_message() should properly manage a failure. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> --- drivers/spi/spi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)