Message ID | 20230116144149.305560-1-brgl@bgdev.pl (mailing list archive) |
---|---|
State | Accepted |
Commit | eede42c9459b58b71edc99303dad65216a655810 |
Headers | show |
Series | [-next] spi: spidev: fix a recursive locking error | expand |
Hi On Mon, 2023-01-16 at 15:41 +0100, Bartosz Golaszewski wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > When calling spidev_message() from the one of the ioctl() callbacks, the > spi_lock is already taken. When we then end up calling spidev_sync(), we > get the following splat: > > [ 214.047619] > [ 214.049198] ============================================ > [ 214.054533] WARNING: possible recursive locking detected > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted > [ 214.065969] -------------------------------------------- > [ 214.071290] spidev_test/1454 is trying to acquire lock: > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8 > [ 214.084164] > [ 214.084164] but task is already holding lock: > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8 > [ 214.097537] > [ 214.097537] other info that might help us debug this: > [ 214.104075] Possible unsafe locking scenario: > [ 214.104075] > [ 214.110004] CPU0 > [ 214.112461] ---- > [ 214.114916] lock(&spidev->spi_lock); > [ 214.118687] lock(&spidev->spi_lock); > [ 214.122457] > [ 214.122457] *** DEADLOCK *** > [ 214.122457] > [ 214.128386] May be due to missing lock nesting notation > [ 214.128386] > [ 214.135183] 2 locks held by spidev_test/1454: > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8 > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8 > [ 214.155493] > [ 214.155493] stack backtrace: > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14 > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90 > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858 > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378 > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8 > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24 > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8 > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0) > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0 > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003 > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6 > > Fix it by introducing an unlocked variant of spidev_sync() and calling it > from spidev_message() while other users who don't check the spidev->spi's > existence keep on using the locking flavor. > > Reported-by: Francesco Dolcini <francesco@dolcini.it> > Fixes: 1f4d2dd45b6e ("spi: spidev: fix a race condition when accessing spidev->spi") > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Thanks for the quick fix. I tested on a Colibri iMX7 which showed the bug before the patch is applied but not after. Tested-by: Max Krummenacher <max.krummenacher@toradex.com> Regards Max > --- > drivers/spi/spidev.c | 22 ++++++++++++++++------ > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c > index 8ef22ebcde1f..892965ac8fdf 100644 > --- a/drivers/spi/spidev.c > +++ b/drivers/spi/spidev.c > @@ -89,10 +89,22 @@ MODULE_PARM_DESC(bufsiz, "data bytes in biggest supported SPI message"); > > /*-------------------------------------------------------------------------*/ > > +static ssize_t > +spidev_sync_unlocked(struct spi_device *spi, struct spi_message *message) > +{ > + ssize_t status; > + > + status = spi_sync(spi, message); > + if (status == 0) > + status = message->actual_length; > + > + return status; > +} > + > static ssize_t > spidev_sync(struct spidev_data *spidev, struct spi_message *message) > { > - int status; > + ssize_t status; > struct spi_device *spi; > > mutex_lock(&spidev->spi_lock); > @@ -101,12 +113,10 @@ spidev_sync(struct spidev_data *spidev, struct spi_message *message) > if (spi == NULL) > status = -ESHUTDOWN; > else > - status = spi_sync(spi, message); > - > - if (status == 0) > - status = message->actual_length; > + status = spidev_sync_unlocked(spi, message); > > mutex_unlock(&spidev->spi_lock); > + > return status; > } > > @@ -294,7 +304,7 @@ static int spidev_message(struct spidev_data *spidev, > spi_message_add_tail(k_tmp, &msg); > } > > - status = spidev_sync(spidev, &msg); > + status = spidev_sync_unlocked(spidev->spi, &msg); > if (status < 0) > goto done; >
On Mon, 16 Jan 2023 15:41:49 +0100, Bartosz Golaszewski wrote: > When calling spidev_message() from the one of the ioctl() callbacks, the > spi_lock is already taken. When we then end up calling spidev_sync(), we > get the following splat: > > [ 214.047619] > [ 214.049198] ============================================ > [ 214.054533] WARNING: possible recursive locking detected > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted > [ 214.065969] -------------------------------------------- > [ 214.071290] spidev_test/1454 is trying to acquire lock: > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8 > [ 214.084164] > [ 214.084164] but task is already holding lock: > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8 > [ 214.097537] > [ 214.097537] other info that might help us debug this: > [ 214.104075] Possible unsafe locking scenario: > [ 214.104075] > [ 214.110004] CPU0 > [ 214.112461] ---- > [ 214.114916] lock(&spidev->spi_lock); > [ 214.118687] lock(&spidev->spi_lock); > [ 214.122457] > [ 214.122457] *** DEADLOCK *** > [ 214.122457] > [ 214.128386] May be due to missing lock nesting notation > [ 214.128386] > [ 214.135183] 2 locks held by spidev_test/1454: > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8 > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8 > [ 214.155493] > [ 214.155493] stack backtrace: > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14 > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90 > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858 > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378 > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8 > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24 > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8 > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0) > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0 > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003 > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6 > > [...] Applied to broonie/spi.git for-next Thanks! [1/1] spi: spidev: fix a recursive locking error commit: 9bab63a3e9498afd6a29cd3a43cf3ad4a4612b85 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Hello Mark, On Tue, Jan 17, 2023 at 11:06:07AM +0000, Mark Brown wrote: > On Mon, 16 Jan 2023 15:41:49 +0100, Bartosz Golaszewski wrote: > > When calling spidev_message() from the one of the ioctl() callbacks, the > > spi_lock is already taken. When we then end up calling spidev_sync(), we > > get the following splat: > > > > [ 214.047619] > > [ 214.049198] ============================================ > > [ 214.054533] WARNING: possible recursive locking detected > > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted > > [ 214.065969] -------------------------------------------- > > [ 214.071290] spidev_test/1454 is trying to acquire lock: > > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8 > > [ 214.084164] > > [ 214.084164] but task is already holding lock: > > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8 > > [ 214.097537] > > [ 214.097537] other info that might help us debug this: > > [ 214.104075] Possible unsafe locking scenario: > > [ 214.104075] > > [ 214.110004] CPU0 > > [ 214.112461] ---- > > [ 214.114916] lock(&spidev->spi_lock); > > [ 214.118687] lock(&spidev->spi_lock); > > [ 214.122457] > > [ 214.122457] *** DEADLOCK *** > > [ 214.122457] > > [ 214.128386] May be due to missing lock nesting notation > > [ 214.128386] > > [ 214.135183] 2 locks held by spidev_test/1454: > > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8 > > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8 > > [ 214.155493] > > [ 214.155493] stack backtrace: > > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 > > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14 > > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90 > > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858 > > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378 > > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8 > > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24 > > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8 > > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c > > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c > > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0) > > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0 > > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003 > > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6 > > > > [...] > > Applied to > > broonie/spi.git for-next > > Thanks! > > [1/1] spi: spidev: fix a recursive locking error > commit: 9bab63a3e9498afd6a29cd3a43cf3ad4a4612b85 Any plan to have this fix sent to Linus for v6.2? The reason for asking is that because of this bug our whole test infrastructure crashes preventing the normal testing we normally do on mainline kernel (that is also how this issue was spotted in the first place). Francesco
On Sat, Feb 11, 2023 at 04:18:37PM +0100, Francesco Dolcini wrote: > Any plan to have this fix sent to Linus for v6.2? The reason for asking > is that because of this bug our whole test infrastructure crashes > preventing the normal testing we normally do on mainline kernel (that is > also how this issue was spotted in the first place). I already did.
On Sat, Feb 11, 2023 at 11:13:35PM +0000, Mark Brown wrote: > On Sat, Feb 11, 2023 at 04:18:37PM +0100, Francesco Dolcini wrote: > > Any plan to have this fix sent to Linus for v6.2? The reason for asking > I already did. Thanks a lot - appreciated! Francesco
diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c index 8ef22ebcde1f..892965ac8fdf 100644 --- a/drivers/spi/spidev.c +++ b/drivers/spi/spidev.c @@ -89,10 +89,22 @@ MODULE_PARM_DESC(bufsiz, "data bytes in biggest supported SPI message"); /*-------------------------------------------------------------------------*/ +static ssize_t +spidev_sync_unlocked(struct spi_device *spi, struct spi_message *message) +{ + ssize_t status; + + status = spi_sync(spi, message); + if (status == 0) + status = message->actual_length; + + return status; +} + static ssize_t spidev_sync(struct spidev_data *spidev, struct spi_message *message) { - int status; + ssize_t status; struct spi_device *spi; mutex_lock(&spidev->spi_lock); @@ -101,12 +113,10 @@ spidev_sync(struct spidev_data *spidev, struct spi_message *message) if (spi == NULL) status = -ESHUTDOWN; else - status = spi_sync(spi, message); - - if (status == 0) - status = message->actual_length; + status = spidev_sync_unlocked(spi, message); mutex_unlock(&spidev->spi_lock); + return status; } @@ -294,7 +304,7 @@ static int spidev_message(struct spidev_data *spidev, spi_message_add_tail(k_tmp, &msg); } - status = spidev_sync(spidev, &msg); + status = spidev_sync_unlocked(spidev->spi, &msg); if (status < 0) goto done;