[next] trace/blktrace: fix task hung in blk_trace_ioctl

Message ID tencent_6537E04AAC74F976B567603CEB377A96FA09@qq.com (mailing list archive)
State Handled Elsewhere

Commit Message

Edward Adam Davis Dec. 2, 2023, 9:01 a.m. UTC
The reproducer involves running test programs on multiple processors separately,
in order to enter blkdev_ioctl() and ultimately reach blk_trace_ioctl() through
two different paths, triggering an AA deadlock.

	CPU0						CPU1
	---						---
	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)


The first path:
blkdev_ioctl()->
	blk_trace_ioctl()->
		mutex_lock(&q->debugfs_mutex)

The second path:
blkdev_ioctl()->				
	blkdev_common_ioctl()->
		blk_trace_ioctl()->
			mutex_lock(&q->debugfs_mutex)

The solution I have proposed is to exit blk_trace_ioctl() to avoid AA locks if
a task has already obtained debugfs_mutex.

Fixes: 0d345996e4cb ("x86/kernel: increase kcov coverage under arch/x86/kernel folder")
Reported-and-tested-by: syzbot+ed812ed461471ab17a0c@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
 kernel/trace/blktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Yu Kuai Dec. 2, 2023, 9:19 a.m. UTC | #1
Hi,

On 2023/12/02 17:01, Edward Adam Davis wrote:
> The reproducer involves running test programs on multiple processors separately,
> in order to enter blkdev_ioctl() and ultimately reach blk_trace_ioctl() through
> two different paths, triggering an AA deadlock.
> 
> 	CPU0						CPU1
> 	---						---
> 	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
> 	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
> 
> 
> The first path:
> blkdev_ioctl()->
> 	blk_trace_ioctl()->
> 		mutex_lock(&q->debugfs_mutex)
> 
> The second path:
> blkdev_ioctl()->				
> 	blkdev_common_ioctl()->
> 		blk_trace_ioctl()->
> 			mutex_lock(&q->debugfs_mutex)
I still don't understand how this AA deadlock is triggered. Is
'debugfs_mutex' already held before calling blk_trace_ioctl()?

> 
> The solution I have proposed is to exit blk_trace_ioctl() to avoid AA locks if
> a task has already obtained debugfs_mutex.
> 
> Fixes: 0d345996e4cb ("x86/kernel: increase kcov coverage under arch/x86/kernel folder")
> Reported-and-tested-by: syzbot+ed812ed461471ab17a0c@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---
>   kernel/trace/blktrace.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index 54ade89a1ad2..34e5bce42b1e 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -735,7 +735,8 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
>   	int ret, start = 0;
>   	char b[BDEVNAME_SIZE];
>   
> -	mutex_lock(&q->debugfs_mutex);
> +	if (!mutex_trylock(&q->debugfs_mutex))
> +		return -EBUSY;

This is absolutely not a proper fix; a lot of use cases will fail after
this patch.

Thanks,
Kuai

>   
>   	switch (cmd) {
>   	case BLKTRACESETUP:
>
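
The objection above can be illustrated with a small userspace sketch. This is only an analogy, not kernel code: a pthread mutex stands in for q->debugfs_mutex, and the names debugfs_mutex_model, patched_ioctl_model() and contended_call_result() are hypothetical. It assumes the patched function behaves like a plain trylock-or-fail entry point. A second caller that arrives while the first one is still inside the ioctl gets -EBUSY, even though no deadlock is involved.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for q->debugfs_mutex. */
pthread_mutex_t debugfs_mutex_model = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical model of the patched entry: try the lock, fail if it is held. */
int patched_ioctl_model(void)
{
	int err = pthread_mutex_trylock(&debugfs_mutex_model);

	if (err)
		return -err;	/* EBUSY becomes -EBUSY, as in the patch */

	/* ... the ioctl command would be handled here ... */

	pthread_mutex_unlock(&debugfs_mutex_model);
	return 0;
}

/* Thread body: a second, unrelated caller issuing the same ioctl. */
static void *second_caller(void *out)
{
	*(int *)out = patched_ioctl_model();
	return NULL;
}

/*
 * The calling thread plays a first caller that is still inside the ioctl
 * (it holds the mutex) while a second thread makes its own call.
 * Returns what the second caller observed.
 */
int contended_call_result(void)
{
	pthread_t t;
	int ret = 0;

	pthread_mutex_lock(&debugfs_mutex_model);
	if (pthread_create(&t, NULL, second_caller, &ret) != 0) {
		pthread_mutex_unlock(&debugfs_mutex_model);
		return 1;	/* could not start the second caller */
	}
	pthread_join(t, NULL);
	pthread_mutex_unlock(&debugfs_mutex_model);

	return ret;
}
```

With mutex_lock() the second caller would simply wait its turn and then succeed; with the trylock variant, ordinary contention turns into a user-visible error.
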
Steven Rostedt Dec. 2, 2023, 10:07 p.m. UTC | #2
On Sat, 2 Dec 2023 17:19:25 +0800
Yu Kuai <yukuai1@huaweicloud.com> wrote:

> Hi,
> 
> On 2023/12/02 17:01, Edward Adam Davis wrote:
> > The reproducer involves running test programs on multiple processors separately,
> > in order to enter blkdev_ioctl() and ultimately reach blk_trace_ioctl() through
> > two different paths, triggering an AA deadlock.
> > 
> > 	CPU0						CPU1
> > 	---						---
> > 	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
> > 	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
> > 
> > 
> > The first path:
> > blkdev_ioctl()->
> > 	blk_trace_ioctl()->
> > 		mutex_lock(&q->debugfs_mutex)
> > 
> > The second path:
> > blkdev_ioctl()->				
> > 	blkdev_common_ioctl()->
> > 		blk_trace_ioctl()->
> > 			mutex_lock(&q->debugfs_mutex)  
> I still don't understand how this AA deadlock is triggered. Is
> 'debugfs_mutex' already held before calling blk_trace_ioctl()?

Right, I don't see where the mutex is taken twice. You don't need two
paths for an AA lock, you only need one.

> 
> > 
> > The solution I have proposed is to exit blk_trace_ioctl() to avoid AA locks if
> > a task has already obtained debugfs_mutex.
> > 
> > Fixes: 0d345996e4cb ("x86/kernel: increase kcov coverage under arch/x86/kernel folder")

How does it fix the above? I don't see how the above is even related to this.

-- Steve

> > Reported-and-tested-by: syzbot+ed812ed461471ab17a0c@syzkaller.appspotmail.com
> > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> > ---
> >   kernel/trace/blktrace.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
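
Steven's point that an AA deadlock needs only one path can be shown with a userspace sketch. It is an analogy under stated assumptions, not kernel code: an error-checking pthread mutex stands in for q->debugfs_mutex, and relock_same_thread() is a hypothetical helper. A single thread that locks the same mutex twice is the whole pattern; the error-checking mutex type reports it as EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Lock one mutex twice from the same thread and return the result of the
 * second attempt. PTHREAD_MUTEX_ERRORCHECK makes the relock fail with
 * EDEADLK; a default mutex could block forever here instead.
 */
int relock_same_thread(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&m);			/* first acquisition succeeds */
	second = pthread_mutex_lock(&m);	/* same owner again: the "AA" case */

	pthread_mutex_unlock(&m);		/* release the one lock actually held */
	pthread_mutex_destroy(&m);

	return second;
}
```

No second CPU or second call path appears anywhere in the sketch, which is why the two-path diagram in the commit message does not by itself explain a deadlock.
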
Pengfei Xu Dec. 3, 2023, 11:45 a.m. UTC | #3
Hi,

On 2023-12-03 at 06:07:43 +0800, Steven Rostedt wrote:
> On Sat, 2 Dec 2023 17:19:25 +0800
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
> 
> > Hi,
> > 
> > On 2023/12/02 17:01, Edward Adam Davis wrote:
> > > The reproducer involves running test programs on multiple processors separately,
> > > in order to enter blkdev_ioctl() and ultimately reach blk_trace_ioctl() through
> > > two different paths, triggering an AA deadlock.
> > > 
> > > 	CPU0						CPU1
> > > 	---						---
> > > 	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
> > > 	mutex_lock(&q->debugfs_mutex)			mutex_lock(&q->debugfs_mutex)
> > > 
> > > 
> > > The first path:
> > > blkdev_ioctl()->
> > > 	blk_trace_ioctl()->
> > > 		mutex_lock(&q->debugfs_mutex)
> > > 
> > > The second path:
> > > blkdev_ioctl()->				
> > > 	blkdev_common_ioctl()->
> > > 		blk_trace_ioctl()->
> > > 			mutex_lock(&q->debugfs_mutex)  
> > I still don't understand how this AA deadlock is triggered. Is
> > 'debugfs_mutex' already held before calling blk_trace_ioctl()?
> 
> Right, I don't see where the mutex is taken twice. You don't need two
> paths for an AA lock, you only need one.
> 
> > 
> > > 
> > > The solution I have proposed is to exit blk_trace_ioctl() to avoid AA locks if
> > > a task has already obtained debugfs_mutex.
> > > 
> > > Fixes: 0d345996e4cb ("x86/kernel: increase kcov coverage under arch/x86/kernel folder")
> 
> How does it fix the above? I don't see how the above is even related to this.

I bisected this issue and the following fix information is more accurate:
"
Fixes: f2c2e717642c ("usb: gadget: add raw-gadget interface")
"

All the bisection info is at: https://github.com/xupengfe/syzkaller_logs/tree/main/231203_140738_blk_trace_ioctl

Acked-by: Pengfei Xu <pengfei.xu@intel.com>

Thanks!

> 
> -- Steve
> 
> > > Reported-and-tested-by: syzbot+ed812ed461471ab17a0c@syzkaller.appspotmail.com
> > > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> > > ---
> > >   kernel/trace/blktrace.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
Patch

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 54ade89a1ad2..34e5bce42b1e 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -735,7 +735,8 @@  int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	int ret, start = 0;
 	char b[BDEVNAME_SIZE];
 
-	mutex_lock(&q->debugfs_mutex);
+	if (!mutex_trylock(&q->debugfs_mutex))
+		return -EBUSY;
 
 	switch (cmd) {
 	case BLKTRACESETUP: