Message ID: 20240716083801.809763-1-yang.yang@vivo.com (mailing list archive)
State: New, archived
Series: block: fix deadlock between sd_remove & sd_release
Hi,

On 2024/07/16 16:38, Yang Yang wrote:
> Our test reports the following hung task:
>
> [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
> [...]
>
> [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
> [...]
>
> T1:                                  T2:
> sd_remove
>  del_gendisk
>   __blk_mark_disk_dead
>    blk_freeze_queue_start
>     ++q->mq_freeze_depth
>                                      bdev_release
>                                       mutex_lock(&disk->open_mutex)
>                                        sd_release
>                                         scsi_execute_cmd
>                                          blk_queue_enter
>                                           wait_event(!q->mq_freeze_depth)

This looks wrong, there is a condition blk_queue_dying() in
blk_queue_enter().

Thanks,
Kuai

>   mutex_lock(&disk->open_mutex)
>
> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
> try to acquire disk->open_mutex after freezing the queue.
>
> Signed-off-by: Yang Yang <yang.yang@vivo.com>
> [...]
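For reference, the wait that T2 is stuck in looks roughly like the sketch below. This is a simplified paraphrase of blk_queue_enter() from mainline of roughly this era, not an authoritative copy, and details vary between kernel versions; the point is that the sleep only ends when the freeze is lifted or QUEUE_FLAG_DYING is observed.

int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	const bool pm = flags & BLK_MQ_REQ_PM;

	while (!blk_try_enter_queue(q, pm)) {
		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EAGAIN;

		/*
		 * Sleep until the queue is unfrozen (mq_freeze_depth drops
		 * back to zero) or until it is marked dying.  If the DYING
		 * flag is never set, the only way out is the unfreeze.
		 */
		wait_event(q->mq_freeze_wq,
			   (!q->mq_freeze_depth &&
			    blk_pm_resume_queue(pm, q)) ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}

	return 0;
}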
On 2024/7/16 17:15, Yu Kuai wrote:
> Hi,
>
> On 2024/07/16 16:38, Yang Yang wrote:
>> Our test reports the following hung task:
>> [...]
>>
>> T1:                                  T2:
>> [...]
>>                                          blk_queue_enter
>>                                           wait_event(!q->mq_freeze_depth)
>
> This looks wrong, there is a condition blk_queue_dying() in
> blk_queue_enter().

584 static void __blk_mark_disk_dead(struct gendisk *disk)
585 {
......
591
592         if (test_bit(GD_OWNS_QUEUE, &disk->state))
593                 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);

SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
this scenario.

Thanks.

> Thanks,
> Kuai
>
>> mutex_lock(&disk->open_mutex)
>>
>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>> try to acquire disk->open_mutex after freezing the queue.
>>
>> Signed-off-by: Yang Yang <yang.yang@vivo.com>
>> [...]
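For additional context, the surrounding function looks roughly like the sketch below, paraphrased from mainline genhd.c of around this time; the elided lines and comments are reconstructed, so details may differ. The relevant point is that the queue drain/freeze is started unconditionally, while QUEUE_FLAG_DYING is only set for disks that own their request_queue, which sd disks do not.

static void __blk_mark_disk_dead(struct gendisk *disk)
{
	/* Fail any new I/O. */
	if (test_and_set_bit(GD_DEAD, &disk->state))
		return;

	/* Only disks that own their queue get QUEUE_FLAG_DYING here. */
	if (test_bit(GD_OWNS_QUEUE, &disk->state))
		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);

	/* Stop buffered writers from dirtying pages that can't be written out. */
	set_capacity(disk, 0);

	/*
	 * Prevent new I/O from crossing bio_queue_enter(); this starts a
	 * queue freeze (blk_freeze_queue_start()) for every disk, whether
	 * or not it owns its queue.
	 */
	blk_queue_start_drain(disk->queue);
}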
Hi,

On 2024/07/16 17:30, YangYang wrote:
> On 2024/7/16 17:15, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/07/16 16:38, Yang Yang wrote:
>>> Our test reports the following hung task:
>>> [...]
>>
>> This looks wrong, there is a condition blk_queue_dying() in
>> blk_queue_enter().
>
> 584 static void __blk_mark_disk_dead(struct gendisk *disk)
> 585 {
> ......
> 591
> 592         if (test_bit(GD_OWNS_QUEUE, &disk->state))
> 593                 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
>
> SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
> this scenario.

Yes, you're right. Please explain this in commit message as well.

> Thanks.
>
>> Thanks,
>> Kuai
>>
>>> mutex_lock(&disk->open_mutex)
>>>
>>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>>> try to acquire disk->open_mutex after freezing the queue.
>>>
>>> Signed-off-by: Yang Yang <yang.yang@vivo.com>
>>> ---
>>> block/genhd.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/genhd.c b/block/genhd.c
>>> index 8f1f3c6b4d67..c5fca3e893a0 100644
>>> --- a/block/genhd.c
>>> +++ b/block/genhd.c
>>> @@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
>>>  	 */
>>>  	if (!test_bit(GD_DEAD, &disk->state))
>>>  		blk_report_disk_dead(disk, false);
>>> -	__blk_mark_disk_dead(disk);
>>>
>>>  	/*
>>>  	 * Drop all partitions now that the disk is marked dead.
>>>  	 */
>>>  	mutex_lock(&disk->open_mutex);
>>> +	__blk_mark_disk_dead(disk);
>>>  	xa_for_each_start(&disk->part_tbl, idx, part, 1)
>>>  		drop_partition(part);
>>>  	mutex_unlock(&disk->open_mutex);

Take a look at del_gendisk(): between the freeze and unfreeze of the queue,
I notice that there is also a device_del() here, which means it will wait
for all sysfs readers/writers to be done, so the related sysfs APIs can't
issue IO to trigger blk_queue_enter(). This behaviour doesn't look
reasonable; we never forbid this.

Then, taking a look at the scsi sysfs API, I notice that scsi_rescan_device()
can be called and issue IO. Hence, other than the 'open_mutex', the same
deadlock still exists:

t1:                                  t2:
store_state_field
 scsi_rescan_device
  scsi_attach_vpd
   scsi_get_vpd_buf
    scsi_vpd_inquiry
     scsi_execute_cmd
                                     del_gendisk
                                      bdev_unhash
                                      blk_freeze_queue_start
      blk_queue_enter
                                      device_del
                                       kobject_del
                                        sysfs_remove_dir

I didn't test this, just by code review, and I could be wrong. And
I haven't checked all the procedures between freeze and unfreeze yet.

Thanks,
Kuai
On 2024/7/17 11:53, Yu Kuai wrote:
> Hi,
>
> On 2024/07/16 17:30, YangYang wrote:
>> On 2024/7/16 17:15, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2024/07/16 16:38, Yang Yang wrote:
>>>> Our test reports the following hung task:
>>>> [...]
>>>
>>> This looks wrong, there is a condition blk_queue_dying() in
>>> blk_queue_enter().
>>
>> 584 static void __blk_mark_disk_dead(struct gendisk *disk)
>> 585 {
>> ......
>> 591
>> 592         if (test_bit(GD_OWNS_QUEUE, &disk->state))
>> 593                 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
>>
>> SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
>> this scenario.
>
> Yes, you're right. Please explain this in commit message as well.

Got it.

>> Thanks.
>>
>>> Thanks,
>>> Kuai
>>>
>>>> mutex_lock(&disk->open_mutex)
>>>>
>>>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>>>> try to acquire disk->open_mutex after freezing the queue.
>>>> [...]
>
> Take a look at del_gendisk(): between the freeze and unfreeze of the queue,
> I notice that there is also a device_del() here, which means it will wait
> for all sysfs readers/writers to be done, so the related sysfs APIs can't
> issue IO to trigger blk_queue_enter(). This behaviour doesn't look
> reasonable; we never forbid this.
>
> Then, taking a look at the scsi sysfs API, I notice that scsi_rescan_device()
> can be called and issue IO. Hence, other than the 'open_mutex', the same
> deadlock still exists:
>
> t1:                                  t2:
> store_state_field
>  scsi_rescan_device
>   scsi_attach_vpd
>    scsi_get_vpd_buf
>     scsi_vpd_inquiry
>      scsi_execute_cmd
>                                      del_gendisk
>                                       bdev_unhash
>                                       blk_freeze_queue_start
>       blk_queue_enter
>                                      device_del
>                                       kobject_del
>                                        sysfs_remove_dir
>
> I didn't test this, just by code review, and I could be wrong. And
> I haven't checked all the procedures between freeze and unfreeze yet.

These sysfs nodes are in different directories: the scsi node is located
at /sys/bus/scsi/devices/0:0:0:0 and the gendisk node at /sys/block/sda.
Would it be necessary to wait for the completion of the scsi sysfs nodes'
read/write operations before removing the gendisk sysfs node?

Thanks.
On 7/17/24 3:15 AM, YangYang wrote:
> These sysfs nodes are in different directories: the scsi node is located
> at /sys/bus/scsi/devices/0:0:0:0 and the gendisk node at /sys/block/sda.
> Would it be necessary to wait for the completion of the scsi sysfs nodes'
> read/write operations before removing the gendisk sysfs node?

No. sysfs_remove_files() waits for pending read and write operations to
complete.

Bart.
On 2024/07/18 0:23, Bart Van Assche wrote:
> On 7/17/24 3:15 AM, YangYang wrote:
>> These sysfs nodes are in different directories: the scsi node is located
>> at /sys/bus/scsi/devices/0:0:0:0 and the gendisk node at /sys/block/sda.
>> Would it be necessary to wait for the completion of the scsi sysfs nodes'
>> read/write operations before removing the gendisk sysfs node?
>
> No. sysfs_remove_files() waits for pending read and write operations to
> complete.

So I checked this, and the gendisk kobject is actually a child of the scsi
disk kobject, not the other way around; sda/device is just a symbolic link
(sda/device -> ../../../14:2:0:0). So the sysfs API for the gendisk itself
doesn't issue IO, and as long as the related driver doesn't create new such
entries under the gendisk, this won't be a problem.

Thanks,
Kuai

> Bart.
diff --git a/block/genhd.c b/block/genhd.c
index 8f1f3c6b4d67..c5fca3e893a0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
 	 */
 	if (!test_bit(GD_DEAD, &disk->state))
 		blk_report_disk_dead(disk, false);
-	__blk_mark_disk_dead(disk);
 
 	/*
 	 * Drop all partitions now that the disk is marked dead.
 	 */
 	mutex_lock(&disk->open_mutex);
+	__blk_mark_disk_dead(disk);
 	xa_for_each_start(&disk->part_tbl, idx, part, 1)
 		drop_partition(part);
 	mutex_unlock(&disk->open_mutex);
Our test reports the following hung task:

[ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
[ 2538.459427] Call trace:
[ 2538.459430]  __switch_to+0x174/0x338
[ 2538.459436]  __schedule+0x628/0x9c4
[ 2538.459442]  schedule+0x7c/0xe8
[ 2538.459447]  schedule_preempt_disabled+0x24/0x40
[ 2538.459453]  __mutex_lock+0x3ec/0xf04
[ 2538.459456]  __mutex_lock_slowpath+0x14/0x24
[ 2538.459459]  mutex_lock+0x30/0xd8
[ 2538.459462]  del_gendisk+0xdc/0x350
[ 2538.459466]  sd_remove+0x30/0x60
[ 2538.459470]  device_release_driver_internal+0x1c4/0x2c4
[ 2538.459474]  device_release_driver+0x18/0x28
[ 2538.459478]  bus_remove_device+0x15c/0x174
[ 2538.459483]  device_del+0x1d0/0x358
[ 2538.459488]  __scsi_remove_device+0xa8/0x198
[ 2538.459493]  scsi_forget_host+0x50/0x70
[ 2538.459497]  scsi_remove_host+0x80/0x180
[ 2538.459502]  usb_stor_disconnect+0x68/0xf4
[ 2538.459506]  usb_unbind_interface+0xd4/0x280
[ 2538.459510]  device_release_driver_internal+0x1c4/0x2c4
[ 2538.459514]  device_release_driver+0x18/0x28
[ 2538.459518]  bus_remove_device+0x15c/0x174
[ 2538.459523]  device_del+0x1d0/0x358
[ 2538.459528]  usb_disable_device+0x84/0x194
[ 2538.459532]  usb_disconnect+0xec/0x300
[ 2538.459537]  hub_event+0xb80/0x1870
[ 2538.459541]  process_scheduled_works+0x248/0x4dc
[ 2538.459545]  worker_thread+0x244/0x334
[ 2538.459549]  kthread+0x114/0x1bc

[ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
[ 2538.461014] Call trace:
[ 2538.461016]  __switch_to+0x174/0x338
[ 2538.461021]  __schedule+0x628/0x9c4
[ 2538.461025]  schedule+0x7c/0xe8
[ 2538.461030]  blk_queue_enter+0xc4/0x160
[ 2538.461034]  blk_mq_alloc_request+0x120/0x1d4
[ 2538.461037]  scsi_execute_cmd+0x7c/0x23c
[ 2538.461040]  ioctl_internal_command+0x5c/0x164
[ 2538.461046]  scsi_set_medium_removal+0x5c/0xb0
[ 2538.461051]  sd_release+0x50/0x94
[ 2538.461054]  blkdev_put+0x190/0x28c
[ 2538.461058]  blkdev_release+0x28/0x40
[ 2538.461063]  __fput+0xf8/0x2a8
[ 2538.461066]  __fput_sync+0x28/0x5c
[ 2538.461070]  __arm64_sys_close+0x84/0xe8
[ 2538.461073]  invoke_syscall+0x58/0x114
[ 2538.461078]  el0_svc_common+0xac/0xe0
[ 2538.461082]  do_el0_svc+0x1c/0x28
[ 2538.461087]  el0_svc+0x38/0x68
[ 2538.461090]  el0t_64_sync_handler+0x68/0xbc
[ 2538.461093]  el0t_64_sync+0x1a8/0x1ac

T1:                                  T2:
sd_remove
 del_gendisk
  __blk_mark_disk_dead
   blk_freeze_queue_start
    ++q->mq_freeze_depth
                                     bdev_release
                                      mutex_lock(&disk->open_mutex)
                                       sd_release
                                        scsi_execute_cmd
                                         blk_queue_enter
                                          wait_event(!q->mq_freeze_depth)
  mutex_lock(&disk->open_mutex)

This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
try to acquire disk->open_mutex after freezing the queue.

Signed-off-by: Yang Yang <yang.yang@vivo.com>
---
 block/genhd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
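To make the "classic ABBA" wording concrete, here is a minimal, self-contained userspace sketch of the same lock-ordering inversion. All names are hypothetical: freeze_lock stands in for the queue freeze (which in the kernel is a freeze count plus a wait queue, not literally a mutex, but it orders like one here) and open_lock stands in for disk->open_mutex.

/* abba.c - build with: cc -pthread abba.c -o abba (hangs by design) */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t freeze_lock = PTHREAD_MUTEX_INITIALIZER; /* "A": the queue freeze */
static pthread_mutex_t open_lock   = PTHREAD_MUTEX_INITIALIZER; /* "B": disk->open_mutex */

/* Like del_gendisk() before the fix: freeze first, then take open_mutex. */
static void *remove_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&freeze_lock);   /* blk_freeze_queue_start() */
	sleep(1);                           /* widen the race window */
	pthread_mutex_lock(&open_lock);     /* mutex_lock(&disk->open_mutex) */
	puts("remove path done");
	pthread_mutex_unlock(&open_lock);
	pthread_mutex_unlock(&freeze_lock);
	return NULL;
}

/* Like bdev_release()/sd_release(): hold open_mutex, then wait on the freeze. */
static void *release_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&open_lock);     /* mutex_lock(&disk->open_mutex) */
	sleep(1);
	pthread_mutex_lock(&freeze_lock);   /* blk_queue_enter() waiting for unfreeze */
	puts("release path done");
	pthread_mutex_unlock(&freeze_lock);
	pthread_mutex_unlock(&open_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, remove_path, NULL);
	pthread_create(&t2, NULL, release_path, NULL);
	pthread_join(t1, NULL);             /* never returns: A->B vs. B->A deadlock */
	pthread_join(t2, NULL);
	return 0;
}

The patch removes the inversion by taking disk->open_mutex before __blk_mark_disk_dead() (and therefore before the freeze), so both paths agree on a single open_mutex-then-freeze order; in the sketch, that corresponds to remove_path() taking open_lock before freeze_lock.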