Message ID | 20220415061243.30229-2-alice.chao@mediatek.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | [v4,1/1] scsi: Fix racing between dev init and dev reset |
On 4/14/22 23:12, Alice Chao wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0a70aa763a96..abf9a71ed77c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
>  		break;
>  	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
>  		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
> +		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
>  		break;
>  	default:
>  		/* do nothing */
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f4e6c68ac99e..431f229ac435 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
>  	} else {
>  		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
>  				SCAN_WILD_CARD, 0);
> +		wake_up(&shost->host_wait);
>  	}
>  }

The above patch can only work if scanning a host is followed by a POWER
ON unit attention. I don't think that is guaranteed. Additionally, what
will happen with the wait loop if the device generates a POWER ON unit
attention after the device has been removed from its parent?
kobject_del() clears the parent pointer. device_del() calls
kobject_del(). scsi_remove_device() calls device_del() before the
request queue is destroyed.

Thanks,

Bart.
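The concern raised in the reply above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `enum sketch_event` and `waiter_proceeds()` are hypothetical names, and the model assumes the usual wait_event() behaviour, where the condition is checked once on entry and then only re-checked after a wake_up() on the same wait queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds for the model; none of these exist in the kernel. */
enum sketch_event {
	EV_DEVICE_ADD,	/* device_add() sets kobj.parent */
	EV_DEVICE_DEL,	/* device_del() -> kobject_del() clears kobj.parent */
	EV_WAKE_UP,	/* wake_up(&shost->host_wait) at the end of the scan */
	EV_WAIT_START,	/* scsi_evt_emit() reaches wait_event() */
};

/*
 * Replay a sequence of events and report whether the waiter ever gets
 * past wait_event(). The condition is evaluated when the wait starts
 * and again on every wake-up that arrives while the waiter sleeps.
 */
static bool waiter_proceeds(const enum sketch_event *seq, size_t n)
{
	bool parent_set = false;
	bool waiting = false;

	for (size_t i = 0; i < n; i++) {
		switch (seq[i]) {
		case EV_DEVICE_ADD:
			parent_set = true;
			break;
		case EV_DEVICE_DEL:
			parent_set = false;
			break;
		case EV_WAIT_START:
			if (parent_set)
				return true;	/* condition already true */
			waiting = true;
			break;
		case EV_WAKE_UP:
			if (waiting && parent_set)
				return true;	/* woken with condition true */
			break;
		}
	}
	return false;	/* still asleep at the end of the sequence */
}
```

In this model the sequence wait, add, wake-up lets the waiter continue, while add, wake-up, delete, wait leaves it blocked: the parent pointer is NULL again and no further wake_up() is issued.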
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0a70aa763a96..abf9a71ed77c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
 		break;
 	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
 		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
+		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
 		break;
 	default:
 		/* do nothing */
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f4e6c68ac99e..431f229ac435 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
 	} else {
 		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
 				SCAN_WILD_CARD, 0);
+		wake_up(&shost->host_wait);
 	}
 }
The device reset thread uses kobject_uevent_env() to get kobj.parent, and
it races with the device init thread, which calls device_add() to add
kobj.parent before kobject_uevent_env().

Device init call:                       Device reset call:
scsi_probe_and_add_lun()                scsi_evt_thread()
  scsi_add_lun()                          scsi_evt_emit()
    scsi_sysfs_add_sdev()                   kobject_uevent_env() // get kobj.parent
      scsi_target_add()                       kobject_get_path()
                                                len = get_kobj_path_length() // len=1 because parent hasn't been created yet
        device_add() // add kobj.parent
          kobject_uevent_env()
            kobject_get_path()                  path = kzalloc()
              fill_kobj_path()                  fill_kobj_path() // --length; length -= cur is a negative value
                                                  memcpy(path + length, kobject_name(parent), cur); // slab OOB!

The above backtrace describes the problem: the device reset thread gets a
wrong kobj.parent when the device init thread has not added kobj.parent
yet. When this race happens, it triggers a KASAN dump on the final
iteration:

BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
<snip>
Call trace:
 __kasan_report+0x124/0x1c8
 kasan_report+0x54/0x84
 kasan_check_range+0x200/0x208
 memcpy+0xb8/0xf0
 kobject_get_path+0xf8/0x1b8
 kobject_uevent_env+0x228/0xa88
 scsi_evt_thread+0x2d0/0x5b0
 process_one_work+0x570/0xf94
 worker_thread+0x7cc/0xf80
 kthread+0x2c4/0x388

These two jobs are scheduled asynchronously, so we can't guarantee that
kobj.parent will be created in the device init thread before the device
reset thread calls kobject_get_path().

To resolve the race between the device init thread and the device reset
thread, we use wait_event() in scsi_evt_emit() to wait for device_add() to
complete the creation of kobj.parent.

Device init call:                       Device reset call:
ufshcd_async_scan()                     scsi_evt_thread()
  scsi_scan_host()                        scsi_evt_emit() <- add wait_event()
    do_scsi_scan_host() <- add wake_up()
      scsi_scan_host_selected()
        scsi_scan_channel()
          scsi_probe_and_add_lun()
            scsi_target_add()
              device_add() // add kobj.parent
                kobject_uevent_env()
                  kobject_get_path()
                    fill_kobj_path()
    // call wake_up() after scsi_scan_host_selected is done
                                            kobject_uevent_env() // add kobj.parent
                                              kobject_get_path() // get valid kobj.parent
                                                fill_kobj_path()

After we add wake_up() at do_scsi_scan_host() in the device init thread,
we can ensure that the device reset thread gets the kobject after the
device init thread finishes adding the parent.

Signed-off-by: Alice Chao <alice.chao@mediatek.com>
---

Change in v4
-Change commit: Change call stack description.
Change in v3
-Change commit: Describe the problem first and then the solution.
-Add commit: Add KASAN error log.
Change in v2
-Remove Change-Id.
---
 drivers/scsi/scsi_lib.c  | 1 +
 drivers/scsi/scsi_scan.c | 1 +
 2 files changed, 2 insertions(+)
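
The length arithmetic behind the out-of-bounds write described in the commit message can be sketched in userspace C. This is a simplified model, not the kernel source: `struct node`, `path_length()` and `lowest_write_offset()` are hypothetical stand-ins that assume get_kobj_path_length() and fill_kobj_path() walk the parent chain roughly as shown, and the device names used with them are examples only. The second helper computes the lowest buffer offset that would be written instead of copying anything, so the sketch itself never touches memory out of bounds.

```c
#include <string.h>

/* Hypothetical stand-in for struct kobject: a name and a parent link. */
struct node {
	const char *name;
	struct node *parent;
};

/*
 * Model of the length pass: one '/' plus the name for every level of
 * the hierarchy, plus one byte for the terminating NUL.
 */
static int path_length(const struct node *n)
{
	int length = 1;

	for (; n; n = n->parent)
		length += (int)strlen(n->name) + 1;
	return length;
}

/*
 * Model of the fill pass: walk the chain again, backing up by each
 * name and its '/' separator. Returns the lowest offset the fill pass
 * would write to. A negative value means a write before the start of
 * the buffer that was sized by path_length().
 */
static int lowest_write_offset(const struct node *n, int length)
{
	--length;	/* slot reserved for the terminating NUL */
	for (; n; n = n->parent) {
		length -= (int)strlen(n->name);	/* room for the name */
		--length;			/* room for the '/' */
	}
	return length;
}
```

If `path_length()` runs while a node named "0:0:0:0" has no parent, it returns 9. If a parent named "target0:0:0" is attached before the fill pass runs, `lowest_write_offset()` with that stale length returns -12, which is the negative offset the commit message refers to. When both passes see the same chain, the length is 21 and the lowest offset is 0.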