Message ID | 20220415040446.26451-2-alice.chao@mediatek.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v3,1/1] scsi: Fix racing between dev init and dev reset |
Hi Alice,

> Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
> aces with device init thread which calls device_add() to add kobj.parent

"aces" may be "races"?

> before kobject_uevent_env().
>
> Device init call:                 Device reset call:
> scsi_probe_and_add_lun()          scsi_evt_thread()
> scsi_add_lun()                    scsi_evt_emit()
> scsi_sysfs_add_sdev()             kobject_uevent_env() //get kobj.parent
> scsi_target_add()                 kobject_get_path()
>                                   len = get_kobj_path_length () // len=1 because parent hasn't created yet
> device_add() // add kobj.parent
> kobject_uevent_env()
> kobject_get_path()                path = kzalloc()
> fill_kobj_path()                  fill_kobj_path() // --length; length -= cur is a negative value
>                                   memcpy(path + length, kobject_name(parent), cur); // slab OOB!
>
> Above backtrace describes the problem, device reset thread will get wrong
> kobj.parent when device init thread didn’t add kobj.parent yet. When this
> racing happened, it triggers the a KASAN dump on the final iteration:
>
> BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
> Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
> <snip>
> Call trace:
> __kasan_report+0x124/0x1c8
> kasan_report+0x54/0x84
> kasan_check_range+0x200/0x208
> memcpy+0xb8/0xf0
> kobject_get_path+0xf8/0x1b8
> kobject_uevent_env+0x228/0xa88
> scsi_evt_thread+0x2d0/0x5b0
> process_one_work+0x570/0xf94
> worker_thread+0x7cc/0xf80
> kthread+0x2c4/0x388
>
> These two jobs are scheduled asynchronously, we can't guaranteed that
> kobj.parent will be created in device init thread before device reset
> thread calls kobject_get_path().
>
> To resolve the racing issue between device init thread and device reset
> thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
> to complete the creation of kobj.parent.
>
> Device init call:                     Device reset call:
> ufshcd_async_scan()                   scsi_evt_thread()
> scsi_scan_host()                      scsi_evt_emit() <- add wait_event()
> do_scsi_scan_host() <- add wake_up()
> scsi_scan_host_selected()
> scsi_scan_channel()
> scsi_probe_and_add_lun()
> scsi_target_add()
> device_add() // add kobj.parent
> kobject_uevent_env()
> kobject_get_path()
> fill_kobj_path()
> do_scan_async() <- wake_up()          kobject_uevent_env() // add kobj.parent

There are no do_scan_async() changes in this patch. Is this a typo?
From the patch, the flow looks like:

Device init call                      Device reset call:
do_scsi_scan_host()                   scsi_evt_thread()
scsi_scan_host_selected()             scsi_evt_emit() <- add wait_event()
scsi_scan_channel()
scsi_probe_and_add_lun()
scsi_target_add()
device_add() // add kobj.parent
kobject_uevent_env()
kobject_get_path()
fill_kobj_path()
//call wake_up() after scsi_scan_host_selected is done
                                      kobject_uevent_env()
                                      kobject_get_path() // get valid kobj.parent
...                                   fill_kobj_path()

>                                       kobject_get_path() // get valid kobj.parent
>                                       fill_kobj_path()
>
> After we add wake_up at do_scsi_scan_host() in device init thread, we can
> ensure that device reset thread will get kobject after device init thread
> finishes adding parent.
>
> Signed-off-by: Alice Chao <alice.chao@mediatek.com>
>
> ---
>
> Change since v2
> -Change commit: Describes the preblem first and then the solution.
> -Add commit: Add KASAN error log.

Please keep all change history.
For an example, see
https://lore.kernel.org/lkml/20220326022728.2969-1-jianjun.wang@mediatek.com/

Thanks,
Miles

>
> ---
>  drivers/scsi/scsi_lib.c  | 1 +
>  drivers/scsi/scsi_scan.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0a70aa763a96..abf9a71ed77c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
>  		break;
>  	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
>  		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
> +		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
>  		break;
>  	default:
>  		/* do nothing */
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f4e6c68ac99e..431f229ac435 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
>  	} else {
>  		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
>  				SCAN_WILD_CARD, 0);
> +		wake_up(&shost->host_wait);
>  	}
>  }
>
> --
> 2.18.0
>
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0a70aa763a96..abf9a71ed77c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
 		break;
 	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
 		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
+		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
 		break;
 	default:
 		/* do nothing */
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f4e6c68ac99e..431f229ac435 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
 	} else {
 		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
 				SCAN_WILD_CARD, 0);
+		wake_up(&shost->host_wait);
 	}
 }
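The diff pairs a wait_event() on the reset side with a wake_up() on the scan side. For readers less familiar with that pattern, here is a rough user-space analogue using POSIX threads. It is only a sketch: struct fake_host, init_side(), reset_side() and run_demo() are hypothetical names, the parent string is made up, and a mutex plus condition variable stands in for the kernel wait queue (the kernel primitives do not take a mutex like this). Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for the host; not a kernel structure. */
struct fake_host {
	pthread_mutex_t lock;
	pthread_cond_t host_wait;	/* plays the role of shost->host_wait */
	const char *parent;		/* plays the role of kobj.parent */
};

/* "Device init" side: publish the parent, then wake any waiter. */
static void *init_side(void *arg)
{
	struct fake_host *h = arg;

	pthread_mutex_lock(&h->lock);
	h->parent = "target0:0:0";	/* illustrative value only */
	pthread_cond_broadcast(&h->host_wait);
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

/* "Device reset" side: block until the parent is visible, then read it. */
static const char *reset_side(struct fake_host *h)
{
	const char *p;

	pthread_mutex_lock(&h->lock);
	/* Check the condition first, so an earlier wake-up is not lost. */
	while (h->parent == NULL)
		pthread_cond_wait(&h->host_wait, &h->lock);
	p = h->parent;
	pthread_mutex_unlock(&h->lock);
	return p;
}

/* Run both sides once and return what the reset side observed. */
static const char *run_demo(void)
{
	struct fake_host h;
	pthread_t t;
	const char *p;
	int rc;

	pthread_mutex_init(&h.lock, NULL);
	pthread_cond_init(&h.host_wait, NULL);
	h.parent = NULL;

	rc = pthread_create(&t, NULL, init_side, &h);
	assert(rc == 0);
	p = reset_side(&h);
	pthread_join(t, NULL);

	pthread_cond_destroy(&h.host_wait);
	pthread_mutex_destroy(&h.lock);
	return p;
}
```

Calling run_demo() from main() returns the parent string regardless of which thread runs first, because the waiter tests the condition before sleeping. wait_event() likewise evaluates its condition before it sleeps.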
Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
races with device init thread which calls device_add() to add kobj.parent
before kobject_uevent_env().

Device init call:                 Device reset call:
scsi_probe_and_add_lun()          scsi_evt_thread()
scsi_add_lun()                    scsi_evt_emit()
scsi_sysfs_add_sdev()             kobject_uevent_env() // get kobj.parent
scsi_target_add()                 kobject_get_path()
                                  len = get_kobj_path_length() // len=1 because parent hasn't been created yet
device_add() // add kobj.parent
kobject_uevent_env()
kobject_get_path()                path = kzalloc()
fill_kobj_path()                  fill_kobj_path() // --length; length -= cur is a negative value
                                  memcpy(path + length, kobject_name(parent), cur); // slab OOB!

The call flow above describes the problem: the device reset thread gets a
wrong kobj.parent when the device init thread hasn't added kobj.parent yet.
When this race happens, it triggers a KASAN dump on the final iteration:

BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
<snip>
Call trace:
 __kasan_report+0x124/0x1c8
 kasan_report+0x54/0x84
 kasan_check_range+0x200/0x208
 memcpy+0xb8/0xf0
 kobject_get_path+0xf8/0x1b8
 kobject_uevent_env+0x228/0xa88
 scsi_evt_thread+0x2d0/0x5b0
 process_one_work+0x570/0xf94
 worker_thread+0x7cc/0xf80
 kthread+0x2c4/0x388

These two jobs are scheduled asynchronously, so we can't guarantee that
kobj.parent will be created in the device init thread before the device
reset thread calls kobject_get_path().

To resolve the race between the device init thread and the device reset
thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
to complete the creation of kobj.parent.

Device init call:                     Device reset call:
ufshcd_async_scan()                   scsi_evt_thread()
scsi_scan_host()                      scsi_evt_emit() <- add wait_event()
do_scsi_scan_host() <- add wake_up()
scsi_scan_host_selected()
scsi_scan_channel()
scsi_probe_and_add_lun()
scsi_target_add()
device_add() // add kobj.parent
kobject_uevent_env()
kobject_get_path()
fill_kobj_path()
do_scan_async() <- wake_up()          kobject_uevent_env() // add kobj.parent
                                      kobject_get_path() // get valid kobj.parent
                                      fill_kobj_path()

After we add wake_up() to do_scsi_scan_host() in the device init thread, we
can ensure that the device reset thread gets the kobject only after the
device init thread finishes adding the parent.

Signed-off-by: Alice Chao <alice.chao@mediatek.com>

---

Changes since v2
- Change commit: describe the problem first and then the solution.
- Add commit: add the KASAN error log.

---
 drivers/scsi/scsi_lib.c  | 1 +
 drivers/scsi/scsi_scan.c | 1 +
 2 files changed, 2 insertions(+)
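
To make the overflow described in the commit message concrete, the following user-space sketch models the two-pass shape of kobject_get_path(): one walk to size the buffer, a second walk to fill it backwards. It is an illustration under assumptions, not the kernel code: struct node, path_length() and lowest_fill_offset() are hypothetical names, the device names are made up, and the sketch only computes the offset the fill would reach instead of performing the out-of-bounds write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a kobject: only a name and a parent link. */
struct node {
	const char *name;
	struct node *parent;
};

/* First pass: bytes needed for "/a/b/c" plus the trailing NUL. */
static int path_length(const struct node *n)
{
	int len = 1;

	for (; n; n = n->parent)
		len += (int)strlen(n->name) + 1;	/* name plus '/' */
	return len;
}

/*
 * Second pass: walk the parents again and return the lowest offset a
 * backwards fill would write to. A negative result means the fill would
 * write before the start of a buffer sized by the first pass.
 */
static int lowest_fill_offset(const struct node *n, int len)
{
	int off = len - 1;	/* mirrors the --length before the loop */

	for (; n; n = n->parent)
		off -= (int)strlen(n->name) + 1;	/* name plus '/' */
	return off;
}
```

With an illustrative device named "0:0:0:0" that has no parent yet, path_length() returns 9. If a parent named "target0:0:0" is attached before the second pass, lowest_fill_offset() with that stale length returns -12, which is the kind of negative offset the commit message describes. Recomputing the length after the parent is attached gives a lowest offset of 0.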