Message ID | 20240621113051.29523-1-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] ufs: core: fix ufshcd_abort_all racing issue | expand |
On 6/21/24 4:30 AM, peter.wang@mediatek.com wrote: > [ ... ] Fixes: and Cc: stable tags are missing from this patch. Please add these. > diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c > index 8944548c30fa..3b2e5bcb08a7 100644 > --- a/drivers/ufs/core/ufs-mcq.c > +++ b/drivers/ufs/core/ufs-mcq.c > @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag) > return -ETIMEDOUT; > > if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) { > - if (!cmd) > - return -EINVAL; > + /* Should return 0 if cmd is already complete by irq */ > + if (!cmd || !ufshcd_cmd_inflight(cmd)) > + return 0; > hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd)); > } else { > hwq = hba->dev_cmd_queue; This code change reduces the race window but does not eliminate it since no locks are held around this code path. Additionally, are you sure that this change is necessary? I don't think that ufshcd_err_handler() calls ufshcd_mcq_sq_cleanup(). Thanks, Bart.
On Fri, 2024-06-21 at 10:15 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 6/21/24 4:30 AM, peter.wang@mediatek.com wrote: > > [ ... ] > > Fixes: and Cc: stable tags are missing from this patch. Please add > these. > Hi Bart, Okay, will add next version. Thanks. > > diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs- > mcq.c > > index 8944548c30fa..3b2e5bcb08a7 100644 > > --- a/drivers/ufs/core/ufs-mcq.c > > +++ b/drivers/ufs/core/ufs-mcq.c > > @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, > int task_tag) > > return -ETIMEDOUT; > > > > if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) { > > -if (!cmd) > > -return -EINVAL; > > +/* Should return 0 if cmd is already complete by irq */ > > +if (!cmd || !ufshcd_cmd_inflight(cmd)) > > +return 0; > > hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd)); > > } else { > > hwq = hba->dev_cmd_queue; > > This code change reduces the race window but does not eliminate it > since > no locks are held around this code path. > > Additionally, are you sure that this change is necessary? I don't > think > that ufshcd_err_handler() calls ufshcd_mcq_sq_cleanup(). > > Thanks, > > Bart. Hi Bart, Yes, this code reduces the race window only. Beacuse if we want get mcq lock, we need know witch hwq[i] lock can use. But before we get which hwq[i] lock could use, we will get this null pointer error. This is quite the chicken or the eqq dilemma. Could you have any suggestion? Additionally, here is the backtrace of ufshcd_mcq_sq_cleanup. ufshcd_try_to_abort_task: cmd pending in the device. tag = 6 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194 pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14 lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise] Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise] Call trace: dump_backtrace+0xf8/0x148 show_stack+0x18/0x24 dump_stack_lvl+0x60/0x7c dump_stack+0x18/0x3c mrdump_common_die+0x24c/0x398 [mrdump] ipanic_die+0x20/0x34 [mrdump] notify_die+0x80/0xd8 die+0x94/0x2b8 __do_kernel_fault+0x264/0x298 do_page_fault+0xa4/0x4b8 do_translation_fault+0x38/0x54 do_mem_abort+0x58/0x118 el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c blk_mq_unique_tag+0x8/0x14 ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise] ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise] ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise] process_one_work+0x208/0x4fc worker_thread+0x228/0x438 kthread+0x104/0x1d4 ret_from_fork+0x10/0x20 Thanks. Peter
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c index 8944548c30fa..3b2e5bcb08a7 100644 --- a/drivers/ufs/core/ufs-mcq.c +++ b/drivers/ufs/core/ufs-mcq.c @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag) return -ETIMEDOUT; if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) { - if (!cmd) - return -EINVAL; + /* Should return 0 if cmd is already complete by irq */ + if (!cmd || !ufshcd_cmd_inflight(cmd)) + return 0; hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd)); } else { hwq = hba->dev_cmd_queue; diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index e5e9da61f15d..e8bca62ceed8 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -6455,11 +6455,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv) /* Release cmd in MCQ mode if abort succeeds */ if (is_mcq_enabled(hba) && (*ret == 0)) { - hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd)); - spin_lock_irqsave(&hwq->cq_lock, flags); - if (ufshcd_cmd_inflight(lrbp->cmd)) + if (ufshcd_cmd_inflight(lrbp->cmd)) { + hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd)); + spin_lock_irqsave(&hwq->cq_lock, flags); ufshcd_release_scsi_cmd(hba, lrbp); - spin_unlock_irqrestore(&hwq->cq_lock, flags); + spin_unlock_irqrestore(&hwq->cq_lock, flags); + } } return *ret == 0;