Message ID | 20211013150116.31158-2-adrian.hunter@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | scsi: ufs: core: Fix task management completion timeout race | expand |
On 10/13/21 08:01, Adrian Hunter wrote: > __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, > which races with the completion function ufshcd_tmc_handler() which > expects req->end_io_data to have a value. > > Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already > synchronized using hba->tmf_rqs and hba->outstanding_tasks under the > host_lock spinlock. > > It is also not necessary (nor typical) to clear req->end_io_data because > the block layer does it before allocating out requests e.g. via > blk_get_request(). > > So fix by not clearing it. > > Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > drivers/scsi/ufs/ufshcd.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 95be7ecdfe10..f34b3994d1aa 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, > err = wait_for_completion_io_timeout(&wait, > msecs_to_jiffies(TM_CMD_TIMEOUT)); > if (!err) { > - /* > - * Make sure that ufshcd_compl_tm() does not trigger a > - * use-after-free. > - */ > - req->end_io_data = NULL; > ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); > dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", > __func__, tm_function); With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion. Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()? Thanks, Bart.
On 14/10/2021 07:14, Bart Van Assche wrote: > On 10/13/21 08:01, Adrian Hunter wrote: >> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, >> which races with the completion function ufshcd_tmc_handler() which >> expects req->end_io_data to have a value. >> >> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already >> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the >> host_lock spinlock. >> >> It is also not necessary (nor typical) to clear req->end_io_data because >> the block layer does it before allocating out requests e.g. via >> blk_get_request(). >> >> So fix by not clearing it. >> >> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> >> --- >> drivers/scsi/ufs/ufshcd.c | 5 ----- >> 1 file changed, 5 deletions(-) >> >> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c >> index 95be7ecdfe10..f34b3994d1aa 100644 >> --- a/drivers/scsi/ufs/ufshcd.c >> +++ b/drivers/scsi/ufs/ufshcd.c >> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, >> err = wait_for_completion_io_timeout(&wait, >> msecs_to_jiffies(TM_CMD_TIMEOUT)); >> if (!err) { >> - /* >> - * Make sure that ufshcd_compl_tm() does not trigger a >> - * use-after-free. >> - */ >> - req->end_io_data = NULL; >> ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); >> dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", >> __func__, tm_function); > > With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion. AFAICT that would only happen because blk_mq_tagset_busy_iter() is not synchronized with respect to blk_put_request(). But ufshcd_tmc_handler() does not use blk_mq_tagset_busy_iter() anymore, so that can't happen. Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()? If that were needed, it would imply the synchronization was broken i.e. why are we referencing a request that has already been through blk_put_request()?
On 10/13/21 11:02 PM, Adrian Hunter wrote: > On 14/10/2021 07:14, Bart Van Assche wrote: >> Wouldn't it be better to keep the code that clears req->end_io_data >> and to change complete(c) into if(c) complete(c) in >> ufshcd_tmc_handler()? > > If that were needed, it would imply the synchronization was broken > i.e. why are we referencing a request that has already been through > blk_put_request()? The scenario I'm worried about is as follows: * __ufshcd_issue_tm_cmd() issues a task management function. * No completion is received before TM_CMD_TIMEOUT has expired (100 ms). * ufshcd_clear_tm_cmd() fails. * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data). Can this happen? I agree that this scenario involves completion of a request that has already been through blk_put_request(). Thanks, Bart.
On 14/10/2021 19:47, Bart Van Assche wrote: > On 10/13/21 11:02 PM, Adrian Hunter wrote: >> On 14/10/2021 07:14, Bart Van Assche wrote: >>> Wouldn't it be better to keep the code that clears req->end_io_data >>> and to change complete(c) into if(c) complete(c) in >>> ufshcd_tmc_handler()? >> >> If that were needed, it would imply the synchronization was broken >> i.e. why are we referencing a request that has already been through >> blk_put_request()? > > The scenario I'm worried about is as follows: > * __ufshcd_issue_tm_cmd() issues a task management function. > * No completion is received before TM_CMD_TIMEOUT has expired (100 ms). > * ufshcd_clear_tm_cmd() fails. > * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data). > > Can this happen? No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd() and ufshcd_clear_tm_cmd(). > > I agree that this scenario involves completion of a request that has already been through blk_put_request(). > > Thanks, > > Bart.
On 10/13/21 8:01 AM, Adrian Hunter wrote: > __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, > which races with the completion function ufshcd_tmc_handler() which > expects req->end_io_data to have a value. > > Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already > synchronized using hba->tmf_rqs and hba->outstanding_tasks under the > host_lock spinlock. > > It is also not necessary (nor typical) to clear req->end_io_data because > the block layer does it before allocating out requests e.g. via > blk_get_request(). > > So fix by not clearing it. > > Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > drivers/scsi/ufs/ufshcd.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 95be7ecdfe10..f34b3994d1aa 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, > err = wait_for_completion_io_timeout(&wait, > msecs_to_jiffies(TM_CMD_TIMEOUT)); > if (!err) { > - /* > - * Make sure that ufshcd_compl_tm() does not trigger a > - * use-after-free. > - */ > - req->end_io_data = NULL; > ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); > dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", > __func__, tm_function); Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On 14/10/2021 21:50, Adrian Hunter wrote: > On 14/10/2021 19:47, Bart Van Assche wrote: >> On 10/13/21 11:02 PM, Adrian Hunter wrote: >>> On 14/10/2021 07:14, Bart Van Assche wrote: >>>> Wouldn't it be better to keep the code that clears req->end_io_data >>>> and to change complete(c) into if(c) complete(c) in >>>> ufshcd_tmc_handler()? >>> >>> If that were needed, it would imply the synchronization was broken >>> i.e. why are we referencing a request that has already been through >>> blk_put_request()? >> >> The scenario I'm worried about is as follows: >> * __ufshcd_issue_tm_cmd() issues a task management function. >> * No completion is received before TM_CMD_TIMEOUT has expired (100 ms). >> * ufshcd_clear_tm_cmd() fails. >> * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data). >> >> Can this happen? > > No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and > access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd() > and ufshcd_clear_tm_cmd(). Although I just noticed a different issue. In ufshcd_tmc_handler() the task doorbell register needs to be read in conjunction with outstanding_tasks i.e. under the spinlock I will send a V2. > >> >> I agree that this scenario involves completion of a request that has already been through blk_put_request(). >> >> Thanks, >> >> Bart. >
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 95be7ecdfe10..f34b3994d1aa 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, err = wait_for_completion_io_timeout(&wait, msecs_to_jiffies(TM_CMD_TIMEOUT)); if (!err) { - /* - * Make sure that ufshcd_compl_tm() does not trigger a - * use-after-free. - */ - req->end_io_data = NULL; ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", __func__, tm_function);
__ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, which races with the completion function ufshcd_tmc_handler() which expects req->end_io_data to have a value. Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already synchronized using hba->tmf_rqs and hba->outstanding_tasks under the host_lock spinlock. It is also not necessary (nor typical) to clear req->end_io_data because the block layer does it before allocating out requests e.g. via blk_get_request(). So fix by not clearing it. Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> --- drivers/scsi/ufs/ufshcd.c | 5 ----- 1 file changed, 5 deletions(-)