Message ID | 1611807365-35513-3-git-send-email-cang@codeaurora.org (mailing list archive) |
---|---|
State | Changes Requested |
Series | Three fixes for task management request implementation |
On 1/27/21 8:16 PM, Can Guo wrote:
> ufshcd_compl_tm() looks for all 0 bits in REG_UTP_TASK_REQ_DOOR_BELL and
> calls complete() for each req that has req->end_io_data set. There can be
> a race condition between TMC send and completion, because req->end_io_data
> is set in __ufshcd_issue_tm_cmd() without host lock protection, so it is
> possible that when ufshcd_compl_tm() checks req->end_io_data, it is set but
> the corresponding tag has not yet been set in REG_UTP_TASK_REQ_DOOR_BELL.
> Thus, ufshcd_tmc_handler() may wrongly complete TMRs which have not been
> sent out. Fix it by protecting req->end_io_data with the host lock, and let
> ufshcd_compl_tm() only handle those TM cmds which have been completed,
> instead of looking for 0 bits in REG_UTP_TASK_REQ_DOOR_BELL.

I don't know of any other block driver that needs locking to protect against
races between submission and completion context. Can the block layer timeout
mechanism be used instead of the mechanism introduced by this patch, e.g. by
using blk_execute_rq_nowait() to submit requests? That would allow reusing
the existing mechanism in the block layer core to handle races between
request completion and timeout handling.

Thanks,

Bart.
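For illustration only, a rough sketch of the blk_execute_rq_nowait() direction
suggested above might look like the following. The ufshcd_sketch_* names are
hypothetical and this is not part of the posted patch; blk_execute_rq_nowait(),
rq_end_io_fn and req->end_io_data are existing block layer interfaces in this
kernel generation, and TM_CMD_TIMEOUT is the driver's existing timeout constant.

```c
/*
 * Rough sketch (hypothetical, not the posted patch): hand the TMF request to
 * blk_execute_rq_nowait() and let blk-mq arbitrate between completion and
 * timeout handling.
 */
#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

/* Hypothetical end_io callback, invoked by blk-mq when the TMF request ends. */
static void ufshcd_sketch_tmf_done(struct request *req, blk_status_t status)
{
	struct completion *c = req->end_io_data;

	if (c)
		complete(c);
}

/* Hypothetical submission helper for a request taken from hba->tmf_queue. */
static void ufshcd_sketch_issue_tmf(struct request_queue *q,
				    struct request *req,
				    struct completion *wait)
{
	req->end_io_data = wait;
	req->timeout = msecs_to_jiffies(TM_CMD_TIMEOUT);

	/*
	 * blk-mq starts the request and later resolves the completion vs.
	 * timeout race itself; a ->timeout() callback in the tmf tag_set ops
	 * (not shown) would decide what to do on expiry.
	 */
	blk_execute_rq_nowait(q, NULL, req, /*at_head=*/true,
			      ufshcd_sketch_tmf_done);
}
```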
On 2021-01-29 11:20, Bart Van Assche wrote:
> On 1/27/21 8:16 PM, Can Guo wrote:
>> ufshcd_compl_tm() looks for all 0 bits in REG_UTP_TASK_REQ_DOOR_BELL and
>> calls complete() for each req that has req->end_io_data set. There can be
>> a race condition between TMC send and completion, because req->end_io_data
>> is set in __ufshcd_issue_tm_cmd() without host lock protection, so it is
>> possible that when ufshcd_compl_tm() checks req->end_io_data, it is set
>> but the corresponding tag has not yet been set in
>> REG_UTP_TASK_REQ_DOOR_BELL. Thus, ufshcd_tmc_handler() may wrongly
>> complete TMRs which have not been sent out. Fix it by protecting
>> req->end_io_data with the host lock, and let ufshcd_compl_tm() only
>> handle those TM cmds which have been completed, instead of looking for
>> 0 bits in REG_UTP_TASK_REQ_DOOR_BELL.
>
> I don't know of any other block driver that needs locking to protect
> against races between submission and completion context. Can the block
> layer timeout mechanism be used instead of the mechanism introduced by
> this patch, e.g. by using blk_execute_rq_nowait() to submit requests? That
> would allow reusing the existing mechanism in the block layer core to
> handle races between request completion and timeout handling.

This patch is not introducing any new mechanism; it is fixing the usage of
the completion (req->end_io_data = c) introduced by commit 69a6c269c097
("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs"). If you
have a better idea to get it fixed once and for all, we are glad to take your
change to get it fixed asap.

Regards,

Can Guo.

> Thanks,
>
> Bart.
On 2021-01-29 14:06, Can Guo wrote:
> On 2021-01-29 11:20, Bart Van Assche wrote:
>> On 1/27/21 8:16 PM, Can Guo wrote:
>>> ufshcd_compl_tm() looks for all 0 bits in REG_UTP_TASK_REQ_DOOR_BELL and
>>> calls complete() for each req that has req->end_io_data set. There can
>>> be a race condition between TMC send and completion, because
>>> req->end_io_data is set in __ufshcd_issue_tm_cmd() without host lock
>>> protection, so it is possible that when ufshcd_compl_tm() checks
>>> req->end_io_data, it is set but the corresponding tag has not yet been
>>> set in REG_UTP_TASK_REQ_DOOR_BELL. Thus, ufshcd_tmc_handler() may
>>> wrongly complete TMRs which have not been sent out. Fix it by protecting
>>> req->end_io_data with the host lock, and let ufshcd_compl_tm() only
>>> handle those TM cmds which have been completed, instead of looking for
>>> 0 bits in REG_UTP_TASK_REQ_DOOR_BELL.
>>
>> I don't know of any other block driver that needs locking to protect
>> against races between submission and completion context. Can the block
>> layer timeout mechanism be used instead of the mechanism introduced by
>> this patch, e.g. by using blk_execute_rq_nowait() to submit requests?
>> That would allow reusing the existing mechanism in the block layer core
>> to handle races between request completion and timeout handling.
>
> This patch is not introducing any new mechanism; it is fixing the usage of
> the completion (req->end_io_data = c) introduced by commit 69a6c269c097
> ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs"). If
> you have a better idea to get it fixed once and for all, we are glad to
> take your change to get it fixed asap.
>
> Regards,
>
> Can Guo.

On second thought, the first fix alone is actually enough to eliminate the
race condition. Because blk_mq_tagset_busy_iter() only iterates over requests
that are not in the IDLE state, if blk_mq_start_request() is called within
the protection of the host spin lock, ufshcd_compl_tm() cannot run into the
scenario where req->end_io_data is set but the corresponding bit in
REG_UTP_TASK_REQ_DOOR_BELL has not been set. What do you think?

Thanks,

Can Guo.

>>
>> Thanks,
>>
>> Bart.
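For illustration, a condensed sketch of the issue-path ordering described
above might look like this. The ufshcd_sketch_send_tmf() helper is
hypothetical and omits the descriptor setup and vendor hooks of the real
__ufshcd_issue_tm_cmd(); it assumes the usual ufshcd.c context (struct
ufs_hba, ufshcd_writel(), REG_UTP_TASK_REQ_DOOR_BELL) and the field names
used in the posted diff.

```c
/*
 * Hypothetical, condensed sketch: end_io_data is published, the request is
 * started, and the doorbell is rung inside one host-lock critical section.
 * Since blk_mq_tagset_busy_iter() skips requests that have not been started,
 * ufshcd_compl_tm() can only ever see a TMF whose doorbell bit is already set.
 */
static void ufshcd_sketch_send_tmf(struct ufs_hba *hba, struct request *req,
				   struct completion *wait, int free_slot)
{
	struct Scsi_Host *host = hba->host;
	unsigned long flags;

	spin_lock_irqsave(host->host_lock, flags);
	req->end_io_data = wait;	/* now visible to ufshcd_compl_tm()   */
	blk_mq_start_request(req);	/* request leaves the IDLE state here */
	__set_bit(free_slot, &hba->outstanding_tasks);
	/* make sure the descriptor is ready before ringing the task doorbell */
	wmb();
	ufshcd_writel(hba, 1 << free_slot, REG_UTP_TASK_REQ_DOOR_BELL);
	spin_unlock_irqrestore(host->host_lock, flags);
}
```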
On 1/28/21 10:29 PM, Can Guo wrote:
> On second thought, the first fix alone is actually enough to eliminate the
> race condition. Because blk_mq_tagset_busy_iter() only iterates over
> requests that are not in the IDLE state, if blk_mq_start_request() is
> called within the protection of the host spin lock, ufshcd_compl_tm()
> cannot run into the scenario where req->end_io_data is set but the
> corresponding bit in REG_UTP_TASK_REQ_DOOR_BELL has not been set. What do
> you think?

That sounds reasonable to me.

Thanks,

Bart.
```diff
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c0c5925..43894a3 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6225,7 +6225,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
 
 struct ctm_info {
 	struct ufs_hba	*hba;
-	unsigned long	pending;
+	unsigned long	completed;
 	unsigned int	ncpl;
 };
 
@@ -6234,13 +6234,13 @@ static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
 	struct ctm_info *const ci = priv;
 	struct completion *c;
 
-	WARN_ON_ONCE(reserved);
-	if (test_bit(req->tag, &ci->pending))
-		return true;
-	ci->ncpl++;
-	c = req->end_io_data;
-	if (c)
-		complete(c);
+	if (test_bit(req->tag, &ci->completed)) {
+		__clear_bit(req->tag, &ci->hba->outstanding_tasks);
+		ci->ncpl++;
+		c = req->end_io_data;
+		if (c)
+			complete(c);
+	}
 	return true;
 }
 
@@ -6255,12 +6255,19 @@ static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
 static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 {
 	struct request_queue *q = hba->tmf_queue;
+	u32 tm_doorbell;
+	unsigned long completed;
 	struct ctm_info ci = {
-		.hba = hba,
-		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
+		.hba = hba,
+		.ncpl = 0,
 	};
 
-	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
+	completed = tm_doorbell ^ hba->outstanding_tasks;
+	if (completed) {
+		ci.completed = completed;
+		blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	}
 	return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
 }
 
@@ -6388,12 +6395,12 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->end_io_data = &wait;
 	free_slot = req->tag;
 	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
 	ufshcd_hold(hba, false);
 
 	spin_lock_irqsave(host->host_lock, flags);
+	req->end_io_data = &wait;
 	task_tag = hba->nutrs + free_slot;
 
 	blk_mq_start_request(req);
@@ -6420,11 +6427,13 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	err = wait_for_completion_io_timeout(&wait,
 			msecs_to_jiffies(TM_CMD_TIMEOUT));
 	if (!err) {
+		spin_lock_irqsave(hba->host->host_lock, flags);
 		/*
 		 * Make sure that ufshcd_compl_tm() does not trigger a
 		 * use-after-free.
 		 */
 		req->end_io_data = NULL;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);
```
ufshcd_compl_tm() looks for all 0 bits in REG_UTP_TASK_REQ_DOOR_BELL and
calls complete() for each req that has req->end_io_data set. There can be a
race condition between TMC send and completion, because req->end_io_data is
set in __ufshcd_issue_tm_cmd() without host lock protection, so it is
possible that when ufshcd_compl_tm() checks req->end_io_data, it is set but
the corresponding tag has not yet been set in REG_UTP_TASK_REQ_DOOR_BELL.
Thus, ufshcd_tmc_handler() may wrongly complete TMRs which have not been sent
out. Fix it by protecting req->end_io_data with the host lock, and let
ufshcd_compl_tm() only handle those TM cmds which have been completed,
instead of looking for 0 bits in REG_UTP_TASK_REQ_DOOR_BELL.

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)
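As a worked example of the "completed" bookkeeping described above, the
following plain C snippet uses made-up bit values; it assumes every bit set
in the doorbell register is also set in outstanding_tasks (i.e. was issued
under the host lock), which is what the patch enforces.

```c
/*
 * Worked example with hypothetical bit values: a task is complete when its
 * bit is set in outstanding_tasks (it was issued) but has since been cleared
 * by the controller in the task doorbell register.
 */
#include <stdio.h>

int main(void)
{
	unsigned long outstanding_tasks = 0x6;	/* tags 1 and 2 were issued  */
	unsigned long tm_doorbell       = 0x4;	/* tag 2 still pending in HW */

	/* A bit that was issued but is no longer in the doorbell is done. */
	unsigned long completed = tm_doorbell ^ outstanding_tasks;

	printf("completed bitmap: 0x%lx\n", completed);	/* prints 0x2: tag 1 */
	return 0;
}
```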