Message ID | 20211123122115.8599-1-mrangankar@marvell.com (mailing list archive) |
---|---|
State | Superseded |
Series | qedi: Fix cmd_cleanup_cmpl counter mismatch issue. |
On 11/23/21 4:21 AM, Manish Rangankar wrote:
> When a LUN reset is issued under heavy I/O, we hit the qedi WARN_ON
> because of a mismatch between the firmware I/O cmd cleanup request
> count and the I/O cmd cleanup response count received. The mismatch
> is because of the race caused by the postfix increment of
> cmd_cleanup_cmpl.
>
> [qedi_clearsq:1295]:18: fatal error, need hard reset, cid=0x0
> WARNING: CPU: 48 PID: 110963 at drivers/scsi/qedi/qedi_fw.c:1296 qedi_clearsq+0xa5/0xd0 [qedi]
> CPU: 48 PID: 110963 Comm: kworker/u130:0 Kdump: loaded Tainted: G W
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 04/15/2020
> Workqueue: iscsi_conn_cleanup iscsi_cleanup_conn_work_fn [scsi_transport_iscsi]
> RIP: 0010:qedi_clearsq+0xa5/0xd0 [qedi]
> RSP: 0018:ffffac2162c7fd98 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff975213c40ab8 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9761bf816858 RDI: ffff9761bf816858
> RBP: ffff975247018628 R08: 000000000000522c R09: 000000000000005b
> R10: 0000000000000000 R11: ffffac2162c7fbd8 R12: ffff97522e1b2be8
> R13: 0000000000000000 R14: ffff97522e1b2800 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff9761bf800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1a34e3e1a0 CR3: 0000000108bb2000 CR4: 0000000000350ee0
> Call Trace:
>  qedi_ep_disconnect+0x533/0x550 [qedi]
>  ? iscsi_dbg_trace+0x63/0x80 [scsi_transport_iscsi]
>  ? _cond_resched+0x15/0x30
>  ? iscsi_suspend_queue+0x19/0x40 [libiscsi]
>  iscsi_ep_disconnect+0xb0/0x130 [scsi_transport_iscsi]
>  iscsi_cleanup_conn_work_fn+0x82/0x130 [scsi_transport_iscsi]
>  process_one_work+0x1a7/0x360
>  ? create_worker+0x1a0/0x1a0
>  worker_thread+0x30/0x390
>  ? create_worker+0x1a0/0x1a0
>  kthread+0x116/0x130
>  ? kthread_flush_work_fn+0x10/0x10
>  ret_from_fork+0x22/0x40
> ---[ end trace 5f1441f59082235c ]---
>
> Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 84a4204a2cb4..2eed2c6cf424 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -813,10 +813,11 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>
>  check_cleanup_reqs:
>  	if (qedi_conn->cmd_cleanup_req > 0) {
> -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> -			  "Freeing tid=0x%x for cid=0x%x\n",
> -			  cqe->itid, qedi_conn->iscsi_conn_id);
> -		qedi_conn->cmd_cleanup_cmpl++;
> +		++qedi_conn->cmd_cleanup_cmpl;
> +		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> +			  "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
> +			  cqe->itid, qedi_conn->iscsi_conn_id,
> +			  qedi_conn->cmd_cleanup_cmpl);
>  		wake_up(&qedi_conn->wait_queue);
>  	} else {
>  		QEDI_ERR(&qedi->dbg_ctx,

Reviewed-by: Lee Duncan <lduncan@suse.com>
On 11/23/21 6:21 AM, Manish Rangankar wrote:
> When a LUN reset is issued under heavy I/O, we hit the qedi WARN_ON
> because of a mismatch between the firmware I/O cmd cleanup request
> count and the I/O cmd cleanup response count received. The mismatch
> is because of the race caused by the postfix increment of
> cmd_cleanup_cmpl.
>
> [qedi_clearsq:1295]:18: fatal error, need hard reset, cid=0x0
> WARNING: CPU: 48 PID: 110963 at drivers/scsi/qedi/qedi_fw.c:1296 qedi_clearsq+0xa5/0xd0 [qedi]
> CPU: 48 PID: 110963 Comm: kworker/u130:0 Kdump: loaded Tainted: G W
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 04/15/2020
> Workqueue: iscsi_conn_cleanup iscsi_cleanup_conn_work_fn [scsi_transport_iscsi]
> RIP: 0010:qedi_clearsq+0xa5/0xd0 [qedi]
> RSP: 0018:ffffac2162c7fd98 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff975213c40ab8 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff9761bf816858 RDI: ffff9761bf816858
> RBP: ffff975247018628 R08: 000000000000522c R09: 000000000000005b
> R10: 0000000000000000 R11: ffffac2162c7fbd8 R12: ffff97522e1b2be8
> R13: 0000000000000000 R14: ffff97522e1b2800 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff9761bf800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1a34e3e1a0 CR3: 0000000108bb2000 CR4: 0000000000350ee0
> Call Trace:
>  qedi_ep_disconnect+0x533/0x550 [qedi]
>  ? iscsi_dbg_trace+0x63/0x80 [scsi_transport_iscsi]
>  ? _cond_resched+0x15/0x30
>  ? iscsi_suspend_queue+0x19/0x40 [libiscsi]
>  iscsi_ep_disconnect+0xb0/0x130 [scsi_transport_iscsi]
>  iscsi_cleanup_conn_work_fn+0x82/0x130 [scsi_transport_iscsi]
>  process_one_work+0x1a7/0x360
>  ? create_worker+0x1a0/0x1a0
>  worker_thread+0x30/0x390
>  ? create_worker+0x1a0/0x1a0
>  kthread+0x116/0x130
>  ? kthread_flush_work_fn+0x10/0x10
>  ret_from_fork+0x22/0x40
> ---[ end trace 5f1441f59082235c ]---
>
> Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 84a4204a2cb4..2eed2c6cf424 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -813,10 +813,11 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>
>  check_cleanup_reqs:
>  	if (qedi_conn->cmd_cleanup_req > 0) {
> -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> -			  "Freeing tid=0x%x for cid=0x%x\n",
> -			  cqe->itid, qedi_conn->iscsi_conn_id);
> -		qedi_conn->cmd_cleanup_cmpl++;
> +		++qedi_conn->cmd_cleanup_cmpl;
> +		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> +			  "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
> +			  cqe->itid, qedi_conn->iscsi_conn_id,
> +			  qedi_conn->cmd_cleanup_cmpl);

Is the issue that cmd_cleanup_cmpl's increment is not seen by
qedi_cleanup_all_io's wait_event_interruptible_timeout call when it wakes up,
and your patch fixes this by doing a pre increment?

Does doing a pre increment give you barrier like behavior and is that why this
works? I thought if wake_up ends up waking up the other thread it does a barrier
already, so it's not clear to me how changing to a pre-increment helps.

Is doing a pre-increment a common way to handle this? It looks like we do a
post increment and wake_up* in other places. However, like in the scsi layer we
do wake_up_process and memory-barriers.txt says that always does a general
barrier, so is that why we can do a post increment there?

Does pre-increment give you barrier like behavior, and is the wake_up call not
waking up the process so we didn't get a barrier from that, and so that's why this
works?

> 		wake_up(&qedi_conn->wait_queue);
> 	} else {
> 		QEDI_ERR(&qedi->dbg_ctx,
>
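For readers following the barrier discussion above, the pattern in question boils
down to a counter bumped on the completion path plus a wait_event on the teardown
path. Below is a minimal sketch of that shape with simplified, hypothetical names;
it is not the actual qedi code, only an illustration of the waker/waiter pairing.

#include <linux/wait.h>
#include <linux/types.h>

struct cleanup_ctx {
	wait_queue_head_t wait_queue;
	u32 cmd_cleanup_req;	/* how many cleanups were requested */
	u32 cmd_cleanup_cmpl;	/* bumped once per cleanup response */
};

/* Completion side: may run concurrently on several Rx threads. */
static void cleanup_response(struct cleanup_ctx *ctx)
{
	ctx->cmd_cleanup_cmpl++;	/* plain ++ is not atomic across CPUs */
	wake_up(&ctx->wait_queue);	/* per memory-barriers.txt, this implies a
					 * general barrier only if it actually
					 * wakes a sleeping task */
}

/* Teardown side: sleep until every response has arrived or we time out. */
static long wait_for_cleanups(struct cleanup_ctx *ctx, long timeout)
{
	return wait_event_interruptible_timeout(ctx->wait_queue,
			ctx->cmd_cleanup_cmpl >= ctx->cmd_cleanup_req,
			timeout);
}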
> >
> >  check_cleanup_reqs:
> >  	if (qedi_conn->cmd_cleanup_req > 0) {
> > -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> > -			  "Freeing tid=0x%x for cid=0x%x\n",
> > -			  cqe->itid, qedi_conn->iscsi_conn_id);
> > -		qedi_conn->cmd_cleanup_cmpl++;
> > +		++qedi_conn->cmd_cleanup_cmpl;
> > +		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> > +			  "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
> > +			  cqe->itid, qedi_conn->iscsi_conn_id,
> > +			  qedi_conn->cmd_cleanup_cmpl);
>
> Is the issue that cmd_cleanup_cmpl's increment is not seen by
> qedi_cleanup_all_io's wait_event_interruptible_timeout call when it wakes up,
> and your patch fixes this by doing a pre increment?
>

Yes, cmd_cleanup_cmpl's increment is not seen by qedi_cleanup_all_io's
wait_event_interruptible_timeout call when it wakes up, even after the
firmware posts all the ISCSI_CQE_TYPE_TASK_CLEANUP events for the requested
cmd_cleanup_req. Yes, the pre-increment did address this issue. Do you feel
otherwise?

> Does doing a pre increment give you barrier like behavior and is that why this
> works? I thought if wake_up ends up waking up the other thread it does a barrier
> already, so it's not clear to me how changing to a pre-increment helps.
>
> Is doing a pre-increment a common way to handle this? It looks like we do a
> post increment and wake_up* in other places. However, like in the scsi layer we
> do wake_up_process and memory-barriers.txt says that always does a general
> barrier, so is that why we can do a post increment there?
>
> Does pre-increment give you barrier like behavior, and is the wake_up call not
> waking up the process so we didn't get a barrier from that, and so that's why this
> works?
>

The issue happens before wake_up is called. When we get a surge of
ISCSI_CQE_TYPE_TASK_CLEANUP events on multiple Rx threads, cmd_cleanup_cmpl
tends to miss increments. The scenario is really multiple threads accessing
cmd_cleanup_cmpl and racing on the postfix increment; this is likely because
two threads read the same value at the same time.

Now that I am explaining it, it feels like instead of pre-incrementing
cmd_cleanup_cmpl, it should be an atomic variable. Do you see any issue with
that?

From the logs,
-------------------------------------------------------
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_iscsi_cleanup_task:2160" conn_err.log | wc -l
99
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_cleanup_all_io:1215" conn_err.log | wc -l
99
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_fp_process_cqes:925" conn_err.log | wc -l
99
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_fp_process_cqes:922" conn_err.log | wc -l
99

[Thu Oct 21 22:03:32 2021] [0000:a5:00.5]:[qedi_cleanup_all_io:1246]:18: i/o cmd_cleanup_req=99, not equal to cmd_cleanup_cmpl=97, cid=0x0   <<<
[Thu Oct 21 22:03:38 2021] [0000:a5:00.5]:[qedi_clearsq:1299]:18: fatal error, need hard reset, cid=0x0
-------------------------------------------------------
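To make the lost-update race described above concrete, here is a small
stand-alone user-space analogy (not driver code; the file name and counts are
hypothetical). Several threads doing a plain counter++ end up short of the
expected total, exactly like cmd_cleanup_req=99 vs cmd_cleanup_cmpl=97 in the
log, while an atomic counter always reaches it. Build with: cc -O2 -pthread race.c

#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>

#define THREADS	4
#define LOOPS	100000

static volatile int plain_cnt;		/* racy plain counter (volatile only so the
					 * per-iteration increments are kept) */
static atomic_int atomic_cnt;		/* what an atomic counter gives you */

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < LOOPS; i++) {
		plain_cnt++;				/* load + add + store: can lose updates */
		atomic_fetch_add(&atomic_cnt, 1);	/* one indivisible read-modify-write */
	}
	return NULL;
}

int main(void)
{
	pthread_t t[THREADS];

	for (int i = 0; i < THREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < THREADS; i++)
		pthread_join(t[i], NULL);

	/* plain_cnt typically prints less than THREADS * LOOPS; atomic_cnt never does */
	printf("plain=%d atomic=%d expected=%d\n",
	       plain_cnt, atomic_load(&atomic_cnt), THREADS * LOOPS);
	return 0;
}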
On 11/24/21 12:05 AM, Manish Rangankar wrote:
>>>
>>>  check_cleanup_reqs:
>>>  	if (qedi_conn->cmd_cleanup_req > 0) {
>>> -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
>>> -			  "Freeing tid=0x%x for cid=0x%x\n",
>>> -			  cqe->itid, qedi_conn->iscsi_conn_id);
>>> -		qedi_conn->cmd_cleanup_cmpl++;
>>> +		++qedi_conn->cmd_cleanup_cmpl;
>>> +		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
>>> +			  "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
>>> +			  cqe->itid, qedi_conn->iscsi_conn_id,
>>> +			  qedi_conn->cmd_cleanup_cmpl);
>>
>> Is the issue that cmd_cleanup_cmpl's increment is not seen by
>> qedi_cleanup_all_io's wait_event_interruptible_timeout call when it wakes up,
>> and your patch fixes this by doing a pre increment?
>>
>
> Yes, cmd_cleanup_cmpl's increment is not seen by qedi_cleanup_all_io's
> wait_event_interruptible_timeout call when it wakes up, even after the
> firmware posts all the ISCSI_CQE_TYPE_TASK_CLEANUP events for the requested
> cmd_cleanup_req. Yes, the pre-increment did address this issue. Do you feel
> otherwise?
>
>> Does doing a pre increment give you barrier like behavior and is that why this
>> works? I thought if wake_up ends up waking up the other thread it does a barrier
>> already, so it's not clear to me how changing to a pre-increment helps.
>>
>> Is doing a pre-increment a common way to handle this? It looks like we do a
>> post increment and wake_up* in other places. However, like in the scsi layer we
>> do wake_up_process and memory-barriers.txt says that always does a general
>> barrier, so is that why we can do a post increment there?
>>
>> Does pre-increment give you barrier like behavior, and is the wake_up call not
>> waking up the process so we didn't get a barrier from that, and so that's why this
>> works?
>>
>
> The issue happens before wake_up is called. When we get a surge of
> ISCSI_CQE_TYPE_TASK_CLEANUP events on multiple Rx threads, cmd_cleanup_cmpl
> tends to miss increments. The scenario is really multiple threads accessing
> cmd_cleanup_cmpl and racing on the postfix increment; this is likely because
> two threads read the same value at the same time.
>
> Now that I am explaining it, it feels like instead of pre-incrementing
> cmd_cleanup_cmpl, it should be an atomic variable. Do you see any issue with
> that?
>

Yeah, atomic.

And then I guess for this:

if (qedi_conn->cmd_cleanup_req > 0) {
	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
		  "Freeing tid=0x%x for cid=0x%x\n",
		  cqe->itid, qedi_conn->iscsi_conn_id);
	qedi_conn->cmd_cleanup_cmpl++;
	wake_up(&qedi_conn->wait_queue);

we might only want to do the wake_up once:

if (atomic_inc_return(&qedi_conn->cmd_cleanup_cmpl) ==
    qedi_conn->cmd_cleanup_req) {

?
> >
> > Now that I am explaining it, it feels like instead of pre-incrementing
> > cmd_cleanup_cmpl, it should be an atomic variable. Do you see any issue with
> > that?
> >
>
> Yeah, atomic.
>
> And then I guess for this:
>
> if (qedi_conn->cmd_cleanup_req > 0) {
> 	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> 		  "Freeing tid=0x%x for cid=0x%x\n",
> 		  cqe->itid, qedi_conn->iscsi_conn_id);
> 	qedi_conn->cmd_cleanup_cmpl++;
> 	wake_up(&qedi_conn->wait_queue);
>
> we might only want to do the wake_up once:
>
> if (atomic_inc_return(&qedi_conn->cmd_cleanup_cmpl) ==
>     qedi_conn->cmd_cleanup_req) {
>
> ?

Agree, I will share the updated patch.
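For context, here is a rough sketch of where that suggestion leads, assuming
cmd_cleanup_cmpl is converted to atomic_t and reset with atomic_set() wherever
cmd_cleanup_req is armed. This is only an illustration of the idea discussed
above, not the posted follow-up patch; the struct, function names, and the
elided error path are simplified placeholders.

#include <linux/atomic.h>
#include <linux/wait.h>

struct conn_sketch {
	wait_queue_head_t wait_queue;
	int cmd_cleanup_req;		/* number of cleanups requested */
	atomic_t cmd_cleanup_cmpl;	/* cleanups completed so far */
};

static void process_cmd_cleanup_resp_sketch(struct conn_sketch *c)
{
	if (c->cmd_cleanup_req <= 0)
		return;		/* unexpected completion; error logging elided */

	/*
	 * atomic_inc_return() is one indivisible read-modify-write, so a burst
	 * of cleanup CQEs handled on different Rx threads cannot lose counts,
	 * and the waiter is woken exactly once, on the last expected response.
	 */
	if (atomic_inc_return(&c->cmd_cleanup_cmpl) == c->cmd_cleanup_req)
		wake_up(&c->wait_queue);
}

Per Documentation/atomic_t.txt, value-returning atomic RMW operations such as
atomic_inc_return() are fully ordered, which also sidesteps the barrier
questions raised earlier in the thread.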
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 84a4204a2cb4..2eed2c6cf424 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -813,10 +813,11 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 
 check_cleanup_reqs:
 	if (qedi_conn->cmd_cleanup_req > 0) {
-		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
-			  "Freeing tid=0x%x for cid=0x%x\n",
-			  cqe->itid, qedi_conn->iscsi_conn_id);
-		qedi_conn->cmd_cleanup_cmpl++;
+		++qedi_conn->cmd_cleanup_cmpl;
+		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
+			  "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
+			  cqe->itid, qedi_conn->iscsi_conn_id,
+			  qedi_conn->cmd_cleanup_cmpl);
 		wake_up(&qedi_conn->wait_queue);
 	} else {
 		QEDI_ERR(&qedi->dbg_ctx,
When a LUN reset is issued under heavy I/O, we hit the qedi WARN_ON
because of a mismatch between the firmware I/O cmd cleanup request
count and the I/O cmd cleanup response count received. The mismatch
is because of the race caused by the postfix increment of
cmd_cleanup_cmpl.

[qedi_clearsq:1295]:18: fatal error, need hard reset, cid=0x0
WARNING: CPU: 48 PID: 110963 at drivers/scsi/qedi/qedi_fw.c:1296 qedi_clearsq+0xa5/0xd0 [qedi]
CPU: 48 PID: 110963 Comm: kworker/u130:0 Kdump: loaded Tainted: G W
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 04/15/2020
Workqueue: iscsi_conn_cleanup iscsi_cleanup_conn_work_fn [scsi_transport_iscsi]
RIP: 0010:qedi_clearsq+0xa5/0xd0 [qedi]
RSP: 0018:ffffac2162c7fd98 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff975213c40ab8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9761bf816858 RDI: ffff9761bf816858
RBP: ffff975247018628 R08: 000000000000522c R09: 000000000000005b
R10: 0000000000000000 R11: ffffac2162c7fbd8 R12: ffff97522e1b2be8
R13: 0000000000000000 R14: ffff97522e1b2800 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff9761bf800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1a34e3e1a0 CR3: 0000000108bb2000 CR4: 0000000000350ee0
Call Trace:
 qedi_ep_disconnect+0x533/0x550 [qedi]
 ? iscsi_dbg_trace+0x63/0x80 [scsi_transport_iscsi]
 ? _cond_resched+0x15/0x30
 ? iscsi_suspend_queue+0x19/0x40 [libiscsi]
 iscsi_ep_disconnect+0xb0/0x130 [scsi_transport_iscsi]
 iscsi_cleanup_conn_work_fn+0x82/0x130 [scsi_transport_iscsi]
 process_one_work+0x1a7/0x360
 ? create_worker+0x1a0/0x1a0
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x116/0x130
 ? kthread_flush_work_fn+0x10/0x10
 ret_from_fork+0x22/0x40
---[ end trace 5f1441f59082235c ]---

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_fw.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
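As a worked illustration of the race the commit message refers to, the
interleaving below is hypothetical but matches the cmd_cleanup_req=99 vs
cmd_cleanup_cmpl=97 symptom reported in the thread.

/*
 * A postfix (or prefix) increment on a plain integer compiles to a
 * load / add / store sequence, so two Rx threads handling cleanup CQEs
 * can interleave like this:
 *
 *   Rx thread A                   Rx thread B
 *   -----------                   -----------
 *   load  cmd_cleanup_cmpl (41)
 *                                 load  cmd_cleanup_cmpl (41)
 *   add 1, store (42)
 *                                 add 1, store (42)   <-- one count lost
 *
 * Both threads processed a response, but the counter only advanced by one,
 * so the waiter never sees cmd_cleanup_cmpl reach cmd_cleanup_req and the
 * qedi_clearsq WARN_ON fires after the wait times out.
 */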