[v2] lpfc: Handle gracefully failed FLOGI attempts in devloss callback

Message ID 20230111113841.108588-1-dwagner@suse.de (mailing list archive)
State Not Applicable

Commit Message

Daniel Wagner Jan. 11, 2023, 11:38 a.m. UTC
When FLOGI attempts fail, the node can be released via the
lpfc_nlp_release() function. This function sets the node's vport pointer
to NULL and the node state to NLP_STE_FREED_NODE, but it won't stop the
devloss timer in the upper SCSI layer.

Hence, when the devloss timer eventually fires,
lpfc_dev_loss_tmo_callbk() is called and tries to operate on a NULL
vport pointer.

Just do nothing in this case. To be extra cautious, also check the
state and issue a warning if we have an inconsistency.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
changes:
v2:
  - this time with code (/me fights with evil-mode)

v1:
  - initial version


lpfc 0000:65:00.1: 94: [20252.520693] 7:0357 ELS CQE error: status=x3: CQE: 116b0300 00000000 31420002 90010000
lpfc 0000:65:00.1: 95: [20252.520707] 7:0321 Rsp Ring 2 error: IOCB Data: x116b0300 x0 x31420002 x90010000
lpfc 0000:65:00.1: 7:(0):2858 FLOGI failure Status:x3/x31420002 TMO:x14 Data x11140820 x0
 rport-18:0-1: blocked FC remote port time out: removing rport
**** lpfc_rport_invalid: Null vport on ndlp xffff88828bd82e00, DID xfffffe rport xffff8884f936e000 SID xffffffff
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 14 PID: 86204 Comm: kworker/14:0 Tainted: G           OE  X    5.14.21-150400.24.18-default #1 SLE15-SP4 695ab7a8fc20f5ddb345280570966cd1eb06d469
Hardware name: XXXX
Workqueue: fc_wq_18 fc_rport_final_delete [scsi_transport_fc]
RIP: e030:lpfc_dev_loss_tmo_callbk+0x50/0x4d0 [lpfc]
Code: 00 00 00 0f b7 8b ac 00 00 00 48 c7 c2 e0 d1 c6 c0 44 8b 83 98 00 00 00 44 8b 8b 94 00 00 00 48 89 fd be 80 00 00 00 4c 89 e7 <4d> 8b 2c 24 e8 37 9e 04 00 4c 8b 83 f8 00 00 00 41 8b 90 e0 02 00
RSP: e02b:ffffc9004d853e38 EFLAGS: 00010286
RAX: ffff8884f936e510 RBX: ffff88828bd82e00 RCX: 000000000000ffff
RDX: ffffffffc0c6d1e0 RSI: 0000000000000080 RDI: 0000000000000000
RBP: ffff8884f936e000 R08: 0000000000fffffe R09: 0000000000000000
R10: ffffc900401fbd98 R11: ffffc9004d853c80 R12: 0000000000000000
R13: ffff8884f936e000 R14: ffff88810b705000 R15: ffff888126973080
FS:  0000000000000000(0000) GS:ffff88888e980000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000103f50000 CR4: 0000000000050660
Call Trace:
 <TASK>
 fc_rport_final_delete+0xec/0x1c0 [scsi_transport_fc e9142b03c2f4a15da538eb15a15c5b37fc11a87f]
 process_one_work+0x264/0x440
 ? process_one_work+0x440/0x440
 worker_thread+0x2d/0x3d0
 ? process_one_work+0x440/0x440
 kthread+0x154/0x180
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
 </TASK>
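
For illustration only, here is a small user-space sketch of the situation
described in the commit message: a release path clears the node's vport
pointer and marks the node freed, and a timer callback that fires later
has to cope with that. All struct and function names below (model_node,
model_vport, model_node_release, model_dev_loss_callback) are
hypothetical stand-ins, not the lpfc definitions, and the sketch ignores
locking and reference counting entirely.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the lpfc types. */
enum node_state { NODE_ACTIVE, NODE_FREED };

struct model_vport {
	int id;
};

struct model_node {
	struct model_vport *vport;
	enum node_state state;
};

/* Models the release path: clear the back pointer, mark the node freed. */
static void model_node_release(struct model_node *n)
{
	n->vport = NULL;
	n->state = NODE_FREED;
}

/*
 * Models a timer callback that may run after the release path.
 * Returns 1 if it went on to use the vport, 0 if it bailed out early.
 */
static int model_dev_loss_callback(struct model_node *n)
{
	if (!n)
		return 0;

	if (!n->vport) {
		/* A NULL vport is only expected on a node marked as freed. */
		if (n->state != NODE_FREED)
			fprintf(stderr, "vport NULL but node not in freed state\n");
		return 0;
	}

	printf("dev loss handling on vport %d\n", n->vport->id);
	return 1;
}
```

Without the NULL check, the last two statements of the callback would
read through a NULL pointer once model_node_release() has run, which is
the same shape of failure as the oops above.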



 drivers/scsi/lpfc/lpfc_hbadisc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Daniel Wagner Jan. 11, 2023, 2:08 p.m. UTC | #1
On Wed, Jan 11, 2023 at 12:38:41PM +0100, Daniel Wagner wrote:
> When FLOGI attempts fail, the vport can be released via
> lpfc_nlp_release() function. This function will set the pointer to NULL
> and the node state to NLP_STE_FREED_NODE. Though it won't stop the
> devloss timer in the upper SCSI layer.
> 
> Hence when the devloss timer eventually fires,
> lpfc_dev_loss_tmo_callbk() is called and it tries to operate on vport
> NULL pointer.
> 
> Just do nothing in this case. To be extra cautious also check for the
> state and issue a warning if we have an inconsistency.

Ignore this one. Just saw the proper fix:

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?id=97f256913c5d

Patch

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 83d2b29ee2a6..e7dd5f90d6c4 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -160,6 +160,19 @@  lpfc_dev_loss_tmo_callbk(struct fc_rport *rport)
 	if (!ndlp)
 		return;
 
+	if (!ndlp->vport) {
+		/*
+		 * The dev loss timer from the SCSI layer might time out after
+		 * failed FLOGI attempts. In this case the node will already
+		 * have been released via lpfc_nlp_release(), which sets the
+		 * vport pointer to NULL and the state to NLP_STE_FREED_NODE.
+		 */
+		WARN_ONCE(ndlp->nlp_state != NLP_STE_FREED_NODE,
+			  "**** %s, vport NULL but nlp_state is not in freed state\n",
+			  __func__);
+		return;
+	}
+
 	vport = ndlp->vport;
 	phba  = vport->phba;