[1/1] qla2xxx: Initialize Work element before requesting IRQs

Message ID 20171016182605.22174-1-himanshu.madhani@cavium.com (mailing list archive)
State Accepted

Commit Message

Madhani, Himanshu Oct. 16, 2017, 6:26 p.m. UTC
From: Himanshu Madhani <himanshu.madhani@cavium.com>

commit a9e170e28636 ("scsi: qla2xxx: Fix uninitialized work element")
moved initialization of the work element earlier in the probe to fix a call
stack. However, it still leaves a window where an interrupt can be generated
before the work element is initialized. Close that window by initializing the
work element before requesting IRQs.

Fixes: a9e170e28636 ("scsi: qla2xxx: Fix uninitialized work element")
Cc: <stable@vger.kernel.org> # 4.13
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
---
Hi Martin,

Please apply this patch to 4.14.0-rc6. This patch fixes a small window where
the user will see a call stack with the qla2xxx driver.

Thanks,
Himanshu

 drivers/scsi/qla2xxx/qla_os.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
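
The ordering problem described in the commit message can be illustrated with a small userspace sketch. It is not driver code: every `demo_*` name is hypothetical, the "interrupt" is an ordinary function call, and the work item is reduced to a single callback pointer. It only shows why the work element has to be set up before anything that can trigger it is enabled, which is what moving `INIT_WORK()` ahead of `qla2x00_request_irqs()` achieves in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item: only the callback matters here. */
struct demo_work {
	void (*func)(struct demo_work *work);
};

/* Hypothetical stand-in for the per-host state that embeds the work item. */
struct demo_host {
	struct demo_work iocb_work;
	bool irq_enabled;
	int handled;
};

/* Work callback: recover the enclosing host and count the invocation. */
static void demo_iocb_work_fn(struct demo_work *work)
{
	struct demo_host *host = (struct demo_host *)
		((char *)work - offsetof(struct demo_host, iocb_work));

	host->handled++;
}

/*
 * Simulated interrupt. Returns false when it finds the work item
 * uninitialized, i.e. the situation the patch is meant to rule out.
 */
static bool demo_irq_fires(struct demo_host *host)
{
	if (!host->irq_enabled)
		return true;	/* nothing can fire yet */
	if (host->iocb_work.func == NULL)
		return false;	/* work item used before initialization */
	host->iocb_work.func(&host->iocb_work);
	return true;
}

/* Simulated probe; init_before_irq selects the ordering under test. */
static bool demo_probe(struct demo_host *host, bool init_before_irq)
{
	bool ok;

	host->iocb_work.func = NULL;
	host->irq_enabled = false;
	host->handled = 0;

	if (init_before_irq)
		host->iocb_work.func = demo_iocb_work_fn;

	host->irq_enabled = true;	/* stands in for requesting IRQs */
	ok = demo_irq_fires(host);	/* interrupt arrives immediately */

	if (!init_before_irq)
		host->iocb_work.func = demo_iocb_work_fn;	/* too late */

	return ok;
}
```

Calling `demo_probe(&host, true)` models the ordering after this patch and runs the callback once; `demo_probe(&host, false)` models the old ordering and reports that the simulated interrupt found the work item uninitialized.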

Comments

Martin K. Petersen Oct. 17, 2017, 3:13 a.m. UTC | #1
Himanshu,

> commit a9e170e28636 ("scsi: qla2xxx: Fix uninitialized work element")
> moved initialization of the work element earlier in the probe to fix a call
> stack. However, it still leaves a window where an interrupt can be generated
> before the work element is initialized. Close that window by initializing the
> work element before requesting IRQs.

Applied to 4.14/scsi-fixes. Thank you!

Bart Van Assche Oct. 18, 2017, 4:12 a.m. UTC | #2
On Mon, 2017-10-16 at 11:26 -0700, Madhani, Himanshu wrote:
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 937209805baf..3bd956d3bc5d 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3061,6 +3061,8 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  	    host->max_cmd_len, host->max_channel, host->max_lun,
>  	    host->transportt, sht->vendor_id);
>  
> +	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
> +
>  	/* Set up the irqs */
>  	ret = qla2x00_request_irqs(ha, rsp);
>  	if (ret)
> @@ -3175,8 +3177,6 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  	    host->can_queue, base_vha->req,
>  	    base_vha->mgmt_svr_loop_id, host->sg_tablesize);
>  
> -	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
> -
>  	if (ha->mqenable) {
>  		bool mq = false;
>  		bool startit = false;

Hello Himanshu,

That patch indeed fixes the bug described in the patch description when
applied on top of kernel v4.13.7. However, with that patch applied I ran
into another bug. Can you have a look?

BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
IP: qlt_free_session_done+0x172/0x570 [qla2xxx]
PGD 0 
P4D 0 
Oops: 0000 [#1] SMP
CPU: 0 PID: 47 Comm: kworker/0:1 Not tainted 4.13.7+ #1
Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F10 08/03/2015
Workqueue: events qlt_free_session_done [qla2xxx]
task: ffff9c4bcee94300 task.stack: ffffba99c01d4000
RIP: 0010:qlt_free_session_done+0x172/0x570 [qla2xxx]
RSP: 0018:ffffba99c01d7dc8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff9c4bb5c95720 RCX: ffffffffc09174c8
RDX: 0000000000000001 RSI: ffff9c4bb5c95720 RDI: ffff9c4baf7e5ce4
RBP: ffffba99c01d7e50 R08: ffffffffc0903620 R09: ffff9c4bc840e400
R10: ffffba99c01d7db0 R11: 0000000000000000 R12: ffff9c4bc840e400
R13: 0000000000000000 R14: ffff9c4baf7e5000 R15: ffff9c4bc840e4c0
FS:  0000000000000000(0000) GS:ffff9c4bdfa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000190 CR3: 000000040c624000 CR4: 00000000001426f0
Call Trace:
 ? qlt_unreg_sess+0xfe/0x110 [qla2xxx]
 ? qla24xx_delete_sess_fn+0x69/0x80 [qla2xxx]
 process_one_work+0x1d6/0x3d0
 worker_thread+0x42/0x3e0
 kthread+0x11f/0x140
 ? trace_event_raw_event_workqueue_execute_start+0x90/0x90
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x22/0x30
Code: 00 00 00 00 41 c7 87 ac 00 00 00 07 00 00 00 83 e0 f9 83 c8 04 41 f6 87 71 ff ff ff 02 41 88 87 70 ff ff ff 48 8b 83 40 04 00 00 <8b> 80 90 01 00 00 41 89 47 4c 74 22 41 8b 87 68 ff ff ff 25 00 
RIP: qlt_free_session_done+0x172/0x570 [qla2xxx] RSP: ffffba99c01d7dc8
CR2: 0000000000000190
---[ end trace 89dee74f51a05258 ]---

(gdb) list *(qlt_free_session_done+0x172)
0x661c2 is in qlt_free_session_done (drivers/scsi/qla2xxx/qla_target.c:1027).
1022            }
1023
1024            sess->disc_state = DSC_DELETED;
1025            sess->fw_login_state = DSC_LS_PORT_UNAVAIL;
1026            sess->deleted = QLA_SESS_DELETED;
1027            sess->login_retry = vha->hw->login_retry_count;
1028
1029            if (sess->login_succ && !IS_SW_RESV_ADDR(sess->d_id)) {
1030                    vha->fcport_count--;
1031                    sess->login_succ = 0;
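
One possible reading of the oops above, not confirmed anywhere in this thread: the bytes in the Code: line just before the faulting instruction (`48 8b 83 40 04 00 00`, then `<8b> 80 90 01 00 00`) decode to a 64-bit load from `rbx + 0x440` into `rax`, followed by a 32-bit load from `rax + 0x190`. With RAX reported as 0 and CR2 as 0x190, that would be consistent with `vha->hw` being NULL at line 1027 and `login_retry_count` sitting at offset 0x190 inside the structure `hw` points to. The sketch below uses hypothetical `demo_*` structures, padded only to reproduce those two offsets, to show how a NULL base pointer plus a field offset yields such a fault address. The real driver structures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen only to place the field at 0x190. */
struct demo_hw {
	char pad[0x190];
	int login_retry_count;
};

/* Hypothetical layout: padding chosen only to place the pointer at 0x440. */
struct demo_vha {
	char pad[0x440];
	struct demo_hw *hw;
};

/*
 * Address the CPU would try to read for base->login_retry_count,
 * computed arithmetically so nothing is actually dereferenced.
 */
static uintptr_t demo_field_address(const struct demo_hw *base)
{
	return (uintptr_t)base + offsetof(struct demo_hw, login_retry_count);
}
```

On common platforms, where a null pointer converts to integer 0, `demo_field_address(NULL)` evaluates to 0x190, matching the shape of the reported fault address.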

Madhani, Himanshu Oct. 26, 2017, 3:45 a.m. UTC | #3
Hello Bart, 

> On Oct 17, 2017, at 9:12 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> 
> On Mon, 2017-10-16 at 11:26 -0700, Madhani, Himanshu wrote:
>> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
>> index 937209805baf..3bd956d3bc5d 100644
>> --- a/drivers/scsi/qla2xxx/qla_os.c
>> +++ b/drivers/scsi/qla2xxx/qla_os.c
>> @@ -3061,6 +3061,8 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>> 	    host->max_cmd_len, host->max_channel, host->max_lun,
>> 	    host->transportt, sht->vendor_id);
>> 
>> +	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
>> +
>> 	/* Set up the irqs */
>> 	ret = qla2x00_request_irqs(ha, rsp);
>> 	if (ret)
>> @@ -3175,8 +3177,6 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>> 	    host->can_queue, base_vha->req,
>> 	    base_vha->mgmt_svr_loop_id, host->sg_tablesize);
>> 
>> -	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
>> -
>> 	if (ha->mqenable) {
>> 		bool mq = false;
>> 		bool startit = false;
> 
> Hello Himanshu,
> 
> That patch indeed fixes the bug described in the patch description when
> applied on top of kernel v4.13.7. However, with that patch applied I ran
> into another bug. Can you have a look?
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
> IP: qlt_free_session_done+0x172/0x570 [qla2xxx]
> PGD 0 
> P4D 0 
> Oops: 0000 [#1] SMP
> CPU: 0 PID: 47 Comm: kworker/0:1 Not tainted 4.13.7+ #1
> Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F10 08/03/2015
> Workqueue: events qlt_free_session_done [qla2xxx]
> task: ffff9c4bcee94300 task.stack: ffffba99c01d4000
> RIP: 0010:qlt_free_session_done+0x172/0x570 [qla2xxx]
> RSP: 0018:ffffba99c01d7dc8 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff9c4bb5c95720 RCX: ffffffffc09174c8
> RDX: 0000000000000001 RSI: ffff9c4bb5c95720 RDI: ffff9c4baf7e5ce4
> RBP: ffffba99c01d7e50 R08: ffffffffc0903620 R09: ffff9c4bc840e400
> R10: ffffba99c01d7db0 R11: 0000000000000000 R12: ffff9c4bc840e400
> R13: 0000000000000000 R14: ffff9c4baf7e5000 R15: ffff9c4bc840e4c0
> FS:  0000000000000000(0000) GS:ffff9c4bdfa00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000190 CR3: 000000040c624000 CR4: 00000000001426f0
> Call Trace:
> ? qlt_unreg_sess+0xfe/0x110 [qla2xxx]
> ? qla24xx_delete_sess_fn+0x69/0x80 [qla2xxx]
> process_one_work+0x1d6/0x3d0
> worker_thread+0x42/0x3e0
> kthread+0x11f/0x140
> ? trace_event_raw_event_workqueue_execute_start+0x90/0x90
> ? kthread_create_on_node+0x40/0x40
> ret_from_fork+0x22/0x30
> Code: 00 00 00 00 41 c7 87 ac 00 00 00 07 00 00 00 83 e0 f9 83 c8 04 41 f6 87 71 ff ff ff 02 41 88 87 70 ff ff ff 48 8b 83 40 04 00 00 <8b> 80 90 01 00 00 41 89 47 4c 74 22 41 8b 87 68 ff ff ff 25 00 
> RIP: qlt_free_session_done+0x172/0x570 [qla2xxx] RSP: ffffba99c01d7dc8
> CR2: 0000000000000190
> ---[ end trace 89dee74f51a05258 ]---
> 
> (gdb) list *(qlt_free_session_done+0x172)
> 0x661c2 is in qlt_free_session_done (drivers/scsi/qla2xxx/qla_target.c:1027).
> 1022            }
> 1023
> 1024            sess->disc_state = DSC_DELETED;
> 1025            sess->fw_login_state = DSC_LS_PORT_UNAVAIL;
> 1026            sess->deleted = QLA_SESS_DELETED;
> 1027            sess->login_retry = vha->hw->login_retry_count;
> 1028
> 1029            if (sess->login_succ && !IS_SW_RESV_ADDR(sess->d_id)) {
> 1030                    vha->fcport_count--;
> 1031                    sess->login_succ = 0;

I will take a look and see if I can reproduce this issue.

Thanks,
- Himanshu

Patch

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 937209805baf..3bd956d3bc5d 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3061,6 +3061,8 @@  qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	    host->max_cmd_len, host->max_channel, host->max_lun,
 	    host->transportt, sht->vendor_id);
 
+	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
+
 	/* Set up the irqs */
 	ret = qla2x00_request_irqs(ha, rsp);
 	if (ret)
@@ -3175,8 +3177,6 @@  qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	    host->can_queue, base_vha->req,
 	    base_vha->mgmt_svr_loop_id, host->sg_tablesize);
 
-	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
-
 	if (ha->mqenable) {
 		bool mq = false;
 		bool startit = false;