diff mbox series

[2/2] qla2xxx: Fix NULL pointer crash due to stale CPUID

Message ID 20190315220419.14010-3-hmadhani@marvell.com (mailing list archive)
State Mainlined
Commit ac444b4f0ace05d7c4c99f6b1e5b0cae0852f025
Headers show
Series qla2xxx: fixes for qla2xxx | expand

Commit Message

Himanshu Madhani March 15, 2019, 10:04 p.m. UTC
This patch fixes crash due to NULL pointer derefrence because
CPU pointer is not set and used by driver.  Instead, driver is
passes CPU as tag via ha->isp_ops->{lun_reset|target_reset}

[   30.160780] qla2xxx [0000:a0:00.1]-8038:9: Cable is unplugged...
[   69.984045] qla2xxx [0000:a0:00.0]-8009:8: DEVICE RESET ISSUED nexus=8:0:0 cmd=00000000b0d62f46.
[   69.992849] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[   70.000680] PGD 0 P4D 0
[   70.003232] Oops: 0000 [#1] SMP PTI
[   70.006727] CPU: 2 PID: 6714 Comm: sg_reset Kdump: loaded Not tainted 4.18.0-67.el8.x86_64 #1
[   70.015258] Hardware name: NEC Express5800/T110j [N8100-2758Y]/MX32-PH0-NJ, BIOS F11 02/13/2019
[   70.024016] RIP: 0010:blk_mq_rq_cpu+0x9/0x10
[   70.028315] Code: 01 58 01 00 00 48 83 c0 28 48 3d 80 02 00 00 75 ab c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
 8b 47 08 <8b> 40 40 c3 0f 1f 00 0f 1f 44 00 00 48 83 ec 10 48 c7 c6 20 6e 7c
[   70.047087] RSP: 0018:ffff99a481487d58 EFLAGS: 00010246
[   70.052322] RAX: 0000000000000000 RBX: ffffffffc041b08b RCX: 0000000000000000
[   70.059466] RDX: 0000000000000000 RSI: ffff8d10b6b16898 RDI: ffff8d10b341e400
[   70.066615] RBP: ffffffffc03a6bd0 R08: 0000000000000415 R09: 0000000000aaaaaa
[   70.073765] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d10b341e528
[   70.080914] R13: ffff8d10aadefc00 R14: ffff8d0f64efa998 R15: ffff8d0f64efa000
[   70.088083] FS:  00007f90a201e540(0000) GS:ffff8d10b6b00000(0000) knlGS:0000000000000000
[   70.096188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.101959] CR2: 0000000000000040 CR3: 0000000268886005 CR4: 00000000003606e0
[   70.109127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   70.116277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   70.123425] Call Trace:
[   70.125896]  __qla2xxx_eh_generic_reset+0xb1/0x220 [qla2xxx]
[   70.131572]  scsi_ioctl_reset+0x1f5/0x2a0
[   70.135600]  scsi_ioctl+0x18e/0x397
[   70.139099]  ? sd_ioctl+0x7c/0x100 [sd_mod]
[   70.143287]  blkdev_ioctl+0x32b/0x9f0
[   70.146954]  ? __check_object_size+0xa3/0x181
[   70.151323]  block_ioctl+0x39/0x40
[   70.154735]  do_vfs_ioctl+0xa4/0x630
[   70.158322]  ? syscall_trace_enter+0x1d3/0x2c0
[   70.162769]  ksys_ioctl+0x60/0x90
[   70.166104]  __x64_sys_ioctl+0x16/0x20
[   70.169859]  do_syscall_64+0x5b/0x1b0
[   70.173532]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[   70.178587] RIP: 0033:0x7f90a1b3445b
[   70.182183] Code: 0f 1e fa 48 8b 05 2d aa 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd a9 2c 00 f7 d8 64 89 01 48
[   70.200956] RSP: 002b:00007fffdca88b68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   70.208535] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f90a1b3445b
[   70.215684] RDX: 00007fffdca88b84 RSI: 0000000000002284 RDI: 0000000000000003
[   70.222833] RBP: 00007fffdca88ca8 R08: 00007fffdca88b84 R09: 0000000000000000
[   70.229981] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffdca88b84
[   70.237131] R13: 0000000000000000 R14: 000055ab09b0bd28 R15: 0000000000000000
[   70.244284] Modules linked in: nft_chain_route_ipv4 xt_CHECKSUM nft_chain_nat_ipv4 ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack_ipv4
 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 nft_counter nft_compat tun bridge stp llc nf_tables nfnetli
nk devlink sunrpc vfat fat intel_rapl intel_pmc_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm wmi_bmof iTCO_wdt iTCO_
vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif intel_cstate intel_uncore intel_rapl_perf ipmi_si jo
ydev pcspkr ipmi_devintf sg wmi ipmi_msghandler video acpi_power_meter acpi_pad mei_me i2c_i801 mei ip_tables ext4 mbcache jbd2 sr_mod cd
rom sd_mod qla2xxx ast i2c_algo_bit drm_kms_helper nvme_fc syscopyarea sysfillrect uas sysimgblt fb_sys_fops nvme_fabrics ttm
[   70.314805]  usb_storage nvme_core crc32c_intel scsi_transport_fc ahci drm libahci tg3 libata megaraid_sas pinctrl_cannonlake pinctrl_
intel
[   70.327335] CR2: 0000000000000040

Fixes: 9cf2bab630765 ("block: kill request ->cpu member")
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ewan Milne March 18, 2019, 1:36 p.m. UTC | #1
On Fri, 2019-03-15 at 15:04 -0700, Himanshu Madhani wrote:
> This patch fixes crash due to NULL pointer derefrence because
> CPU pointer is not set and used by driver.  Instead, driver is
> passes CPU as tag via ha->isp_ops->{lun_reset|target_reset}
> 
...
> 
> Fixes: 9cf2bab630765 ("block: kill request ->cpu member")
> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 677f82fdf56f..91f576d743fe 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -1517,7 +1517,7 @@ __qla2xxx_eh_generic_reset(char *name, enum nexus_wait_type type,
>  		goto eh_reset_failed;
>  	}
>  	err = 2;
> -	if (do_reset(fcport, cmd->device->lun, blk_mq_rq_cpu(cmd->request) + 1)
> +	if (do_reset(fcport, cmd->device->lun, 1)
>  		!= QLA_SUCCESS) {
>  		ql_log(ql_log_warn, vha, 0x800c,
>  		    "do_reset failed for cmd=%p.\n", cmd);

Hi Himanshu-

The 3rd parameter to do_reset() may end up being passed to the HBA in the TM
IOCB in qla2x00_async_tm_cmd().  Can you explain why the CPU number was used
previously and why passing a constant here is sufficient?  Was this not needed
in the original implementation or is this a functional change?

-Ewan
Himanshu Madhani March 18, 2019, 5:12 p.m. UTC | #2
Hi Ewan, 


On 3/18/19, 6:36 AM, "linux-scsi-owner@vger.kernel.org on behalf of Ewan D. Milne" <linux-scsi-owner@vger.kernel.org on behalf of emilne@redhat.com> wrote:

    On Fri, 2019-03-15 at 15:04 -0700, Himanshu Madhani wrote:
    > This patch fixes crash due to NULL pointer derefrence because
    > CPU pointer is not set and used by driver.  Instead, driver is
    > passes CPU as tag via ha->isp_ops->{lun_reset|target_reset}
    > 
    ...
    > 
    > Fixes: 9cf2bab630765 ("block: kill request ->cpu member")
    > Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
    > ---
    >  drivers/scsi/qla2xxx/qla_os.c | 2 +-
    >  1 file changed, 1 insertion(+), 1 deletion(-)
    > 
    > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
    > index 677f82fdf56f..91f576d743fe 100644
    > --- a/drivers/scsi/qla2xxx/qla_os.c
    > +++ b/drivers/scsi/qla2xxx/qla_os.c
    > @@ -1517,7 +1517,7 @@ __qla2xxx_eh_generic_reset(char *name, enum nexus_wait_type type,
    >  		goto eh_reset_failed;
    >  	}
    >  	err = 2;
    > -	if (do_reset(fcport, cmd->device->lun, blk_mq_rq_cpu(cmd->request) + 1)
    > +	if (do_reset(fcport, cmd->device->lun, 1)
    >  		!= QLA_SUCCESS) {
    >  		ql_log(ql_log_warn, vha, 0x800c,
    >  		    "do_reset failed for cmd=%p.\n", cmd);
    
    Hi Himanshu-
    
    The 3rd parameter to do_reset() may end up being passed to the HBA in the TM
    IOCB in qla2x00_async_tm_cmd().  Can you explain why the CPU number was used
    previously and why passing a constant here is sufficient?  Was this not needed
    in the original implementation or is this a functional change?

    -Ewan

In earlier BLK-MQ implementation 3rd field must have some meaning and so driver must be setting that value. 
(I could not find history on old implementation). However, with latest updates  to BLK-MQ, 3rd parameter is
passed as tag in the driver and it's not being used in driver. 

In qla2x00_async_tm_cmd(),  3rd parameter is passed as tag to and its value is saved in tm_iocb->u.tmf.data  but
not used in the driver anywhere else, so passing value of 1 is safe here since return value from qla2x00_start_sp()
will override this value. 

Thanks,
Himanshu
diff mbox series

Patch

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 677f82fdf56f..91f576d743fe 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1517,7 +1517,7 @@  __qla2xxx_eh_generic_reset(char *name, enum nexus_wait_type type,
 		goto eh_reset_failed;
 	}
 	err = 2;
-	if (do_reset(fcport, cmd->device->lun, blk_mq_rq_cpu(cmd->request) + 1)
+	if (do_reset(fcport, cmd->device->lun, 1)
 		!= QLA_SUCCESS) {
 		ql_log(ql_log_warn, vha, 0x800c,
 		    "do_reset failed for cmd=%p.\n", cmd);