Message ID | 20180207200724.GD695913@devbig577.frc2.facebook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, 2018-02-07 at 12:07 -0800, tj@kernel.org wrote: > Ah, you're right. u64_stat_sync doesn't imply barriers, so we want > something like the following. > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index df93102..d6edf3b 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -593,7 +593,7 @@ static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) > */ > local_irq_save(flags); > u64_stats_update_begin(&rq->aborted_gstate_sync); > - rq->aborted_gstate = gstate; > + smp_store_release(&rq->aborted_gstate, gstate); > u64_stats_update_end(&rq->aborted_gstate_sync); > local_irq_restore(flags); > } > @@ -605,7 +605,7 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq) > > do { > start = u64_stats_fetch_begin(&rq->aborted_gstate_sync); > - aborted_gstate = rq->aborted_gstate; > + aborted_gstate = smp_load_acquire(&rq->aborted_gstate); > } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start)); > > return aborted_gstate; > @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) > * ->aborted_gstate is set, this may lead to ignored > * completions and further spurious timeouts. > */ > - blk_mq_rq_update_aborted_gstate(req, 0); > blk_add_timer(req); > + blk_mq_rq_update_aborted_gstate(req, 0); > break; > case BLK_EH_NOT_HANDLED: > break; Hello Tejun, With this patch applied I see requests for which it seems like the timeout handler did not get invoked: sdc/hctx0/busy:0000000095e04b7c {.op=WRITE, .cmd_flags=FAILFAST_TRANSPORT|SYNC|N OMERGE|IDLE, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT|MQ_TIMEOUT_EXPIRED, .state=i n_flight, .gstate=0xed/0xed, .tag=26, .internal_tag=-1, .cmd=Write(10) 2a 00 00 00 60 ba 00 00 08 00, .retries=0, .result = 0x50000, .flags=TAGGED|INITIALIZED, .timeout=1.000, allocated 1093.180 s ago} sdc/hctx0/busy:0000000065a64e9b {.op=WRITE, .cmd_flags=FAILFAST_TRANSPORT|SYNC|N OMERGE|IDLE, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT|MQ_TIMEOUT_EXPIRED, .state=i n_flight, .gstate=0x5/0x5, .tag=27, .internal_tag=-1, .cmd=Write(10) 2a 00 00 00 62 d2 00 00 08 00, .retries=0, .result = 0x50000, .flags=TAGGED|INITIALIZED, .t imeout=1.000, allocated 1093.180 s ago} [ ... ] sdc/hctx3/busy:00000000479cc2a9 {.op=WRITE, .cmd_flags=FAILFAST_TRANSPORT|SYNC|N OMERGE|IDLE, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT, .state=in_flight, .gstate=0 x11/0x11, .tag=57, .internal_tag=-1, .cmd=Write(10) 2a 00 00 00 61 d2 00 00 08 0 0, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED, .timeout=1.000, allocat ed 1093.150 s ago} sdc/hctx3/busy:000000008fd130d5 {.op=WRITE, .cmd_flags=FAILFAST_TRANSPORT|SYNC|N OMERGE|IDLE, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT, .state=in_flight, .gstate=0 xd/0xd, .tag=61, .internal_tag=-1, .cmd=Write(10) 2a 00 00 00 c3 94 00 00 08 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED, .timeout=1.000, allocated 1093.140 s ago} As one can see for some requests MQ_TIMEOUT_EXPIRED is set and .result = 0x50000. The value of .result means that the SCSI error handler has submitted an abort (see also scmnd->result = DID_ABORT << 16 in drivers/infiniband/ulp/srp/ib_srp.c). For the last two requests shown above however MQ_TIMEOUT_EXPIRED is not set and the SCSI result has value 0. I think that means that it can happen that a request times out but that the timeout handler does not get invoked ... Thanks, Bart.
On Wed, 2018-02-07 at 23:48 +0000, Bart Van Assche wrote: > With this patch applied I see requests for which it seems like the timeout handler > did not get invoked: [ ... ] I just noticed the following in the system log, which is probably the reason why some requests got stuck: Feb 7 15:16:26 ubuntu-vm kernel: BUG: unable to handle kernel NULL pointer dereference at (null) Feb 7 15:16:26 ubuntu-vm kernel: IP: scsi_times_out+0x17/0x2c0 [scsi_mod] Feb 7 15:16:26 ubuntu-vm kernel: PGD 0 P4D 0 Feb 7 15:16:26 ubuntu-vm kernel: Oops: 0000 [#1] PREEMPT SMP Feb 7 15:16:26 ubuntu-vm kernel: Modules linked in: dm_service_time ib_srp libcrc32c crc32c_generic scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod rdma_rxe ip6_udp_tunnel udp_tunnel ib_umad ib_uverbs scsi_debug brd mq_deadline kyber_iosched deadline_iosched cfq_iosched bfq crct10dif_pclmul crc32_pclmul af_packet ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw virtio_console virtio_balloon sg button i2c_piix4 dm_multipath dm_mod dax scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables ipv6 autofs4 ext4 crc16 mbcache jbd2 sd_mod virtio_net virtio_blk virtio_scsi sr_mod cdrom ata_generic pata_acpi psmouse crc32c_intel i2c_core atkbd Feb 7 15:16:26 ubuntu-vm kernel: uhci_hcd ehci_hcd intel_agp ata_piix intel_gtt agpgart libata virtio_pci usbcore virtio_ring virtio scsi_mod usb_common unix Feb 7 15:16:26 ubuntu-vm kernel: CPU: 0 PID: 146 Comm: kworker/0:1H Not tainted 4.15.0-dbg+ #1 Feb 7 15:16:26 ubuntu-vm kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Feb 7 15:16:26 ubuntu-vm kernel: Workqueue: kblockd blk_mq_timeout_work Feb 7 15:16:26 ubuntu-vm kernel: RIP: 0010:scsi_times_out+0x17/0x2c0 [scsi_mod] Feb 7 15:16:26 ubuntu-vm kernel: RSP: 0018:ffff98f0c02abd58 EFLAGS: 00010246 Feb 7 15:16:26 ubuntu-vm kernel: RAX: 0000000000000000 RBX: ffff965de2b3a2c0 RCX: 0000000000000000 Feb 7 15:16:26 ubuntu-vm kernel: RDX: ffffffffc0094740 RSI: 0000000000000000 RDI: ffff965de2b3a2c0 Feb 7 15:16:26 ubuntu-vm kernel: RBP: ffff965de2b3a438 R08: fffffffffffffffc R09: 0000000000000007 Feb 7 15:16:26 ubuntu-vm kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 Feb 7 15:16:26 ubuntu-vm kernel: R13: 0000000000000000 R14: ffff965de2a44218 R15: ffff965de2a48728 Feb 7 15:16:26 ubuntu-vm kernel: FS: 0000000000000000(0000) GS:ffff965dffc00000(0000) knlGS:0000000000000000 Feb 7 15:16:26 ubuntu-vm kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Feb 7 15:16:26 ubuntu-vm kernel: CR2: 0000000000000000 CR3: 000000003ce0f003 CR4: 00000000003606f0 Feb 7 15:16:26 ubuntu-vm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Feb 7 15:16:26 ubuntu-vm kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Feb 7 15:16:26 ubuntu-vm kernel: Call Trace: Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_terminate_expired+0x42/0x80 Feb 7 15:16:26 ubuntu-vm kernel: bt_iter+0x3d/0x50 Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_queue_tag_busy_iter+0xe9/0x200 Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_timeout_work+0x181/0x2e0 Feb 7 15:16:26 ubuntu-vm kernel: process_one_work+0x21c/0x6d0 Feb 7 15:16:26 ubuntu-vm kernel: worker_thread+0x35/0x380 Feb 7 15:16:26 ubuntu-vm kernel: kthread+0x117/0x130 Feb 7 15:16:26 ubuntu-vm kernel: ret_from_fork+0x24/0x30 Feb 7 15:16:26 ubuntu-vm kernel: Code: ff ff 0f ff e9 fd fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 8d af 78 01 00 00 53 48 8b 87 b0 01 00 00 48 89 fb <4c> 8b 28 0f 1f 44 00 00 65 8b 05 6a b5 f8 3f 83 f8 0f 0f 87 ed Feb 7 15:16:26 ubuntu-vm kernel: RIP: scsi_times_out+0x17/0x2c0 [scsi_mod] RSP: ffff98f0c02abd58 Feb 7 15:16:26 ubuntu-vm kernel: CR2: 0000000000000000 Feb 7 15:16:26 ubuntu-vm kernel: ---[ end trace ce6c20d95bab450e ]--- Bart.
On Thu, Feb 08, 2018 at 01:09:57AM +0000, Bart Van Assche wrote: > On Wed, 2018-02-07 at 23:48 +0000, Bart Van Assche wrote: > > With this patch applied I see requests for which it seems like the timeout handler > > did not get invoked: [ ... ] > > I just noticed the following in the system log, which is probably the reason why some > requests got stuck: > > Feb 7 15:16:26 ubuntu-vm kernel: BUG: unable to handle kernel NULL pointer dereference at (null) > Feb 7 15:16:26 ubuntu-vm kernel: IP: scsi_times_out+0x17/0x2c0 [scsi_mod] > Feb 7 15:16:26 ubuntu-vm kernel: PGD 0 P4D 0 > Feb 7 15:16:26 ubuntu-vm kernel: Oops: 0000 [#1] PREEMPT SMP > Feb 7 15:16:26 ubuntu-vm kernel: Modules linked in: dm_service_time ib_srp libcrc32c crc32c_generic scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod > rdma_rxe ip6_udp_tunnel udp_tunnel ib_umad ib_uverbs scsi_debug brd mq_deadline kyber_iosched deadline_iosched cfq_iosched bfq crct10dif_pclmul crc32_pclmul af_packet ghash_clmulni_intel pcbc > aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw virtio_console virtio_balloon sg button i2c_piix4 dm_multipath dm_mod dax scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm > ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables ipv6 autofs4 ext4 crc16 mbcache jbd2 sd_mod virtio_net virtio_blk virtio_scsi sr_mod cdrom ata_generic > pata_acpi psmouse crc32c_intel i2c_core atkbd > Feb 7 15:16:26 ubuntu-vm kernel: uhci_hcd ehci_hcd intel_agp ata_piix intel_gtt agpgart libata virtio_pci usbcore virtio_ring virtio scsi_mod usb_common unix > Feb 7 15:16:26 ubuntu-vm kernel: CPU: 0 PID: 146 Comm: kworker/0:1H Not tainted 4.15.0-dbg+ #1 > Feb 7 15:16:26 ubuntu-vm kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > Feb 7 15:16:26 ubuntu-vm kernel: Workqueue: kblockd blk_mq_timeout_work > Feb 7 15:16:26 ubuntu-vm kernel: RIP: 0010:scsi_times_out+0x17/0x2c0 [scsi_mod] > Feb 7 15:16:26 ubuntu-vm kernel: RSP: 0018:ffff98f0c02abd58 EFLAGS: 00010246 > Feb 7 15:16:26 ubuntu-vm kernel: RAX: 0000000000000000 RBX: ffff965de2b3a2c0 RCX: 0000000000000000 > Feb 7 15:16:26 ubuntu-vm kernel: RDX: ffffffffc0094740 RSI: 0000000000000000 RDI: ffff965de2b3a2c0 > Feb 7 15:16:26 ubuntu-vm kernel: RBP: ffff965de2b3a438 R08: fffffffffffffffc R09: 0000000000000007 > Feb 7 15:16:26 ubuntu-vm kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 > Feb 7 15:16:26 ubuntu-vm kernel: R13: 0000000000000000 R14: ffff965de2a44218 R15: ffff965de2a48728 > Feb 7 15:16:26 ubuntu-vm kernel: FS: 0000000000000000(0000) GS:ffff965dffc00000(0000) knlGS:0000000000000000 > Feb 7 15:16:26 ubuntu-vm kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Feb 7 15:16:26 ubuntu-vm kernel: CR2: 0000000000000000 CR3: 000000003ce0f003 CR4: 00000000003606f0 > Feb 7 15:16:26 ubuntu-vm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Feb 7 15:16:26 ubuntu-vm kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Feb 7 15:16:26 ubuntu-vm kernel: Call Trace: > Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_terminate_expired+0x42/0x80 > Feb 7 15:16:26 ubuntu-vm kernel: bt_iter+0x3d/0x50 > Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_queue_tag_busy_iter+0xe9/0x200 > Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_timeout_work+0x181/0x2e0 > Feb 7 15:16:26 ubuntu-vm kernel: process_one_work+0x21c/0x6d0 > Feb 7 15:16:26 ubuntu-vm kernel: worker_thread+0x35/0x380 > Feb 7 15:16:26 ubuntu-vm kernel: kthread+0x117/0x130 > Feb 7 15:16:26 ubuntu-vm kernel: ret_from_fork+0x24/0x30 > Feb 7 15:16:26 ubuntu-vm kernel: Code: ff ff 0f ff e9 fd fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 8d af 78 01 00 00 53 48 8b 87 b0 01 00 00 48 89 fb <4c> 8b 28 0f 1f 44 00 00 65 > 8b 05 6a b5 f8 3f 83 f8 0f 0f 87 ed > Feb 7 15:16:26 ubuntu-vm kernel: RIP: scsi_times_out+0x17/0x2c0 [scsi_mod] RSP: ffff98f0c02abd58 > Feb 7 15:16:26 ubuntu-vm kernel: CR2: 0000000000000000 > Feb 7 15:16:26 ubuntu-vm kernel: ---[ end trace ce6c20d95bab450e ]--- So, that's dereferencing %rax which is NULL. That gotta be the ->host deref in the following. enum blk_eh_timer_return scsi_times_out(struct request *req) { struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; struct Scsi_Host *host = scmd->device->host; ... That sounds more like a scsi hotplug but than an issue in the timeout code unless we messed up @req pointer to begin with. Thanks.
On Thu, Feb 08, 2018 at 07:39:40AM -0800, tj@kernel.org wrote: > That sounds more like a scsi hotplug but than an issue in the timeout ^bug > code unless we messed up @req pointer to begin with.
On Thu, 2018-02-08 at 07:39 -0800, tj@kernel.org wrote: > On Thu, Feb 08, 2018 at 01:09:57AM +0000, Bart Van Assche wrote: > > On Wed, 2018-02-07 at 23:48 +0000, Bart Van Assche wrote: > > > With this patch applied I see requests for which it seems like the timeout handler > > > did not get invoked: [ ... ] > > > > I just noticed the following in the system log, which is probably the reason why some > > requests got stuck: > > > > Feb 7 15:16:26 ubuntu-vm kernel: BUG: unable to handle kernel NULL pointer dereference at (null) > > Feb 7 15:16:26 ubuntu-vm kernel: IP: scsi_times_out+0x17/0x2c0 [scsi_mod] > > Feb 7 15:16:26 ubuntu-vm kernel: PGD 0 P4D 0 > > Feb 7 15:16:26 ubuntu-vm kernel: Oops: 0000 [#1] PREEMPT SMP > > Feb 7 15:16:26 ubuntu-vm kernel: Modules linked in: dm_service_time ib_srp libcrc32c crc32c_generic scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock > > target_core_mod > > rdma_rxe ip6_udp_tunnel udp_tunnel ib_umad ib_uverbs scsi_debug brd mq_deadline kyber_iosched deadline_iosched cfq_iosched bfq crct10dif_pclmul crc32_pclmul af_packet ghash_clmulni_intel pcbc > > aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw virtio_console virtio_balloon sg button i2c_piix4 dm_multipath dm_mod dax scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm > > iw_cm > > ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables ipv6 autofs4 ext4 crc16 mbcache jbd2 sd_mod virtio_net virtio_blk virtio_scsi sr_mod cdrom > > ata_generic > > pata_acpi psmouse crc32c_intel i2c_core atkbd > > Feb 7 15:16:26 ubuntu-vm kernel: uhci_hcd ehci_hcd intel_agp ata_piix intel_gtt agpgart libata virtio_pci usbcore virtio_ring virtio scsi_mod usb_common unix > > Feb 7 15:16:26 ubuntu-vm kernel: CPU: 0 PID: 146 Comm: kworker/0:1H Not tainted 4.15.0-dbg+ #1 > > Feb 7 15:16:26 ubuntu-vm kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > > Feb 7 15:16:26 ubuntu-vm kernel: Workqueue: kblockd blk_mq_timeout_work > > Feb 7 15:16:26 ubuntu-vm kernel: RIP: 0010:scsi_times_out+0x17/0x2c0 [scsi_mod] > > Feb 7 15:16:26 ubuntu-vm kernel: RSP: 0018:ffff98f0c02abd58 EFLAGS: 00010246 > > Feb 7 15:16:26 ubuntu-vm kernel: RAX: 0000000000000000 RBX: ffff965de2b3a2c0 RCX: 0000000000000000 > > Feb 7 15:16:26 ubuntu-vm kernel: RDX: ffffffffc0094740 RSI: 0000000000000000 RDI: ffff965de2b3a2c0 > > Feb 7 15:16:26 ubuntu-vm kernel: RBP: ffff965de2b3a438 R08: fffffffffffffffc R09: 0000000000000007 > > Feb 7 15:16:26 ubuntu-vm kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 > > Feb 7 15:16:26 ubuntu-vm kernel: R13: 0000000000000000 R14: ffff965de2a44218 R15: ffff965de2a48728 > > Feb 7 15:16:26 ubuntu-vm kernel: FS: 0000000000000000(0000) GS:ffff965dffc00000(0000) knlGS:0000000000000000 > > Feb 7 15:16:26 ubuntu-vm kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Feb 7 15:16:26 ubuntu-vm kernel: CR2: 0000000000000000 CR3: 000000003ce0f003 CR4: 00000000003606f0 > > Feb 7 15:16:26 ubuntu-vm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > Feb 7 15:16:26 ubuntu-vm kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Feb 7 15:16:26 ubuntu-vm kernel: Call Trace: > > Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_terminate_expired+0x42/0x80 > > Feb 7 15:16:26 ubuntu-vm kernel: bt_iter+0x3d/0x50 > > Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_queue_tag_busy_iter+0xe9/0x200 > > Feb 7 15:16:26 ubuntu-vm kernel: blk_mq_timeout_work+0x181/0x2e0 > > Feb 7 15:16:26 ubuntu-vm kernel: process_one_work+0x21c/0x6d0 > > Feb 7 15:16:26 ubuntu-vm kernel: worker_thread+0x35/0x380 > > Feb 7 15:16:26 ubuntu-vm kernel: kthread+0x117/0x130 > > Feb 7 15:16:26 ubuntu-vm kernel: ret_from_fork+0x24/0x30 > > Feb 7 15:16:26 ubuntu-vm kernel: Code: ff ff 0f ff e9 fd fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 8d af 78 01 00 00 53 48 8b 87 b0 01 00 00 48 89 fb <4c> 8b 28 0f 1f 44 00 00 > > 65 > > 8b 05 6a b5 f8 3f 83 f8 0f 0f 87 ed > > Feb 7 15:16:26 ubuntu-vm kernel: RIP: scsi_times_out+0x17/0x2c0 [scsi_mod] RSP: ffff98f0c02abd58 > > Feb 7 15:16:26 ubuntu-vm kernel: CR2: 0000000000000000 > > Feb 7 15:16:26 ubuntu-vm kernel: ---[ end trace ce6c20d95bab450e ]--- > > So, that's dereferencing %rax which is NULL. That gotta be the ->host > deref in the following. > > enum blk_eh_timer_return scsi_times_out(struct request *req) > { > struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); > enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; > struct Scsi_Host *host = scmd->device->host; > ... > > That sounds more like a scsi hotplug bug than an issue in the timeout > code unless we messed up @req pointer to begin with. I don't think that this is related to SCSI hotplugging: this crash does not occur with the v4.15 block layer core and it does not occur with my timeout handler rework patch applied either. I think that means that we cannot exclude the block layer core timeout handler rework as a possible cause. The disassembler output is as follows: (gdb) disas /s scsi_times_out Dump of assembler code for function scsi_times_out: drivers/scsi/scsi_error.c: 282 { 0x0000000000005bd0 <+0>: push %r13 0x0000000000005bd2 <+2>: push %r12 0x0000000000005bd4 <+4>: push %rbp ./include/linux/blk-mq.h: 300 return rq + 1; 0x0000000000005bd5 <+5>: lea 0x178(%rdi),%rbp drivers/scsi/scsi_error.c: 282 { 0x0000000000005bdc <+12>: push %rbx 283 struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); 284 enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; 285 struct Scsi_Host *host = scmd->device->host; 0x0000000000005bdd <+13>: mov 0x1b0(%rdi),%rax 282 { 0x0000000000005be4 <+20>: mov %rdi,%rbx 283 struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); 284 enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; 285 struct Scsi_Host *host = scmd->device->host; 0x0000000000005be7 <+23>: mov (%rax),%r13 0x0000000000005bea <+26>: nopl 0x0(%rax,%rax,1) [ ... ] (gdb) print /x sizeof(struct request) $2 = 0x178 (gdb) print &(((struct scsi_cmnd*)0)->device) $4 = (struct scsi_device **) 0x38 <scsi_cmd_get_serial+8> (gdb) print &(((struct scsi_device*)0)->host) $5 = (struct Scsi_Host **) 0x0 The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The instruction at that address tries to dereference scsi_cmnd.device (%rax). The register dump shows that that pointer has the value NULL. The only function I know of that clears the scsi_cmnd.device pointer is scsi_req_init(). The only caller of that function in the SCSI core is scsi_initialize_rq(). That function has two callers, namely scsi_init_command() and blk_get_request(). However, the scsi_cmnd.device pointer is not cleared when a request finishes. This is why I think that the above crash report indicates that scsi_times_out() was called for a request that was being reinitialized and not by device hotplugging. Thanks, Bart.
Hello, Bart. On Thu, Feb 08, 2018 at 04:31:43PM +0000, Bart Van Assche wrote: > > That sounds more like a scsi hotplug bug than an issue in the timeout > > code unless we messed up @req pointer to begin with. > > I don't think that this is related to SCSI hotplugging: this crash does not > occur with the v4.15 block layer core and it does not occur with my timeout > handler rework patch applied either. I think that means that we cannot > exclude the block layer core timeout handler rework as a possible cause. > > The disassembler output is as follows: > > (gdb) disas /s scsi_times_out > Dump of assembler code for function scsi_times_out: > drivers/scsi/scsi_error.c: > 282 { > 0x0000000000005bd0 <+0>: push %r13 > 0x0000000000005bd2 <+2>: push %r12 > 0x0000000000005bd4 <+4>: push %rbp > ./include/linux/blk-mq.h: > 300 return rq + 1; > 0x0000000000005bd5 <+5>: lea 0x178(%rdi),%rbp > drivers/scsi/scsi_error.c: > 282 { > 0x0000000000005bdc <+12>: push %rbx > 283 struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); > 284 enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; > 285 struct Scsi_Host *host = scmd->device->host; > 0x0000000000005bdd <+13>: mov 0x1b0(%rdi),%rax > 282 { > 0x0000000000005be4 <+20>: mov %rdi,%rbx > 283 struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); > 284 enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; > 285 struct Scsi_Host *host = scmd->device->host; > 0x0000000000005be7 <+23>: mov (%rax),%r13 > 0x0000000000005bea <+26>: nopl 0x0(%rax,%rax,1) > [ ... ] > (gdb) print /x sizeof(struct request) > $2 = 0x178 > (gdb) print &(((struct scsi_cmnd*)0)->device) > $4 = (struct scsi_device **) 0x38 <scsi_cmd_get_serial+8> > (gdb) print &(((struct scsi_device*)0)->host) > $5 = (struct Scsi_Host **) 0x0 > > The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The > instruction at that address tries to dereference scsi_cmnd.device (%rax). The > register dump shows that that pointer has the value NULL. The only function I > know of that clears the scsi_cmnd.device pointer is scsi_req_init(). The only > caller of that function in the SCSI core is scsi_initialize_rq(). That function > has two callers, namely scsi_init_command() and blk_get_request(). However, > the scsi_cmnd.device pointer is not cleared when a request finishes. This is > why I think that the above crash report indicates that scsi_times_out() was > called for a request that was being reinitialized and not by device hotplugging. I could be misreading it but scsi_cmnd->device dereference should be the following. 0x0000000000005bdd <+13>: mov 0x1b0(%rdi),%rax %rdi is @req, 0x1b0(%rdi) seems to be the combined arithmetic of blk_mq_rq_to_pdu() and ->device dereference - 0x178 + 0x38. The faulting access is (%rax), which is deref'ing host from device. Thanks.
On Thu, 2018-02-08 at 09:00 -0800, tj@kernel.org wrote: > On Thu, Feb 08, 2018 at 04:31:43PM +0000, Bart Van Assche wrote: > > The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The > > instruction at that address tries to dereference scsi_cmnd.device (%rax). The > > register dump shows that that pointer has the value NULL. The only function I > > know of that clears the scsi_cmnd.device pointer is scsi_req_init(). The only > > caller of that function in the SCSI core is scsi_initialize_rq(). That function > > has two callers, namely scsi_init_command() and blk_get_request(). However, > > the scsi_cmnd.device pointer is not cleared when a request finishes. This is > > why I think that the above crash report indicates that scsi_times_out() was > > called for a request that was being reinitialized and not by device hotplugging. > > I could be misreading it but scsi_cmnd->device dereference should be > the following. > > 0x0000000000005bdd <+13>: mov 0x1b0(%rdi),%rax > > %rdi is @req, 0x1b0(%rdi) seems to be the combined arithmetic of > blk_mq_rq_to_pdu() and ->device dereference - 0x178 + 0x38. The > faulting access is (%rax), which is deref'ing host from device. Hello Tejun, I think "dereferencing a pointer" means reading the memory location that pointer points at? Anyway, I think we both interpret the crash report in the same way, namely that it means that scmd->device == NULL. Thanks, Bart.
Hello, Bart. On Thu, Feb 08, 2018 at 05:10:45PM +0000, Bart Van Assche wrote: > I think "dereferencing a pointer" means reading the memory location that pointer points > at? Anyway, I think we both interpret the crash report in the same way, namely that it > means that scmd->device == NULL. No, what I'm trying to say is that it's is the device->host reference not the scmd->device reference. Thanks.
On Thu, 2018-02-08 at 09:19 -0800, tj@kernel.org wrote: > Hello, Bart. > > On Thu, Feb 08, 2018 at 05:10:45PM +0000, Bart Van Assche wrote: > > I think "dereferencing a pointer" means reading the memory location that pointer points > > at? Anyway, I think we both interpret the crash report in the same way, namely that it > > means that scmd->device == NULL. > > No, what I'm trying to say is that it's is the device->host reference > not the scmd->device reference. Again, I think that means that we agree about the interpretation of the crash report. The big question here is what the next step should be to analyze this further and/or to avoid that this crash can occur? Thanks, Bart.
On Thu, Feb 08, 2018 at 05:37:46PM +0000, Bart Van Assche wrote: > On Thu, 2018-02-08 at 09:19 -0800, tj@kernel.org wrote: > > Hello, Bart. > > > > On Thu, Feb 08, 2018 at 05:10:45PM +0000, Bart Van Assche wrote: > > > I think "dereferencing a pointer" means reading the memory location that pointer points > > > at? Anyway, I think we both interpret the crash report in the same way, namely that it > > > means that scmd->device == NULL. > > > > No, what I'm trying to say is that it's is the device->host reference > > not the scmd->device reference. > > Again, I think that means that we agree about the interpretation of the crash > report. The big question here is what the next step should be to analyze this > further and/or to avoid that this crash can occur? Heh, sorry about not being clear. What I'm trying to say is that scmd->device != NULL && device->host == NULL. Or was this what you were saying all along? Thanks.
On Thu, 2018-02-08 at 09:40 -0800, tj@kernel.org wrote: > Heh, sorry about not being clear. What I'm trying to say is that > scmd->device != NULL && device->host == NULL. Or was this what you > were saying all along? What I agree with is that the request pointer (req argument) is stored in %rdi and that mov 0x1b0(%rdi),%rax loads scmd->device into %rax. Since RIP == scsi_times_out+0x17, since the instruction at that address tries to dereference %rax and since the register dump shows that %rax == NULL I think that means that scmd->device == NULL. Thanks, Bart.
On Thu, Feb 08, 2018 at 05:48:32PM +0000, Bart Van Assche wrote: > On Thu, 2018-02-08 at 09:40 -0800, tj@kernel.org wrote: > > Heh, sorry about not being clear. What I'm trying to say is that > > scmd->device != NULL && device->host == NULL. Or was this what you > > were saying all along? > > What I agree with is that the request pointer (req argument) is stored in %rdi > and that mov 0x1b0(%rdi),%rax loads scmd->device into %rax. Since RIP == > scsi_times_out+0x17, since the instruction at that address tries to dereference > %rax and since the register dump shows that %rax == NULL I think that means that > scmd->device == NULL. Ah, I was completely confused about which one had to be NULL. You're absolutely right. Let's continue earlier in the thread. Thanks.
diff --git a/block/blk-mq.c b/block/blk-mq.c index df93102..d6edf3b 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -593,7 +593,7 @@ static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) */ local_irq_save(flags); u64_stats_update_begin(&rq->aborted_gstate_sync); - rq->aborted_gstate = gstate; + smp_store_release(&rq->aborted_gstate, gstate); u64_stats_update_end(&rq->aborted_gstate_sync); local_irq_restore(flags); } @@ -605,7 +605,7 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq) do { start = u64_stats_fetch_begin(&rq->aborted_gstate_sync); - aborted_gstate = rq->aborted_gstate; + aborted_gstate = smp_load_acquire(&rq->aborted_gstate); } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start)); return aborted_gstate; @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) * ->aborted_gstate is set, this may lead to ignored * completions and further spurious timeouts. */ - blk_mq_rq_update_aborted_gstate(req, 0); blk_add_timer(req); + blk_mq_rq_update_aborted_gstate(req, 0); break; case BLK_EH_NOT_HANDLED: break;