Message ID | 20190117002717.84686-3-bvanassche@acm.org (mailing list archive) |
---|---|
State | Accepted |
Series | Two SRP initiator bug fixes |
> +	/* Check whether all requests have finished. */
> +	blk_freeze_queue_start(q);
> +	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
> +	blk_mq_unfreeze_queue(q);
>
> +	return time_left > 0 ? SUCCESS : FAILED;

This is entirely generic SCSI/block level functionality. I'd rather have
a new WAIT_FOR_FREEZE return value from ->eh_device_reset_handler and
handle this in the SCSI midlayer.
On 1/19/19 2:04 AM, Christoph Hellwig wrote:
>> +	/* Check whether all requests have finished. */
>> +	blk_freeze_queue_start(q);
>> +	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
>> +	blk_mq_unfreeze_queue(q);
>>
>> +	return time_left > 0 ? SUCCESS : FAILED;
>
> This is entirely generic SCSI/block level functionality. I'd rather have
> a new WAIT_FOR_FREEZE return value from ->eh_device_reset_handler and
> handle this in the SCSI midlayer.

Hi Christoph,

Since a SCSI device must only reply to a reset task management function
after all affected commands have completed, the only case in which that
wait code is useful is if a regular reply is sent concurrently with the
SCSI reset reply and the two replies get reordered. Since the SCSI error
handler is able to deal with pending commands after a device reset, how
about leaving out the queue freeze / unfreeze code?

Thanks,

Bart.
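For readers following the review, the midlayer-side handling suggested above can be modeled in userspace C. This is only a sketch under loud assumptions: `EH_WAIT_FOR_FREEZE` stands in for the proposed (not existing) return value, and `struct model_device`, `model_wait_for_drain()`, `model_midlayer_reset()` and the two example handlers are hypothetical names invented here. None of it is kernel code, and time is simulated in ticks rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Result codes for this sketch only. EH_WAIT_FOR_FREEZE is hypothetical:
 * it models the return value proposed in the review, not a kernel constant. */
enum eh_result { EH_SUCCESS, EH_FAILED, EH_WAIT_FOR_FREEZE };

/* Hypothetical stand-in for a device and its request queue. */
struct model_device {
	int in_flight;      /* requests issued but not yet completed */
	int drain_per_tick; /* requests the simulated device completes per tick */
};

/* Driver-provided reset handler, modeled as a plain function pointer. */
typedef enum eh_result (*reset_handler_fn)(struct model_device *dev);

/* Generic wait: let the device drain for at most timeout_ticks ticks.
 * Returns true if no request is in flight afterwards. */
static bool model_wait_for_drain(struct model_device *dev, long timeout_ticks)
{
	assert(timeout_ticks > 0);
	while (dev->in_flight > 0 && timeout_ticks-- > 0) {
		int done = dev->drain_per_tick < dev->in_flight ?
			   dev->drain_per_tick : dev->in_flight;

		dev->in_flight -= done;
	}
	return dev->in_flight == 0;
}

/* Midlayer side: call the driver handler and, only if it asks for it,
 * perform the generic wait on the driver's behalf. */
static enum eh_result model_midlayer_reset(struct model_device *dev,
					   reset_handler_fn handler,
					   long timeout_ticks)
{
	enum eh_result rc = handler(dev);

	if (rc != EH_WAIT_FOR_FREEZE)
		return rc;
	return model_wait_for_drain(dev, timeout_ticks) ? EH_SUCCESS : EH_FAILED;
}

/* Example driver whose reset succeeded and which delegates the wait. */
static enum eh_result driver_reset_ok(struct model_device *dev)
{
	(void)dev;
	return EH_WAIT_FOR_FREEZE;
}

/* Example driver whose reset failed; no wait is performed. */
static enum eh_result driver_reset_fails(struct model_device *dev)
{
	(void)dev;
	return EH_FAILED;
}
```

The point of the split is that the driver only reports the outcome of the reset itself, while the drain-with-timeout logic lives in one generic place instead of being repeated in every driver.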
On Tue, 2019-01-22 at 15:55 +0000, Sasha Levin wrote:
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 94a9174c630c IB/srp: reduce lock coverage of command completion.
>
> The bot has tested the following trees: v4.20.3, v4.19.16, v4.14.94, v4.9.151, v4.4.171, v3.18.132.
>
> v4.20.3: Build OK!
> v4.19.16: Build OK!
> v4.14.94: Build OK!
> v4.9.151: Build failed! Errors:
>     drivers/infiniband/ulp/srp/ib_srp.c:2657:2: error: implicit declaration of function ‘blk_freeze_queue_start’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>     drivers/infiniband/ulp/srp/ib_srp.c:2658:14: error: implicit declaration of function ‘blk_mq_freeze_queue_wait_timeout’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>
> v4.4.171: Build failed! Errors:
>     drivers/infiniband/ulp/srp/ib_srp.c:2612:2: error: implicit declaration of function ‘blk_freeze_queue_start’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>     drivers/infiniband/ulp/srp/ib_srp.c:2613:14: error: implicit declaration of function ‘blk_mq_freeze_queue_wait_timeout’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>
> v3.18.132: Failed to apply! Possible dependencies:
>     205619f2f824 ("IB/srp: Remove stale connection retry mechanism")
>     34aa654ecb8e ("IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning")
>     394c595ee8c3 ("IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib()")
>     509c07bc1850 ("IB/srp: Separate target and channel variables")
>     747fe000ef38 ("IB/srp: Introduce two new srp_target_port member variables")
>     77f2c1a40e6f ("IB/srp: Use block layer tags")
>     d92c0da71a35 ("IB/srp: Add multichannel support")
>
>
> How should we proceed with this patch?

Hi Sasha,

Patch 2/2 does not have a "Cc: stable" tag because it definitely should
NOT be backported to older kernels. This patch only works for blk-mq
which is fine with kernel v5.0. Older kernels however support both the
legacy block layer and blk-mq.

Bart.
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 23e5c9afb8fb..f7ccbb07321b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3036,9 +3036,11 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
-	struct srp_target_port *target = host_to_target(scmnd->device->host);
+	struct scsi_device *sdev = scmnd->device;
+	struct srp_target_port *target = host_to_target(sdev->host);
 	struct srp_rdma_ch *ch;
-	int i, j;
+	struct request_queue *q = sdev->request_queue;
+	int time_left;
 	u8 status;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
@@ -3050,16 +3052,12 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	if (status)
 		return FAILED;
 
-	for (i = 0; i < target->ch_count; i++) {
-		ch = &target->ch[i];
-		for (j = 0; j < target->req_ring_size; ++j) {
-			struct srp_request *req = &ch->req_ring[j];
-
-			srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
-		}
-	}
+	/* Check whether all requests have finished. */
+	blk_freeze_queue_start(q);
+	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
+	blk_mq_unfreeze_queue(q);
 
-	return SUCCESS;
+	return time_left > 0 ? SUCCESS : FAILED;
 }
 
 static int srp_reset_host(struct scsi_cmnd *scmnd)
Since .scsi_done() must only be called after scsi_queue_rq() has
finished, make sure that the SRP initiator driver does not call
.scsi_done() while scsi_queue_rq() is in progress. Although
invoking sg_reset -d while I/O is in progress works fine with kernel
v4.20 and before, that is not the case with kernel v5.0-rc1. This
patch avoids that the following crash is triggered with kernel
v5.0-rc1:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G B 5.0.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10
Call Trace:
 blk_mq_sched_dispatch_requests+0x2f7/0x300
 __blk_mq_run_hw_queue+0xd6/0x180
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x4f1/0xa20
 worker_thread+0x67/0x5b0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 94a9174c630c ("IB/srp: reduce lock coverage of command completion") # v2.6.38
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)
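As a reading aid for the `time_left > 0 ? SUCCESS : FAILED` logic in the patch, the freeze, bounded wait, unfreeze sequence can be modeled in plain userspace C. This is a sketch under stated assumptions, not kernel code: `struct model_queue` and every `model_*` function are hypothetical stand-ins invented here, time is simulated in ticks, and the wait is assumed to follow the `wait_event_timeout()` convention (0 on timeout, otherwise at least 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Result codes for this sketch only; they mirror SUCCESS/FAILED in spirit. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical stand-in for a request queue. */
struct model_queue {
	int in_flight;         /* requests issued but not yet completed */
	int complete_per_tick; /* requests the simulated device finishes per tick */
	bool frozen;           /* true between freeze start and unfreeze */
};

/* New requests are refused while the queue is frozen. */
static bool model_submit(struct model_queue *q)
{
	if (q->frozen)
		return false;
	q->in_flight++;
	return true;
}

static void model_freeze_start(struct model_queue *q)
{
	q->frozen = true;
}

static void model_unfreeze(struct model_queue *q)
{
	q->frozen = false;
}

/* Wait until the queue has drained or the timeout expires.
 * Assumed convention: 0 means timed out with requests still pending;
 * a positive value (at least 1) means the queue drained. */
static long model_freeze_wait_timeout(struct model_queue *q, long timeout)
{
	long left = timeout;

	assert(timeout > 0);
	while (q->in_flight > 0 && left > 0) {
		int done = q->complete_per_tick < q->in_flight ?
			   q->complete_per_tick : q->in_flight;

		q->in_flight -= done;
		left--;
	}
	if (q->in_flight > 0)
		return 0;
	return left > 0 ? left : 1;
}

/* Same shape as the tail of the patched reset handler. */
static enum eh_result model_reset_device(struct model_queue *q, long timeout)
{
	long time_left;

	model_freeze_start(q);
	time_left = model_freeze_wait_timeout(q, timeout);
	model_unfreeze(q);

	return time_left > 0 ? EH_SUCCESS : EH_FAILED;
}
```

In this model, a queue with three requests in flight that completes one per tick drains within a five-tick timeout and yields `EH_SUCCESS`, while ten requests in flight under the same timeout yield `EH_FAILED`. The queue is unfrozen in both cases, so later submissions are accepted again.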