Message ID | 1512683953-5959-1-git-send-email-rafael.tinoco@canonical.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Lee, Chris, Some test results. - Single unmounted disk, with transport connection wiped before final logout: http://pastebin.ubuntu.com/26139576/ - Multiple mounted disks, multipath dev-mapper, all transport connections were wiped before the final logout, with heavy write workload: http://pastebin.ubuntu.com/26139620/ Considering sd_shutdown logic - sd_shutdown, sd_sync_cache for each scsi_disk, 3 attempts of scsi_execute with SYNCHRONIZE_CACHE cmd each - you can see that, because transport was down, first SYNC_CACHE cmd waits for the request timeout and for the abort_timeout. All other cmds fail in the enqueuing phase, because of the transport failure + previous timeout + server shutdown happening simultaneously, so you don't have to wait for timeout on each command again. This change also suits any pending requests, not only those coming from sd_shutdown, and it allows OS to reboot and shutdown, back again, independently of how bad userland was configured. Thank you in advance for considering it. -Rafael > On 07/12/2017, at 07:59 PM, Rafael David Tinoco <rafael.tinoco@canonical.com> wrote: > > If, for any reason, userland shuts down iscsi transport interfaces > before proper logouts - like when logging in to LUNs manually, > without logging out on server shutdown, or when automated scripts > can't umount/logout from logged LUNs - kernel will hang forever on > its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd > to all still existent paths. > > PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" > #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee > #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 > #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 > #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 > #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c > #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 > #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 > #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe > #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 > #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c > > This happens because iscsi_eh_cmd_timed_out(), the transport layer > timeout helper, would tell the queue timeout function (scsi_times_out) > to reset the request timer over and over, until the session state is > back to logged in state. Unfortunately, during server shutdown, this > might never happen again. > > Other option would be "not to handle" the issue in the transport > layer. That would trigger the error handler logic, which would also > need the session state to be logged in again. > > Best option, for such case, is to tell upper layers that the command > was handled during the transport layer error handler helper, marking > it as DID_NO_CONNECT, which will allow completion and inform about > the problem. > > After the session was marked as ISCSI_STATE_FAILED, due to the first > timeout during the server shutdown phase, all subsequent cmds will > fail to be queued, allowing upper logic to fail faster. > > Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
On Thu, 2017-12-07 at 19:59 -0200, Rafael David Tinoco wrote: > This happens because iscsi_eh_cmd_timed_out(), the transport layer > timeout helper, would tell the queue timeout function (scsi_times_out) > to reset the request timer over and over, until the session state is > back to logged in state. Unfortunately, during server shutdown, this > might never happen again. Hello Rafael, Have you considered to make iscsi_eh_cmd_timed_out() return BLK_EH_HANDLED if system_state != SYSTEM_RUNNING? That could result in slower shutdown than with your patch but such a change would probably be really easy to review. Thanks, Bart.
Hello Bart, I am returning BLK_EH_HANDLED in iscsi_eh_cmd_timed_out(). Do you mean something different ? That paragraph means that I have tried to return BLK_EH_NOT_HANDLED first, because that would be the other option instead of BLK_EH_RESET_TIMER (which is causing this issue), but if I did it, the EH logic would try scsi_abort_command() and - successful or not - it would try to get sense before completion, causing more traffic on a bad state transport. Best way to allow faster completion was, indeed, returning BLK_EH_HANDLED, but changing result to DID_NO_CONNECT, because that will tell block layer not to retry, allowing the completion to happen in the SOFTIRQ handler, informing result to the upper layer. For the queue, simply now allowing queueing on such condition (shutdown + state != logged in) seemed correct. Let me know if you want me to try something else. I would be happy to. Best, -Rafael > On 08/12/2017, at 09:12 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > On Thu, 2017-12-07 at 19:59 -0200, Rafael David Tinoco wrote: >> This happens because iscsi_eh_cmd_timed_out(), the transport layer >> timeout helper, would tell the queue timeout function (scsi_times_out) >> to reset the request timer over and over, until the session state is >> back to logged in state. Unfortunately, during server shutdown, this >> might never happen again. > > Hello Rafael, > > Have you considered to make iscsi_eh_cmd_timed_out() return BLK_EH_HANDLED > if system_state != SYSTEM_RUNNING? That could result in slower shutdown than > with your patch but such a change would probably be really easy to review. > > Thanks, > > Bart.
> If, for any reason, userland shuts down iscsi transport interfaces > before proper logouts - like when logging in to LUNs manually, without > logging out on server shutdown, or when automated scripts can't > umount/logout from logged LUNs - kernel will hang forever on its > sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all > still existent paths. Chris and Lee: Please review. Thanks!
Hello Lee, I saw there was a reply from Lee to googlegroups (https://goo.gl/x8LhFm). I haven't responded before because I was subscribed only to linux-scsi@, my bad. Yes, it worked as expected. From your question: > Rafael: > Did you test this change, i.e. shutdowns no longer hang, under test > circumstances, with this change? Yes, my start work was: https://pastebin.ubuntu.com/26292711/ And the tests during devel: https://pastebin.ubuntu.com/26292701/ https://pastebin.ubuntu.com/26292702/ And finally with the submitted patch the expected behavior: https://pastebin.ubuntu.com/26292706/ -> just 1 session https://pastebin.ubuntu.com/26292708/ -> multiple sessions Note: [ 78.427670] session6: iscsi_eh_cmd_timed_out scsi cmd ffff88b2ef499160 timedout [ 78.427671] session6: iscsi_eh_cmd_timed_out sc on shutdown, handled [ 78.427671] session6: iscsi_eh_cmd_timed_out return shutdown or nh [ 78.437637] session7: iscsi_eh_cmd_timed_out scsi cmd ffff88b2f161c160 timedout [ 78.438366] session7: iscsi_eh_cmd_timed_out sc on shutdown, handled [ 78.439004] session7: iscsi_eh_cmd_timed_out return shutdown or nh [ 78.441551] session8: iscsi_eh_cmd_timed_out scsi cmd ffff88b2ef49a160 timedout [ 78.442278] session8: iscsi_eh_cmd_timed_out sc on shutdown, handled [ 78.442914] session8: iscsi_eh_cmd_timed_out return shutdown or nh [ 109.149438] session2: iscsi_eh_cmd_timed_out scsi cmd ffff88b2ef1fd560 timedout [ 109.150251] session2: iscsi_eh_cmd_timed_out sc on shutdown, handled [ 109.150969] session2: iscsi_eh_cmd_timed_out return shutdown or nh [ 78.427506] sd 8:0:0:1: tag#0 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 78.427662] sd 7:0:0:1: tag#0 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 78.439548] sd 9:0:0:1: tag#0 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 109.146728] sd 3:0:0:1: tag#0 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK the iscsi_eh_cmd_timed_out logic after the ping timeouts. And then: [ 78.427678] sd 7:0:0:1: tag#0 Done: SUCCESS Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 78.443456] sd 8:0:0:1: tag#0 Done: SUCCESS Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 78.447592] sd 9:0:0:1: tag#0 Done: SUCCESS Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 109.151582] sd 3:0:0:1: tag#0 Done: SUCCESS Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK The iscsi_queuecommand logic setting RESULT to DID_NO_CONNECT when queueing under shutdown on disconnected transport. [ 78.427683] sd 7:0:0:1: Notifying upper driver of completion (result 10000) [ 78.445899] sd 8:0:0:1: Notifying upper driver of completion (result 10000) [ 78.450035] sd 9:0:0:1: Notifying upper driver of completion (result 10000) [ 109.154495] sd 3:0:0:1: Notifying upper driver of completion (result 10000) > [side note: we *really* need an open-iscsi test suite! Anybody?] I'm interested in creating/helping (specially now that I read big part of the code because of this bug). > As long as the upper levels handle this correctly, I'm good with it. Yes, check it out. At the end: [ 109.354984] sd 8:0:0:1: tag#0 0 sectors total, 0 bytes done. [ 109.355596] sd 8:0:0:1: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 109.356980] reboot: Restarting system [ 109.357392] reboot: machine restart You see the "sync cache failed" message (important to see you couldn't sync that disk and you might need to fix userland shutdown order) with DID_NO_CONNECT (since the sd_shutdown tries to sync 3 times you might see lots of DID_NO_CONNECT errors, for all sessions, but all of the commands will be handled after this change, and upper layer informed of the error). I hope that answers you. Let me know if you want me to provide any other information. Cheers -Rafael On Thu, Dec 21, 2017 at 12:39 AM, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >> If, for any reason, userland shuts down iscsi transport interfaces >> before proper logouts - like when logging in to LUNs manually, without >> logging out on server shutdown, or when automated scripts can't >> umount/logout from logged LUNs - kernel will hang forever on its >> sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all >> still existent paths. > > Chris and Lee: Please review. Thanks! > > -- > Martin K. Petersen Oracle Linux Engineering
On 12/07/2017 01:59 PM, Rafael David Tinoco wrote: > If, for any reason, userland shuts down iscsi transport interfaces > before proper logouts - like when logging in to LUNs manually, > without logging out on server shutdown, or when automated scripts > can't umount/logout from logged LUNs - kernel will hang forever on > its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd > to all still existent paths. > > PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" > #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee > #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 > #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 > #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 > #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c > #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 > #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 > #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe > #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 > #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c > > This happens because iscsi_eh_cmd_timed_out(), the transport layer > timeout helper, would tell the queue timeout function (scsi_times_out) > to reset the request timer over and over, until the session state is > back to logged in state. Unfortunately, during server shutdown, this > might never happen again. > > Other option would be "not to handle" the issue in the transport > layer. That would trigger the error handler logic, which would also > need the session state to be logged in again. > > Best option, for such case, is to tell upper layers that the command > was handled during the transport layer error handler helper, marking > it as DID_NO_CONNECT, which will allow completion and inform about > the problem. > > After the session was marked as ISCSI_STATE_FAILED, due to the first > timeout during the server shutdown phase, all subsequent cmds will > fail to be queued, allowing upper logic to fail faster. > > Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> > --- > drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 9c50d2d9f27c..785d1c55d152 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) > */ > switch (session->state) { > case ISCSI_STATE_FAILED: > + /* > + * cmds should fail during shutdown, if the session > + * state is bad, allowing completion to happen > + */ > + if (unlikely(system_state != SYSTEM_RUNNING)) { > + reason = FAILURE_SESSION_FAILED; > + sc->result = DID_NO_CONNECT << 16; > + break; > + } > case ISCSI_STATE_IN_RECOVERY: > reason = FAILURE_SESSION_IN_RECOVERY; > sc->result = DID_IMM_RETRY << 16; > @@ -1978,6 +1987,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) > } > > if (session->state != ISCSI_STATE_LOGGED_IN) { > + /* > + * During shutdown, if session is prematurely disconnected, > + * recovery won't happen and there will be hung cmds. Not > + * handling cmds would trigger EH, also bad in this case. > + * Instead, handle cmd, allow completion to happen and let > + * upper layer to deal with the result. > + */ > + if (unlikely(system_state != SYSTEM_RUNNING)) { > + sc->result = DID_NO_CONNECT << 16; > + ISCSI_DBG_EH(session, "sc on shutdown, handled\n"); > + rc = BLK_EH_HANDLED; > + goto done; > + } > /* > * We are probably in the middle of iscsi recovery so let > * that complete and handle the error. > @@ -2082,7 +2104,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) > task->last_timeout = jiffies; > spin_unlock(&session->frwd_lock); > ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ? > - "timer reset" : "nh"); > + "timer reset" : "shutdown or nh"); > return rc; > } > EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out); > Reviewed-by: Lee Duncan <lduncan@suse.com>
Rafael, > If, for any reason, userland shuts down iscsi transport interfaces > before proper logouts - like when logging in to LUNs manually, without > logging out on server shutdown, or when automated scripts can't > umount/logout from logged LUNs - kernel will hang forever on its > sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all > still existent paths. Applied to 4.16/scsi-queue. Thank you!
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 9c50d2d9f27c..785d1c55d152 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) */ switch (session->state) { case ISCSI_STATE_FAILED: + /* + * cmds should fail during shutdown, if the session + * state is bad, allowing completion to happen + */ + if (unlikely(system_state != SYSTEM_RUNNING)) { + reason = FAILURE_SESSION_FAILED; + sc->result = DID_NO_CONNECT << 16; + break; + } case ISCSI_STATE_IN_RECOVERY: reason = FAILURE_SESSION_IN_RECOVERY; sc->result = DID_IMM_RETRY << 16; @@ -1978,6 +1987,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) } if (session->state != ISCSI_STATE_LOGGED_IN) { + /* + * During shutdown, if session is prematurely disconnected, + * recovery won't happen and there will be hung cmds. Not + * handling cmds would trigger EH, also bad in this case. + * Instead, handle cmd, allow completion to happen and let + * upper layer to deal with the result. + */ + if (unlikely(system_state != SYSTEM_RUNNING)) { + sc->result = DID_NO_CONNECT << 16; + ISCSI_DBG_EH(session, "sc on shutdown, handled\n"); + rc = BLK_EH_HANDLED; + goto done; + } /* * We are probably in the middle of iscsi recovery so let * that complete and handle the error. @@ -2082,7 +2104,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) task->last_timeout = jiffies; spin_unlock(&session->frwd_lock); ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ? - "timer reset" : "nh"); + "timer reset" : "shutdown or nh"); return rc; } EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);
If, for any reason, userland shuts down iscsi transport interfaces before proper logouts - like when logging in to LUNs manually, without logging out on server shutdown, or when automated scripts can't umount/logout from logged LUNs - kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths. PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c This happens because iscsi_eh_cmd_timed_out(), the transport layer timeout helper, would tell the queue timeout function (scsi_times_out) to reset the request timer over and over, until the session state is back to logged in state. Unfortunately, during server shutdown, this might never happen again. Other option would be "not to handle" the issue in the transport layer. That would trigger the error handler logic, which would also need the session state to be logged in again. Best option, for such case, is to tell upper layers that the command was handled during the transport layer error handler helper, marking it as DID_NO_CONNECT, which will allow completion and inform about the problem. After the session was marked as ISCSI_STATE_FAILED, due to the first timeout during the server shutdown phase, all subsequent cmds will fail to be queued, allowing upper logic to fail faster. Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> --- drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)