Message ID | 1539141790-13557-3-git-send-email-nab@linux-iscsi.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | target: Fix v4.19-rc active I/O shutdown deadlock | expand |
> On Oct 9, 2018, at 10:23 PM, Nicholas A. Bellinger <nab@linux-iscsi.org> wrote: > > From: Nicholas Bellinger <nab@linux-iscsi.org> > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no > signals will be pending for task_struct executing the normal session shutdown > and I/O quiesce code-path. > > For example, iscsi-target and iser-target issue SIGINT to all kthreads as > part of session shutdown. This has been the behaviour since day one. > > As-is when signals are pending with se_cmds active in se_sess->sess_cmd_list, > wait_event_interruptible_lock_irq_timeout() returns a negative number and > immediately kills the machine because of the do while (ret <= 0) loop that > was added in commit 00d909a107 to spin while backend I/O is taking any > amount of extended time (say 30 seconds) to complete. > > Here's what it looks like in action with debug plus delayed backend I/O > completion: > > [ 4951.909951] se_sess: 000000003e7e08fa before target_wait_for_sess_cmds > [ 4951.914600] target_wait_for_sess_cmds: signal_pending: 1 > [ 4951.918015] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 0 > [ 4951.921639] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 1 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 2 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 3 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 4 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 5 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 6 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 7 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 8 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 9 > > ... followed by the usual RCU CPU stalls and deadlock. > > There was never a case pre commit 00d909a107 where wait_for_complete(&se_cmd->cmd_wait_comp) > was able to be interruptted, so to address this for v4.19+ moving forward go ahead and > use wait_event_lock_irq_timeout() instead so new code works with all fabric drivers. > > Also for commit 00d909a107, fix a minor regression in target_release_cmd_kref() > to only wake_up the new se_sess->cmd_list_wq only when shutdown has actually > been triggered via se_sess->sess_tearing_down. > > Fixes: 00d909a107 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted") > Cc: Bart Van Assche <bvanassche@acm.org> > Cc: Mike Christie <mchristi@redhat.com> > Cc: Hannes Reinecke <hare@suse.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Sagi Grimberg <sagi@grimberg.me> > Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com> > Tested-by: Nicholas Bellinger <nab@linux-iscsi.org> > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> > --- > drivers/target/target_core_transport.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > Reviewed-by: Bryant G. Ly <bly@catalogicsoftware.com> -Bryant
On 10/9/18 8:23 PM, Nicholas A. Bellinger wrote: > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 86c0156..fc3093d2 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref) > if (se_sess) { > spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); > list_del_init(&se_cmd->se_cmd_list); > - if (list_empty(&se_sess->sess_cmd_list)) > + if (se_sess->sess_tearing_down && list_empty(&se_sess->sess_cmd_list)) > wake_up(&se_sess->cmd_list_wq); > spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); > } > @@ -2907,7 +2907,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess) > > spin_lock_irq(&se_sess->sess_cmd_lock); > do { > - ret = wait_event_interruptible_lock_irq_timeout( > + ret = wait_event_lock_irq_timeout( > se_sess->cmd_list_wq, > list_empty(&se_sess->sess_cmd_list), > se_sess->sess_cmd_lock, 180 * HZ); Since this patch addresses two different issues (performance improvement + fix for a hang), I think it should be split in two patches. Thanks, Bart.
On Wed, Oct 10, 2018 at 03:23:10AM +0000, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger <nab@linux-iscsi.org> > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no > signals will be pending for task_struct executing the normal session shutdown > and I/O quiesce code-path. > > For example, iscsi-target and iser-target issue SIGINT to all kthreads as > part of session shutdown. This has been the behaviour since day one. Not knowing much context here; but does it make sense for those kthreads to handle signals, ever? Most kthreads should be fine with ignore_signals().
On 10/09/2018 10:23 PM, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger <nab@linux-iscsi.org> > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no > signals will be pending for task_struct executing the normal session shutdown > and I/O quiesce code-path. > > For example, iscsi-target and iser-target issue SIGINT to all kthreads as > part of session shutdown. This has been the behaviour since day one. > > As-is when signals are pending with se_cmds active in se_sess->sess_cmd_list, > wait_event_interruptible_lock_irq_timeout() returns a negative number and > immediately kills the machine because of the do while (ret <= 0) loop that > was added in commit 00d909a107 to spin while backend I/O is taking any > amount of extended time (say 30 seconds) to complete. > > Here's what it looks like in action with debug plus delayed backend I/O > completion: > > [ 4951.909951] se_sess: 000000003e7e08fa before target_wait_for_sess_cmds > [ 4951.914600] target_wait_for_sess_cmds: signal_pending: 1 > [ 4951.918015] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 0 > [ 4951.921639] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 1 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 2 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 3 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 4 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 5 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 6 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 7 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 8 > [ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 9 > > ... followed by the usual RCU CPU stalls and deadlock. > > There was never a case pre commit 00d909a107 where wait_for_complete(&se_cmd->cmd_wait_comp) > was able to be interruptted, so to address this for v4.19+ moving forward go ahead and > use wait_event_lock_irq_timeout() instead so new code works with all fabric drivers. > > Also for commit 00d909a107, fix a minor regression in target_release_cmd_kref() > to only wake_up the new se_sess->cmd_list_wq only when shutdown has actually > been triggered via se_sess->sess_tearing_down. > > Fixes: 00d909a107 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted") > Cc: Bart Van Assche <bvanassche@acm.org> > Cc: Mike Christie <mchristi@redhat.com> > Cc: Hannes Reinecke <hare@suse.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Sagi Grimberg <sagi@grimberg.me> > Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com> > Tested-by: Nicholas Bellinger <nab@linux-iscsi.org> > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> > --- > drivers/target/target_core_transport.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 86c0156..fc3093d2 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref) > if (se_sess) { > spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); > list_del_init(&se_cmd->se_cmd_list); > - if (list_empty(&se_sess->sess_cmd_list)) > + if (se_sess->sess_tearing_down && list_empty(&se_sess->sess_cmd_list)) I think there is another issue with 00d909a107 and ibmvscsi_tgt. The problem is that ibmvscsi_tgt never called target_sess_cmd_list_set_waiting. It only called target_wait_for_sess_cmds. So before 00d909a107 there was a bug in that driver and target_wait_for_sess_cmds never did what was intended because sess_wait_list would always be empty. With 00d909a107, we no longer need to call target_sess_cmd_list_set_waiting to wait for outstanding commands, so for ibmvscsi_tgt will now wait for commands like we wanted. However, the commit added a WARN_ON that is hit if target_sess_cmd_list_set_waiting is not called, so we could hit that. So I think we need to add a target_sess_cmd_list_set_waiting call in ibmvscsi_tgt to go along with your patch chunk above and make sure we do not trigger the WARN_ON. > wake_up(&se_sess->cmd_list_wq); > spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); > } > @@ -2907,7 +2907,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess) > > spin_lock_irq(&se_sess->sess_cmd_lock); > do { > - ret = wait_event_interruptible_lock_irq_timeout( > + ret = wait_event_lock_irq_timeout( > se_sess->cmd_list_wq, > list_empty(&se_sess->sess_cmd_list), > se_sess->sess_cmd_lock, 180 * HZ); >
Hey Peter & Co, On Wed, 2018-10-10 at 10:43 +0200, Peter Zijlstra wrote: > On Wed, Oct 10, 2018 at 03:23:10AM +0000, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger <nab@linux-iscsi.org> > > > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no > > signals will be pending for task_struct executing the normal session shutdown > > and I/O quiesce code-path. > > > > For example, iscsi-target and iser-target issue SIGINT to all kthreads as > > part of session shutdown. This has been the behaviour since day one. > > Not knowing much context here; but does it make sense for those > kthreads to handle signals, ever? Most kthreads should be fine with > ignore_signals(). > iscsi-target + ib-isert uses SIGINT amongst dedicated rx/tx connection kthreads to signal connection shutdown, requiring in-flight se_cmd I/O descriptors to be quiesced before making forward progress to release se_session. By the point wait_event_lock_irq_timeout() is called in the example here, one of the two rx/tx connection kthreads has been stopped, and the other kthread is still processing shutdown. So while historically the pending SIGINTs where not cleared (or ignored) during shutdown at this point, there is no reason why they could not be ignored for iscsi-target + ib-isert. That said, pre commit 00d909a107 code always used wait_for_completion() and ignored pending signals. As-is target_wait_for_sess_cmds() is called directly from fabric driver code and in one case also from user-space via configfs_write_file(), so AFAICT it does need TASK_UNINTERRUPTIBLE.
Hello MNC & Co, On Wed, 2018-10-10 at 11:58 -0500, Mike Christie wrote: > On 10/09/2018 10:23 PM, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger <nab@linux-iscsi.org> > > > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no > > signals will be pending for task_struct executing the normal session shutdown > > and I/O quiesce code-path. > > <SNIP> > > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > > index 86c0156..fc3093d2 100644 > > --- a/drivers/target/target_core_transport.c > > +++ b/drivers/target/target_core_transport.c > > @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref) > > if (se_sess) { > > spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); > > list_del_init(&se_cmd->se_cmd_list); > > - if (list_empty(&se_sess->sess_cmd_list)) > > + if (se_sess->sess_tearing_down && list_empty(&se_sess->sess_cmd_list)) > > I think there is another issue with 00d909a107 and ibmvscsi_tgt. > > The problem is that ibmvscsi_tgt never called > target_sess_cmd_list_set_waiting. It only called > target_wait_for_sess_cmds. So before 00d909a107 there was a bug in that > driver and target_wait_for_sess_cmds never did what was intended because > sess_wait_list would always be empty. > > With 00d909a107, we no longer need to call > target_sess_cmd_list_set_waiting to wait for outstanding commands, so > for ibmvscsi_tgt will now wait for commands like we wanted. However, the > commit added a WARN_ON that is hit if target_sess_cmd_list_set_waiting > is not called, so we could hit that. > > So I think we need to add a target_sess_cmd_list_set_waiting call in > ibmvscsi_tgt to go along with your patch chunk above and make sure we do > not trigger the WARN_ON. > Nice catch. :) With target_wait_for_sess_cmd() usage pre 00d909a107 doing a list-splice in target_sess_cmd_list_set_waiting(), this particular usage in ibmvscsi_tgt has always been list_empty(&sess->sess_wait_list) = true (eg: no se_cmd I/O is quiesced, because no se_cmd in sess_wait_list) since commit 712db3eb in 4.9.y code. That said, ibmvscsi_tgt usage is very similar to vhost/scsi in the respect individual /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/ endpoints used by VMs do not remove their I_T nexus while the VM is active. So AFAICT, ibmvscsi_tgt doesn't strictly need target_sess_wait_for_cmd() at all if this is true.
On Wed, Oct 10, 2018 at 10:40:12PM -0700, Nicholas A. Bellinger wrote: > Hey Peter & Co, > > On Wed, 2018-10-10 at 10:43 +0200, Peter Zijlstra wrote: > > On Wed, Oct 10, 2018 at 03:23:10AM +0000, Nicholas A. Bellinger wrote: > > > From: Nicholas Bellinger <nab@linux-iscsi.org> > > > > > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no > > > signals will be pending for task_struct executing the normal session shutdown > > > and I/O quiesce code-path. > > > > > > For example, iscsi-target and iser-target issue SIGINT to all kthreads as > > > part of session shutdown. This has been the behaviour since day one. > > > > Not knowing much context here; but does it make sense for those > > kthreads to handle signals, ever? Most kthreads should be fine with > > ignore_signals(). > > > > iscsi-target + ib-isert uses SIGINT amongst dedicated rx/tx connection > kthreads to signal connection shutdown, requiring in-flight se_cmd I/O > descriptors to be quiesced before making forward progress to release > se_session. > > By the point wait_event_lock_irq_timeout() is called in the example > here, one of the two rx/tx connection kthreads has been stopped, and the > other kthread is still processing shutdown. So while historically the > pending SIGINTs where not cleared (or ignored) during shutdown at this > point, there is no reason why they could not be ignored for iscsi-target > + ib-isert. > > That said, pre commit 00d909a107 code always used wait_for_completion() > and ignored pending signals. As-is target_wait_for_sess_cmds() is > called directly from fabric driver code and in one case also from > user-space via configfs_write_file(), so AFAICT it does need > TASK_UNINTERRUPTIBLE. > Fair enough, thanks for the background. I'm always a bit wary when kthreads need to deal with signals, but this seems OK.
> On Oct 11, 2018, at 12:56 AM, Nicholas A. Bellinger <nab@linux-iscsi.org> wrote: > > Hello MNC & Co, > > On Wed, 2018-10-10 at 11:58 -0500, Mike Christie wrote: >> On 10/09/2018 10:23 PM, Nicholas A. Bellinger wrote: >>> From: Nicholas Bellinger <nab@linux-iscsi.org> >>> >>> With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes no >>> signals will be pending for task_struct executing the normal session shutdown >>> and I/O quiesce code-path. >>> > > <SNIP> > >>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c >>> index 86c0156..fc3093d2 100644 >>> --- a/drivers/target/target_core_transport.c >>> +++ b/drivers/target/target_core_transport.c >>> @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref) >>> if (se_sess) { >>> spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); >>> list_del_init(&se_cmd->se_cmd_list); >>> - if (list_empty(&se_sess->sess_cmd_list)) >>> + if (se_sess->sess_tearing_down && list_empty(&se_sess->sess_cmd_list)) >> >> I think there is another issue with 00d909a107 and ibmvscsi_tgt. >> >> The problem is that ibmvscsi_tgt never called >> target_sess_cmd_list_set_waiting. It only called >> target_wait_for_sess_cmds. So before 00d909a107 there was a bug in that >> driver and target_wait_for_sess_cmds never did what was intended because >> sess_wait_list would always be empty. >> >> With 00d909a107, we no longer need to call >> target_sess_cmd_list_set_waiting to wait for outstanding commands, so >> for ibmvscsi_tgt will now wait for commands like we wanted. However, the >> commit added a WARN_ON that is hit if target_sess_cmd_list_set_waiting >> is not called, so we could hit that. >> >> So I think we need to add a target_sess_cmd_list_set_waiting call in >> ibmvscsi_tgt to go along with your patch chunk above and make sure we do >> not trigger the WARN_ON. >> > > Nice catch. :) > > With target_wait_for_sess_cmd() usage pre 00d909a107 doing a list-splice > in target_sess_cmd_list_set_waiting(), this particular usage in > ibmvscsi_tgt has always been list_empty(&sess->sess_wait_list) = true > (eg: no se_cmd I/O is quiesced, because no se_cmd in sess_wait_list) > since commit 712db3eb in 4.9.y code. > > That said, ibmvscsi_tgt usage is very similar to vhost/scsi in the > respect individual /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/ > endpoints used by VMs do not remove their I_T nexus while the VM is > active. > > So AFAICT, ibmvscsi_tgt doesn't strictly need target_sess_wait_for_cmd() > at all if this is true. > VMs do not remove the I_T nexus while the VM is active, so we can remove the target_wait_for_sess_cmd() call. -Bryant
Bryant, > VMs do not remove the I_T nexus while the VM is active, so we can remove > the target_wait_for_sess_cmd() call. Patch forthcoming?
Martin, > On Oct 15, 2018, at 11:13 PM, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > > Bryant, > >> VMs do not remove the I_T nexus while the VM is active, so we can remove >> the target_wait_for_sess_cmd() call. > > Patch forthcoming? I can put out a patch, I’m trying to get someone at IBM to test it though. -Bryant
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 86c0156..fc3093d2 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref) if (se_sess) { spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); list_del_init(&se_cmd->se_cmd_list); - if (list_empty(&se_sess->sess_cmd_list)) + if (se_sess->sess_tearing_down && list_empty(&se_sess->sess_cmd_list)) wake_up(&se_sess->cmd_list_wq); spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); } @@ -2907,7 +2907,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess) spin_lock_irq(&se_sess->sess_cmd_lock); do { - ret = wait_event_interruptible_lock_irq_timeout( + ret = wait_event_lock_irq_timeout( se_sess->cmd_list_wq, list_empty(&se_sess->sess_cmd_list), se_sess->sess_cmd_lock, 180 * HZ);