From patchwork Tue Oct 4 09:11:18 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hannes Reinecke
X-Patchwork-Id: 9361399
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 389B9600C8
	for ; Tue, 4 Oct 2016 09:11:27 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 244E52896E
	for ; Tue, 4 Oct 2016 09:11:27 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 149F428984; Tue, 4 Oct 2016 09:11:27 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	T_TVD_MIME_EPI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 908C32896E
	for ; Tue, 4 Oct 2016 09:11:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753210AbcJDJLY (ORCPT ); Tue, 4 Oct 2016 05:11:24 -0400
Received: from mx2.suse.de ([195.135.220.15]:51598 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752594AbcJDJLX (ORCPT ); Tue, 4 Oct 2016 05:11:23 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])
	by mx2.suse.de (Postfix) with ESMTP id 007B0AC93;
	Tue, 4 Oct 2016 09:11:21 +0000 (UTC)
Subject: Re: iscsi_trx going into D state
To: Johannes Thumshirn, Robert LeBlanc
References: <20161004075545.j52mg3a2jckrchlp@linux-x5ow.site>
Cc: linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org
From: Hannes Reinecke
Message-ID: <5cfc7eb8-c59d-4b7a-3dee-99e17d72f251@suse.de>
Date: Tue, 4 Oct 2016 11:11:18 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
	Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To: <20161004075545.j52mg3a2jckrchlp@linux-x5ow.site>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 10/04/2016 09:55 AM, Johannes Thumshirn wrote:
> On Fri, Sep 30, 2016 at 11:14:57AM -0600, Robert LeBlanc wrote:
>> We are having a reoccurring problem where iscsi_trx is going into D
>> state. It seems like it is waiting for a session tear down to happen
>> or something, but keeps waiting. We have to reboot these targets on
>> occasion. This is running the 4.4.12 kernel and we have seen it on
>> several previous 4.4.x and 4.2.x kernels. There is no message in dmesg
>> or /var/log/messages. This seems to happen with increased frequency
>> when we have a disruption in our Infiniband fabric, but can happen
>> without any changes to the fabric (other than hosts rebooting).
>>
>> # ps aux | grep iscsi | grep D
>> root  4185 0.0 0.0 0 0 ? D Sep29 0:00 [iscsi_trx]
>> root 18505 0.0 0.0 0 0 ? D Sep29 0:00 [iscsi_np]
>>
>> # cat /proc/4185/stack
>> [] target_wait_for_sess_cmds+0x49/0x1a0
>> [] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
>> [] iscsit_close_connection+0x162/0x840
>> [] iscsit_take_action_for_connection_exit+0x7f/0x100
>> [] iscsi_target_rx_thread+0x5a0/0xe80
>> [] kthread+0xd8/0xf0
>> [] ret_from_fork+0x3f/0x70
>> [] 0xffffffffffffffff
>>
>> # cat /proc/18505/stack
>> [] iscsit_stop_session+0x1b1/0x1c0
>> [] iscsi_check_for_session_reinstatement+0x1e6/0x270
>> [] iscsi_target_check_for_existing_instances+0x30/0x40
>> [] iscsi_target_do_login+0x140/0x640
>> [] iscsi_target_start_negotiation+0x1c/0xb0
>> [] iscsi_target_login_thread+0xa9b/0xfc0
>> [] kthread+0xd8/0xf0
>> [] ret_from_fork+0x3f/0x70
>> [] 0xffffffffffffffff
>>
>> What can we do to help get this resolved?
>>
>> Thanks,
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
> Hi,
> I've encountered the same issue and found a hack to fix it at [1] but I think
> the correct way for handling this issue would be like you said to tear down
> the session in case a TASK ABORT times out. Unfortunately I'm not really
> familiar with the target code myself so I mainly use this reply to get me into
> the Cc loop.
>
> [1] http://marc.info/?l=linux-scsi&m=147282568910535&w=2
>
>

Hmm. Looking at the code, it looks as if we might miss some calls to
'complete'.
Can you try with the attached patch?

Cheers,

Hannes

From d481d8c27df8c09ea3798ce4a7217a26c3533161 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke
Date: Tue, 4 Oct 2016 11:05:46 +0200
Subject: [PATCH] iscsi_target: sanitize sess_wait_on_completion

When closing a session we should only set 'sess_wait_on_completion'
if we are actually calling wait_for_completion().
And we should indeed call 'complete' in these cases, too.
Also add some WARN_ON() in case we mess up calculating the number
of completions.

Signed-off-by: Hannes Reinecke
---
 drivers/target/iscsi/iscsi_target.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 39b928c..313724c 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4287,6 +4287,7 @@ int iscsit_close_connection(
 	if (!atomic_read(&sess->session_reinstatement) &&
 	     atomic_read(&sess->session_fall_back_to_erl0)) {
 		spin_unlock_bh(&sess->conn_lock);
+		WARN_ON(atomic_read(&sess->sleep_on_sess_wait_comp));
 		iscsit_close_session(sess);
 
 		return 0;
@@ -4557,7 +4558,6 @@ int iscsit_free_session(struct iscsi_session *sess)
 	int is_last;
 
 	spin_lock_bh(&sess->conn_lock);
-	atomic_set(&sess->sleep_on_sess_wait_comp, 1);
 
 	list_for_each_entry_safe(conn, conn_tmp, &sess->sess_conn_list,
 			conn_list) {
@@ -4585,7 +4585,10 @@ int iscsit_free_session(struct iscsi_session *sess)
 
 	if (atomic_read(&sess->nconn)) {
 		spin_unlock_bh(&sess->conn_lock);
+		atomic_inc(&sess->sleep_on_sess_wait_comp);
 		wait_for_completion(&sess->session_wait_comp);
+		atomic_dec(&sess->sleep_on_sess_wait_comp);
+		WARN_ON(atomic_read(&sess->sleep_on_sess_wait_comp));
 	} else
 		spin_unlock_bh(&sess->conn_lock);
 
@@ -4603,8 +4606,6 @@ void iscsit_stop_session(
 	int is_last;
 
 	spin_lock_bh(&sess->conn_lock);
-	if (session_sleep)
-		atomic_set(&sess->sleep_on_sess_wait_comp, 1);
 
 	if (connection_sleep) {
 		list_for_each_entry_safe(conn, conn_tmp, &sess->sess_conn_list,
@@ -4636,7 +4637,10 @@ void iscsit_stop_session(
 
 	if (session_sleep && atomic_read(&sess->nconn)) {
 		spin_unlock_bh(&sess->conn_lock);
+		atomic_inc(&sess->sleep_on_sess_wait_comp);
 		wait_for_completion(&sess->session_wait_comp);
+		atomic_dec(&sess->sleep_on_sess_wait_comp);
+		WARN_ON(atomic_read(&sess->sleep_on_sess_wait_comp));
 	} else
 		spin_unlock_bh(&sess->conn_lock);
 }
-- 
2.6.6