From patchwork Sun Nov 26 12:18:10 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 10075283
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1272B6057C
	for ; Sun, 26 Nov 2017 12:18:17 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0BCA128C82
	for ; Sun, 26 Nov 2017 12:18:17 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 0090328C85; Sun, 26 Nov 2017 12:18:16 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AF13428C82
	for ; Sun, 26 Nov 2017 12:18:16 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752144AbdKZMSP (ORCPT );
	Sun, 26 Nov 2017 07:18:15 -0500
Received: from mail-wr0-f193.google.com ([209.85.128.193]:43969 "EHLO
	mail-wr0-f193.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752140AbdKZMSO (ORCPT );
	Sun, 26 Nov 2017 07:18:14 -0500
Received: by mail-wr0-f193.google.com with SMTP id z34so628771wrz.10;
	Sun, 26 Nov 2017 04:18:13 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:subject:to:cc:references:from:message-id:date
	:user-agent:mime-version:in-reply-to:content-language
	:content-transfer-encoding;
	bh=4cumC/CHEvRWGc18hucNq3qf0BEIbdTcoIVqjX/iz/A=;
	b=pYLCyD16NC50RxNvP01OL1BMi8vvBO1cnYxPYH9Aqqmp4CFf4r9oTytbpcRqIJgzTC
	rpVG/DMk+CTsClc07aWvK9IKU2nMs+fFl/Y3qQQtwHsmNQQBjX4ULN5NmcGuHKGY4989
	iMpH8yFI6feYQkMVVt7V9U8ZjS2oB6OAkxZoCUMRXPsdgFllaTNcpGPugnlVrcMsAncF
	VSXFIFZtCs50i4T6Uz/E5YV4gB2X1c7U/NPg37mvYFtnoQmyepNEpQsOBnicdyQaIG1n
	yqt0il0ywHpBDUNvWl909pq3RUNSGqLomMWy0ky2KylDvwrvq8E+ya2EiigGF6I3emSz
	8e3g==
X-Gm-Message-State: AJaThX69uFSNVUr22jQChHC+WAQo6FVBMazQjdvEKlZ9zkGtRNFUFl05
	ykLzoRGzV6UY8O0cQLTRxR91xoIv
X-Google-Smtp-Source: AGs4zMZCCe4VInB7xyxD8vkQS8AimJvlPfrP8FZ/GvSeNREJugI7W0J6y4pKrBjyN8otaZlDZIJFsg==
X-Received: by 10.223.136.250 with SMTP id g55mr29898492wrg.54.1511698693140;
	Sun, 26 Nov 2017 04:18:13 -0800 (PST)
Received: from [192.168.64.169] (bzq-82-81-101-184.red.bezeqint.net.
	[82.81.101.184])
	by smtp.gmail.com with ESMTPSA id n143sm30895867wmd.31.2017.11.26.04.18.11
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Sun, 26 Nov 2017 04:18:12 -0800 (PST)
Subject: Re: possible isert bug in tear down sequence
To: "Amrani, Ram" , "linux-rdma@vger.kernel.org" , target-devel
Cc: "Kalderon, Michal" , "Elior, Ariel"
References:
From: Sagi Grimberg
Message-ID: <77fdee90-b51c-1f38-16ea-0183a223f06c@grimberg.me>
Date: Sun, 26 Nov 2017 14:18:10 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To:
Content-Language: en-US
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Hey Ram,

> Let me add a third possibility, that is what we are hitting:
> I see that isert uses isert_cma_handler() and in the following cases
> drain won't be invoked:
>         case RDMA_CM_EVENT_REJECTED:       /* FALLTHRU */
>                 isert_info("Connection rejected: %s\n",
>                            rdma_reject_msg(cma_id, event->status));
>         case RDMA_CM_EVENT_UNREACHABLE:    /* FALLTHRU */
>         case RDMA_CM_EVENT_CONNECT_ERROR:
>                 ret = isert_connect_error(cma_id);
>                 break;
>
> Specifically, I hit the rejected case. See dmesg below with added prints (rrr...).
> We Are using
>
> [ 2455.241978] rrr created QP ffff880e984d6c00
> [ 2455.241982] isert: isert_login_post_recv: Setup sge: addr: eb19e4000 length: 8268 0x00000000
> [ 2455.241987] rrr post_recv qp=ffff880e984d6c00, wr_id=ffff880eb19e6064
> [ 2455.242108] isert: isert_cma_handler: rejected (8): status 10 id ffff880eb1f9b000 np ffff8810454d2c40
> [ 2455.242114] isert: isert_cma_handler: Connection rejected: stale conn
> [ 2455.242121] isert: isert_release_kref: conn ffff880eb19e2000 final kref kworker/7:2/6058
> [ 2455.242127] isert: isert_connect_release: conn ffff880eb19e2000
> [ 2455.242156] rrr poll_recv qp=ffff880e984d6c00 RDMA_CQE_RESP_STS_WORK_REQUEST_FLUSHED_ERR, wr_id=ffff880eb19e6064
> [ 2455.242157] rrr destroyed QP ffff880e984d6c00
> [ 2455.242164] Modules linked in: netconsole target_core_user target_core_pscsi target_core_file target_core_iblock
> [ 2455.242183] BUG: unable to handle kernel
> [ 2455.242202] [] isert_login_recv_done+0x23/0x160 [ib_isert]
>
> A QP gets created, post_recv is invoked, poll_cq as well (flushed) the QP is destroyed and then the workqueue tries to dereference the QP...
>
> I'm checking why the connection got stale, but anyway I think ib_drain_qp() should be invoked.

100% correct :)

> What do you think?

Does this fix your issue:
---
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index ceabdb85df8b..9d4785ba24cb 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id)
 {
 	struct isert_conn *isert_conn = cma_id->qp->qp_context;
 
+	ib_drain_qp(isert_conn->qp);
 	list_del_init(&isert_conn->node);
 	isert_conn->cm_id = NULL;
 	isert_put_conn(isert_conn);