From: Håkon Bugge
To: jgg@mellanox.com
Cc: dledford@redhat.com, hal@dev.mellanox.co.il, sean.hefty@intel.com, leon@kernel.org, linux-rdma@vger.kernel.org, ira.weiny@intel.com, aron.silverton@oracle.com, mark.haywood@oracle.com
Subject: [PATCH v4 3/4] ibacm: Unable to resurrect an interface
Date: Fri, 1 Feb 2019 19:50:48 +0100
Message-Id: <20190201185049.239177-4-haakon.bugge@oracle.com>
In-Reply-To: <20190201185049.239177-1-haakon.bugge@oracle.com>
X-Patchwork-Id: 10793521

When an IB port is brought back to the Active state after being down, ibacm receives an event about it. It then re-enumerates the devices by executing an ioctl with SIOCGIFCONF. This particular ioctl only returns interfaces that are "running".

There may be a delay between the IB port becoming Active and its IPoIB interface having its address provisioned and becoming "running". If ibacm attempts to associate IPoIB interfaces with the port during this interval, it will not see the interface, because the interface is not yet "running".

Later, when ibacm is asked for a Path Record (PR) using the IP address of the resurrected IPoIB interface, it cannot find the associated end-point (EP), and the following is printed in the log:

    acm_svr_resolve_path: notice - unknown local end point address

The bug can be provoked by the following script. We have a single HCA with two ports, the IPoIB interfaces are named stib{0,1}, the IP address of the first interface is 192.168.200.200, and the remote IP address is 192.168.200.202. The LID of the IB switch is 1, and the switch port connected to port 1 of the HCA is 22.

This fix depends on the commit that refactors acm_if_iter() to use netlink instead of ioctl. With the requirements on the interface state relaxed, the EP is now added, and when an address is later assigned, it is associated with the EP.

This commit is a new implementation of https://patchwork.kernel.org/patch/10748357, which was NAKed.
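The ioctl-based enumeration described above can be sketched in C as follows. This is not ibacm's pre-refactor code; the function name ifconf_scan() and the fixed 64-entry buffer are assumptions made for illustration. Per netdevice(7), SIOCGIFCONF reports only AF_INET addresses, so an interface without a provisioned IPv4 address yields no entry at all; the sketch additionally counts how many of the listed interfaces report IFF_RUNNING through SIOCGIFFLAGS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#define IFCONF_MAX_ENTRIES 64	/* assumption: large enough for a sketch */

/*
 * Hypothetical helper. Returns the number of entries SIOCGIFCONF reported
 * (one per IPv4 address), or -1 on error. *n_running receives how many of
 * those interfaces currently have IFF_RUNNING set.
 */
static int ifconf_scan(int *n_running)
{
	struct ifreq reqs[IFCONF_MAX_ENTRIES];
	struct ifconf ifc;
	int fd, n, i;

	assert(n_running != NULL);
	*n_running = 0;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* Ask the kernel to fill our fixed-size array of ifreq entries. */
	memset(&ifc, 0, sizeof(ifc));
	ifc.ifc_len = sizeof(reqs);
	ifc.ifc_req = reqs;
	if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
		close(fd);
		return -1;
	}

	n = ifc.ifc_len / (int)sizeof(struct ifreq);
	for (i = 0; i < n; i++) {
		struct ifreq fr;

		/* Query the current flags of the interface owning this address. */
		memset(&fr, 0, sizeof(fr));
		memcpy(fr.ifr_name, reqs[i].ifr_name, IFNAMSIZ);
		if (ioctl(fd, SIOCGIFFLAGS, &fr) < 0)
			continue;
		if (fr.ifr_flags & IFF_RUNNING)
			(*n_running)++;
	}

	close(fd);
	return n;
}
```

A fixed buffer keeps the sketch short; if ifc_len comes back equal to the buffer size, the list may have been truncated, and robust code would retry with a larger buffer.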
Signed-off-by: Håkon Bugge
Reviewed-by: Ira Weiny

v1 -> v2:
   * Removed Gerrit's Change-Id tag (Håkon)
---
 ibacm/src/acm_util.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/ibacm/src/acm_util.c b/ibacm/src/acm_util.c
index fecb6c89..8807579d 100644
--- a/ibacm/src/acm_util.c
+++ b/ibacm/src/acm_util.c
@@ -180,19 +180,13 @@ static void acm_if_iter(struct nl_object *obj, void *_ctx_and_cb)
 	uint16_t pkey;
 	int addr_len;
 	char *label;
-	int flags;
-	int ret;
 	int af;
 
 	link = rtnl_link_get(link_cache, rtnl_addr_get_ifindex(addr));
-	flags = rtnl_link_get_flags(link);
 
 	if (rtnl_link_get_arptype(link) != ARPHRD_INFINIBAND)
 		return;
 
-	if (!(flags & IFF_RUNNING))
-		return;
-
 	if (!(a = rtnl_addr_get_local(addr)))
 		return;
 
@@ -206,20 +200,18 @@ static void acm_if_iter(struct nl_object *obj, void *_ctx_and_cb)
 		return;
 
 	label = rtnl_addr_get_label(addr);
-	if (!label)
-		return;
 
 	link_addr = rtnl_link_get_addr(link);
+	/* gid has a 4 byte offset into the link address */
 	memcpy(sgid.raw, nl_addr_get_binary_addr(link_addr) + 4, sizeof(sgid));
 
-	ret = acm_if_get_pkey(rtnl_link_get_name(link), &pkey);
-	if (ret)
+	if (acm_if_get_pkey(rtnl_link_get_name(link), &pkey))
 		return;
 
 	acm_log(2, "name: %5s label: %9s index: %2d flags: %s addr: %s pkey: 0x%04x guid: 0x%lx\n",
 		rtnl_link_get_name(link), label,
 		rtnl_addr_get_ifindex(addr),
-		rtnl_link_flags2str(flags, flags_str, sizeof(flags_str)),
+		rtnl_link_flags2str(rtnl_link_get_flags(link), flags_str, sizeof(flags_str)),
 		nl_addr2str(a, ip_str, sizeof(ip_str)),
 		pkey, be64toh(sgid.global.interface_id));
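To see the behavioral effect of the diff in isolation, here is a minimal C model of the checks visible in the two hunks. struct sketch_if_addr, union sketch_gid and the sketch_* functions are hypothetical stand-ins for the libnl getters and for ibacm's sgid (which appears to be a union ibv_gid from libibverbs, given its raw and global.interface_id fields); checks in the lines between the two hunks are not modeled. The GID extraction follows the IPoIB link-layer address layout of RFC 4391: 20 bytes, where the first 4 hold a reserved byte and the 24-bit QPN, followed by the 16-byte GID.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <net/if.h>	/* IFF_UP, IFF_RUNNING */
#include <net/if_arp.h>	/* ARPHRD_INFINIBAND, ARPHRD_ETHER */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IPOIB_ALEN 20	/* 4-byte reserved/QPN header + 16-byte GID */
#define SKETCH_GID_OFFSET 4

/* Hypothetical local stand-in with the same field names as ibacm's sgid. */
union sketch_gid {
	uint8_t raw[16];
	struct {
		uint64_t subnet_prefix;	/* network byte order */
		uint64_t interface_id;	/* network byte order */
	} global;
};

/* Hypothetical flattened view of what acm_if_iter() reads via libnl. */
struct sketch_if_addr {
	unsigned int arptype;	/* rtnl_link_get_arptype() */
	unsigned int flags;	/* rtnl_link_get_flags() */
	bool has_local;		/* rtnl_addr_get_local() != NULL */
	const char *label;	/* rtnl_addr_get_label(), may be NULL */
	bool pkey_ok;		/* acm_if_get_pkey() returned 0 */
	uint8_t hwaddr[SKETCH_IPOIB_ALEN];	/* rtnl_link_get_addr() bytes */
};

/* Checks before the patch (only those visible in the diff hunks). */
static bool sketch_accept_old(const struct sketch_if_addr *ia)
{
	return ia->arptype == ARPHRD_INFINIBAND &&
	       (ia->flags & IFF_RUNNING) &&
	       ia->has_local && ia->label != NULL && ia->pkey_ok;
}

/* Checks after the patch: the IFF_RUNNING and label requirements are gone. */
static bool sketch_accept_new(const struct sketch_if_addr *ia)
{
	return ia->arptype == ARPHRD_INFINIBAND && ia->has_local && ia->pkey_ok;
}

/* The GID starts 4 bytes into the IPoIB link-layer address. */
static void sketch_sgid(const struct sketch_if_addr *ia, union sketch_gid *sgid)
{
	memcpy(sgid->raw, ia->hwaddr + SKETCH_GID_OFFSET, sizeof(sgid->raw));
}
```

For an InfiniBand interface that is up but not yet IFF_RUNNING, sketch_accept_old() rejects the address while sketch_accept_new() accepts it, so the EP can be created before the interface reaches the running state.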