From patchwork Tue May 1 18:05:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Doug Ledford X-Patchwork-Id: 10374283 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 089D160234 for ; Tue, 1 May 2018 18:05:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EAF7228716 for ; Tue, 1 May 2018 18:05:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB35028D05; Tue, 1 May 2018 18:05:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,T_TVD_MIME_EPI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 31E7628716 for ; Tue, 1 May 2018 18:05:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756295AbeEASFU (ORCPT ); Tue, 1 May 2018 14:05:20 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:39626 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756287AbeEASFU (ORCPT ); Tue, 1 May 2018 14:05:20 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B23EBEC014; Tue, 1 May 2018 18:05:19 +0000 (UTC) Received: from haswell-e.nc.xsintricity.com (ovpn-122-18.rdu2.redhat.com [10.10.122.18]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 69218111AF34; Tue, 1 May 2018 18:05:19 +0000 (UTC) Message-ID: <1525197913.11756.129.camel@redhat.com> Subject: Re: [PATCH rdma-rc] RDMA/cma: Zero out qp and ah attribute From: Doug Ledford To: Jason Gunthorpe , Leon Romanovsky Cc: Parav Pandit , RDMA mailing list , Raju Rangoju , Leon Romanovsky Date: Tue, 01 May 2018 14:05:13 -0400 In-Reply-To: <20180430215643.GE28402@mellanox.com> References: <20180429074646.10043-1-leon@kernel.org> <20180430215643.GE28402@mellanox.com> Organization: Red Hat, Inc. Mime-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Tue, 01 May 2018 18:05:19 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Tue, 01 May 2018 18:05:19 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'dledford@redhat.com' RCPT:'' Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > From: Parav Pandit > > > > Commit given in fixes tag introduced an accurate check to validate > > device, port, index by referring to the cache layer for querying GIDs > > for all link layers (IB, RoCE and iWarp). > > > > When rdmacm tries to modify the QP to RTR state for kernel consumers, > > qp and ah attributes are uninitialized. Each transport layer (IB/iWarp) > > initializes them depending on transport type. > > However qp ah_attr are not used for iWarp and remained uninitialized, > > which is further used in ib_query_gid() call. This results into a > > failure to query the GID due to an invalid GID index coming from > > the uninitialized stack memory. > > This is reported and discussed in thread [1]. > > What is ib_query_gid supposed to do for iWarp? Return the sole entry in the GID table. But, you pass the index, so you have to know to pass index 0. In this case, because of uninitialized memory in the ah_attr, it's being told to try and return another entry that is invalid. However, I see no benefit to doing this bandaid fix while leaving the real bug: that we are calling ib_query_gid() for no purpose in this function now a days. Just clear out the bad usage and move on. To that end, I've written a patch for this: [dledford@haswell-e linus (k.o/wip/dl-for-rc *)]$ git diff --- Steve/Shiraz, can either of you test this real quick to make sure it still works in your specific situation (I can't imagine it not working, but a test would be nice)? I think we should also be at a point where we can remove that BUG_ON()... diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 8364223422d0..a693fcd4c513 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -868,7 +868,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; - union ib_gid sgid; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { @@ -891,12 +890,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, if (ret) goto out; - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, - rdma_ah_read_grh(&qp_attr.ah_attr)->sgid_index, - &sgid, NULL); - if (ret) - goto out; - BUG_ON(id_priv->cma_dev->device != id_priv->id.device); if (conn_param)