From patchwork Thu Jan 22 14:31:33 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Doug Ledford <dledford@redhat.com>
X-Patchwork-Id: 5685211
Return-Path: <linux-rdma-owner@kernel.org>
X-Original-To: patchwork-linux-rdma@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id 1FD269F358
	for <patchwork-linux-rdma@patchwork.kernel.org>;
	Thu, 22 Jan 2015 14:32:14 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 487B7202F8
	for <patchwork-linux-rdma@patchwork.kernel.org>;
	Thu, 22 Jan 2015 14:32:12 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 3763920303
	for <patchwork-linux-rdma@patchwork.kernel.org>;
	Thu, 22 Jan 2015 14:32:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754213AbbAVOcF (ORCPT
	<rfc822;patchwork-linux-rdma@patchwork.kernel.org>);
	Thu, 22 Jan 2015 09:32:05 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46114 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753536AbbAVOcB (ORCPT <rfc822;linux-rdma@vger.kernel.org>);
	Thu, 22 Jan 2015 09:32:01 -0500
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id t0MEVt3r006923
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=FAIL); Thu, 22 Jan 2015 09:31:55 -0500
Received: from linux-ws.xsintricity.com ([10.10.116.28])
	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id t0MEVlC3011531; Thu, 22 Jan 2015 09:31:55 -0500
From: Doug Ledford <dledford@redhat.com>
To: linux-rdma@vger.kernel.org, roland@kernel.org
Cc: Amir Vadai <amirv@mellanox.com>, Eyal Perry <eyalpe@mellanox.com>,
	Or Gerlitz <gerlitz.or@gmail.com>, Erez Shitrit <erezsh@mellanox.com>,
	Doug Ledford <dledford@redhat.com>
Subject: [PATCH FIX For-3.19 v5 08/10] IB/ipoib: fix ipoib_mcast_restart_task
Date: Thu, 22 Jan 2015 09:31:33 -0500
Message-Id: 
 <207b490655084b686a5f2ec52d65d964d5a77fe4.1421936879.git.dledford@redhat.com>
In-Reply-To: <cover.1421936879.git.dledford@redhat.com>
References: <cover.1421936879.git.dledford@redhat.com>
In-Reply-To: <cover.1421936879.git.dledford@redhat.com>
References: <cover.1421936879.git.dledford@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-rdma.vger.kernel.org>
X-Mailing-List: linux-rdma@vger.kernel.org
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	T_RP_MATCHES_RCVD,
	UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

There are two intertwined problems here.

First, if we are downing the interface, we will get called as part of
the process.  When that happens, we don't really want to do anything
here as the next thing to happen is ipoib_mcast_dev_flush will get
called.  We just want to let it do all the work since we don't need to
do any address matches and such.  So, we should be testing for whether
or not the interface is up at the beginning of this function and
returning if it isn't.

Second, the attempt to grab the rtnl_lock to prevent racing against
ipoib_mcast_dev_flush was mistaken.  The removed mcast groups are on a
stack local list head and so once we've removed them from the mcast
rbtree, ipoib_mcast_dev_flush can no longer find them and race us on
removing them.  However, because we attempt to take the lock, and then
check the status of the interface, if our interface has gone down, we
simply return.  This is absolutely the wrong thing to do as it means we
just leaked all of the mcast entries on our remove list.

The fix is to check for the interface being up at the very beginning of
the function, and then to always remove our entries on the remove list
no matter what as they will get leaked otherwise, and skip taking the
rtnl_lock since it isn't needed anyway.

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 972fa15a225..10e05437c83 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -871,6 +871,9 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 	unsigned long flags;
 	struct ib_sa_mcmember_rec rec;
 
+	if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+		return;
+
 	ipoib_dbg_mcast(priv, "restarting multicast task\n");
 
 	local_irq_save(flags);
@@ -965,21 +968,6 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 		if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
 			wait_for_completion(&mcast->done);
 
-	/*
-	 * We have to cancel outside of the spinlock, but we have to
-	 * take the rtnl lock or else we race with the removal of
-	 * entries from the remove list in mcast_dev_flush as part
-	 * of ipoib_stop().  We detect the drop of the ADMIN_UP flag
-	 * to signal that we have hit this particular race, and we
-	 * return since we know we don't need to do anything else
-	 * anyway.
-	 */
-	while (!rtnl_trylock()) {
-		if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
-			return;
-		else
-			msleep(20);
-	}
 	list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {
 		ipoib_mcast_leave(mcast->dev, mcast);
 		ipoib_mcast_free(mcast);
@@ -988,7 +976,6 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 	 * Restart our join task thread if needed
 	 */
 	__ipoib_mcast_continue_join_thread(priv, NULL, 0);
-	rtnl_unlock();
 }
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG