From patchwork Tue Sep 22 18:27:25 2009
X-Patchwork-Submitter: Roland Dreier
X-Patchwork-Id: 49383
From: Roland Dreier <rdreier@cisco.com>
To: Sean Hefty
Cc: Bart Van Assche, Hal Rosenstock, linux-rdma@vger.kernel.org
Subject: Re: [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code
Date: Tue, 22 Sep 2009 11:27:25 -0700
In-Reply-To: Sean Hefty's message of Wed, 9 Sep 2009 14:22:28 -0700

 > The locking is needed to protect against items being removed from
 > rmpp_list in recv_timeout_handler() and recv_cleanup_handler().  No
 > new items should be added to the rmpp_list when
 > ib_cancel_rmpp_recvs() is running (or there's a separate bug).
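For anyone following along, the problem is that ib_cancel_rmpp_recvs()
was calling cancel_delayed_work() while still holding agent->lock, and
the handlers being canceled take that same lock.  Roughly (this is only
the general shape, not the exact chain lockdep would report):

    ib_cancel_rmpp_recvs()                recv_timeout_handler() /
    (before this change)                  recv_cleanup_handler()
    ---------------------------------     ---------------------------------
    spin_lock_irqsave(&agent->lock)       delayed work runs
    cancel_delayed_work(...) pulls the    spin_lock_irqsave(&agent->lock)
      work's timer into the lock chain      waits here until the cancel
      while agent->lock is held             path drops agent->lock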
OK, so how about something like this?

Just hold the lock to mark the items on the list as being canceled,
and then actually cancel the delayed work without the lock held.  I
think this doesn't leave any races or holes where the delayed work can
mess up the cancel.

Reviewed-by: Sean Hefty

---
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 57a3c6f..4e0f282 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -37,7 +37,8 @@
 enum rmpp_state {
         RMPP_STATE_ACTIVE,
         RMPP_STATE_TIMEOUT,
-        RMPP_STATE_COMPLETE
+        RMPP_STATE_COMPLETE,
+        RMPP_STATE_CANCELING
 };
 
 struct mad_rmpp_recv {
@@ -87,18 +88,22 @@ void ib_cancel_rmpp_recvs(struct ib_mad_agent_private *agent)
 
         spin_lock_irqsave(&agent->lock, flags);
         list_for_each_entry(rmpp_recv, &agent->rmpp_list, list) {
+                if (rmpp_recv->state != RMPP_STATE_COMPLETE)
+                        ib_free_recv_mad(rmpp_recv->rmpp_wc);
+                rmpp_recv->state = RMPP_STATE_CANCELING;
+        }
+        spin_unlock_irqrestore(&agent->lock, flags);
+
+        list_for_each_entry(rmpp_recv, &agent->rmpp_list, list) {
                 cancel_delayed_work(&rmpp_recv->timeout_work);
                 cancel_delayed_work(&rmpp_recv->cleanup_work);
         }
-        spin_unlock_irqrestore(&agent->lock, flags);
 
         flush_workqueue(agent->qp_info->port_priv->wq);
 
         list_for_each_entry_safe(rmpp_recv, temp_rmpp_recv,
                                  &agent->rmpp_list, list) {
                 list_del(&rmpp_recv->list);
-                if (rmpp_recv->state != RMPP_STATE_COMPLETE)
-                        ib_free_recv_mad(rmpp_recv->rmpp_wc);
                 destroy_rmpp_recv(rmpp_recv);
         }
 }
@@ -260,6 +265,10 @@ static void recv_cleanup_handler(struct work_struct *work)
         unsigned long flags;
 
         spin_lock_irqsave(&rmpp_recv->agent->lock, flags);
+        if (rmpp_recv->state == RMPP_STATE_CANCELING) {
+                spin_unlock_irqrestore(&rmpp_recv->agent->lock, flags);
+                return;
+        }
         list_del(&rmpp_recv->list);
         spin_unlock_irqrestore(&rmpp_recv->agent->lock, flags);
         destroy_rmpp_recv(rmpp_recv);
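If it helps to see the idea outside the kernel, here is a minimal
userspace analogue of the two-phase cancel, using pthreads.  Everything
in it (struct item, agent_lock, cleanup_handler(), NITEMS) is made up
for the sketch; only the pattern (mark everything under the lock, then
wait for the deferred work with the lock dropped) comes from the patch
above.

/* Userspace sketch of the two-phase cancel: phase 1 marks every item
 * CANCELING under the lock; phase 2 waits for in-flight handlers with
 * the lock dropped.  Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>

enum state { ACTIVE, COMPLETE, CANCELING };
struct item { enum state state; };

#define NITEMS 3

static pthread_mutex_t agent_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item items[NITEMS];

/* Plays the role of recv_cleanup_handler(): if the cancel path already
 * marked this item, return early and leave teardown to the cancel path. */
static void *cleanup_handler(void *arg)
{
        struct item *it = arg;

        pthread_mutex_lock(&agent_lock);
        if (it->state == CANCELING) {
                pthread_mutex_unlock(&agent_lock);
                return NULL;
        }
        it->state = COMPLETE;           /* unlink/destroy would go here */
        pthread_mutex_unlock(&agent_lock);
        return NULL;
}

int main(void)
{
        pthread_t tid[NITEMS];
        int i;

        for (i = 0; i < NITEMS; i++) {
                items[i].state = ACTIVE;
                pthread_create(&tid[i], NULL, cleanup_handler, &items[i]);
        }

        /* Phase 1: mark everything under the lock.  Any handler that
         * runs after this sees CANCELING and backs off.  (The patch
         * also frees the undelivered receive buffer of non-COMPLETE
         * items at this point.) */
        pthread_mutex_lock(&agent_lock);
        for (i = 0; i < NITEMS; i++)
                items[i].state = CANCELING;
        pthread_mutex_unlock(&agent_lock);

        /* Phase 2: wait for in-flight handlers with the lock DROPPED,
         * so a handler blocked on agent_lock can still finish.  This
         * join stands in for cancel_delayed_work() + flush_workqueue();
         * doing it while holding the lock is what risked the original
         * deadlock. */
        for (i = 0; i < NITEMS; i++)
                pthread_join(tid[i], NULL);

        printf("all handlers finished; items can be destroyed safely\n");
        return 0;
}

The design point worth noting: the second, unlocked walk over the list
is safe only because phase 1 already marked every entry, so the
handlers back off instead of removing entries behind the traversal,
and (per Sean's point above) nothing new is added to rmpp_list while
ib_cancel_rmpp_recvs() runs.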