From patchwork Sun May 8 14:16:29 2016
X-Patchwork-Submitter: Al Viro
X-Patchwork-Id: 9039001
Date: Sun, 8 May 2016 15:16:29 +0100
From: Al Viro
To: Tony Lindgren
Subject: Re: NFSroot hangs with bad unlock balance in Linux next
Message-ID: <20160508141629.GF2694@ZenIV.linux.org.uk>
References: <20160505220344.GE5995@atomide.com>
In-Reply-To: <20160505220344.GE5995@atomide.com>
Cc: linux-nfs@vger.kernel.org, Trond Myklebust, Anna Schumaker,
	linux-omap@vger.kernel.org, Christoph Hellwig,
	linux-arm-kernel@lists.infradead.org

On Thu, May 05, 2016 at 03:03:44PM -0700, Tony Lindgren wrote:
> Hi,
>
> Looks like Linux next with NFSroot hangs for me at some point booting
> into init. Then after a while it produces "BUG: bad unlock balance
> detected!".
>
> This happens at least with omap5-uevm and igepv5. Not sure yet if it
> also happens on other boards, the ones I'm seeing it happen both have
> USB Ethernet controller. They usually hang after the system starts
> being idle some tens of seconds into booting.
>
> I tried to bisect it down with no luck. I do have the following
> trace, does that provide any clues?
>
> kworker/0:2/112 is trying to release lock (&nfsi->rmdir_sem) at:
> [] nfs_async_unlink_release+0x20/0x68
> but there are no more locks to release!

Very strange.  We grab that rwsem at the entry into nfs_call_unlink() and
then either release it there and return, or call nfs_do_call_unlink(),
which arranges for an eventual call of nfs_async_unlink_release() (via
->rpc_release); nfs_async_unlink_release() releases the rwsem.  Nobody
else releases it (on the read side, that is).

The only kinda-sorta possibility I see here is that the inode we are
unlocking in that nfs_async_unlink_release() is not the one we'd locked
in the nfs_call_unlink() that has led to it.
That really shouldn't happen, though...  Just to verify whether that's
what we are hitting, could you try to reproduce that thing with the patch
below on top of -next and see if it triggers any of those WARN_ON?

diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index d367b06..dbbb4c9 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -64,6 +64,10 @@ static void nfs_async_unlink_release(void *calldata)
 	struct dentry *dentry = data->dentry;
 	struct super_block *sb = dentry->d_sb;
 
+	if (WARN_ON(data->parent != dentry->d_parent) ||
+	    WARN_ON(data->parent_inode != dentry->d_parent->d_inode)) {
+		printk(KERN_ERR "WTF2[%pd4]", dentry);
+	}
 	up_read(&NFS_I(d_inode(dentry->d_parent))->rmdir_sem);
 	d_lookup_done(dentry);
 	nfs_free_unlinkdata(data);
@@ -114,7 +118,8 @@ static void nfs_do_call_unlink(struct nfs_unlinkdata *data)
 
 static int nfs_call_unlink(struct dentry *dentry, struct nfs_unlinkdata *data)
 {
-	struct inode *dir = d_inode(dentry->d_parent);
+	struct dentry *parent = dentry->d_parent;
+	struct inode *dir = d_inode(parent);
 	struct dentry *alias;
 
 	down_read(&NFS_I(dir)->rmdir_sem);
@@ -152,6 +157,12 @@ static int nfs_call_unlink(struct dentry *dentry, struct nfs_unlinkdata *data)
 		return ret;
 	}
 	data->dentry = alias;
+	data->parent = parent;
+	data->parent_inode = dir;
+	if (WARN_ON(parent != alias->d_parent) ||
+	    WARN_ON(dir != parent->d_inode)) {
+		printk(KERN_ERR "WTF1[%pd4]", alias);
+	}
 	nfs_do_call_unlink(data);
 	return 1;
 }
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ee8491d..b01a7f1 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1471,6 +1471,8 @@ struct nfs_unlinkdata {
 	struct nfs_removeargs args;
 	struct nfs_removeres res;
 	struct dentry *dentry;
+	struct dentry *parent;
+	struct inode *parent_inode;
 	wait_queue_head_t wq;
 	struct rpc_cred *cred;
 	struct nfs_fattr dir_attr;
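
For anyone following along, the suspected failure mode can be modelled in
user space.  This is only a rough sketch, not kernel code: every toy_*
name is made up for illustration, a plain counter stands in for the read
side of rmdir_sem, and the "parent changes between lock and release" step
is an assumption about what might be going wrong, not a diagnosis.  The
point is just that both sides look the lock up through the dentry's
parent, so a different parent at release time means the unlock lands on
a lock that was never taken.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the kernel's inode/dentry. */
struct toy_inode {
	const char *name;
	int readers;		/* models read-side holders of rmdir_sem */
};

struct toy_dentry {
	struct toy_inode *inode;
	struct toy_dentry *parent;
};

static void toy_down_read(struct toy_inode *dir)
{
	dir->readers++;
}

/* Returns 0 on success, -1 when there is nothing to release. */
static int toy_up_read(struct toy_inode *dir)
{
	if (dir->readers == 0) {
		fprintf(stderr, "bad unlock balance on %s\n", dir->name);
		return -1;
	}
	dir->readers--;
	return 0;
}

/* Models the locking half: lock whatever the parent is right now. */
static void toy_call_unlink(struct toy_dentry *d)
{
	assert(d->parent != NULL);
	toy_down_read(d->parent->inode);
}

/* Models the release half: unlock whatever the parent is by then. */
static int toy_async_release(struct toy_dentry *d)
{
	assert(d->parent != NULL);
	return toy_up_read(d->parent->inode);
}

/*
 * Lock, optionally move the dentry under another parent, then release.
 * Returns the result of the release: 0 if balanced, -1 if not.
 */
static int toy_scenario(int reparent)
{
	struct toy_inode ia = { "dirA", 0 };
	struct toy_inode ib = { "dirB", 0 };
	struct toy_inode iv = { "victim", 0 };
	struct toy_dentry a = { &ia, NULL };
	struct toy_dentry b = { &ib, NULL };
	struct toy_dentry victim = { &iv, &a };

	toy_call_unlink(&victim);
	if (reparent)
		victim.parent = &b;	/* assumed: parent differs at release */
	return toy_async_release(&victim);
}
```

With a stable parent, toy_scenario(0) returns 0.  With the parent swapped
in between, toy_scenario(1) returns -1 and complains about dirB, while
dirA is left with a reader that never goes away.  The WARN_ON checks in
the patch above are looking for exactly that kind of mismatch.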