From patchwork Thu Jun 19 21:50:41 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Mason X-Patchwork-Id: 4386431 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 85B81BEEAA for ; Thu, 19 Jun 2014 21:45:46 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6F05A203AA for ; Thu, 19 Jun 2014 21:45:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 457FD203A9 for ; Thu, 19 Jun 2014 21:45:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965602AbaFSVpk (ORCPT ); Thu, 19 Jun 2014 17:45:40 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:64416 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965284AbaFSVpi (ORCPT ); Thu, 19 Jun 2014 17:45:38 -0400 Received: from pps.filterd (m0004060 [127.0.0.1]) by mx0b-00082601.pphosted.com (8.14.5/8.14.5) with SMTP id s5JLjR80013943; Thu, 19 Jun 2014 14:45:27 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fb.com; h=message-id : date : from : mime-version : to : cc : subject : references : in-reply-to : content-type : content-transfer-encoding; s=facebook; bh=K6QyxL+lrhXxEyLvB6H11I/KwN4OATYG27VsI4f4Y1Q=; b=opiXbflv+hhBNhMTGw7T6RFQETZtYHX1TXHJVhHY835QbBzCMap6UuJ8lbKHd8E7BnSy PWEHzEmcZtdpAj0K5rtiC/NulNQ+6JncjtrZT1Vl3WHemaM2mGQYCtpEJQztGI310wbR JcwpMmfHjkW+kd5KtEtrccR6CvqxbuMXtZw= Received: from mail.thefacebook.com (mailwest.thefacebook.com [173.252.71.148]) by mx0b-00082601.pphosted.com with ESMTP id 1mkx0q1n77-1 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=OK); Thu, 19 Jun 2014 14:45:27 -0700 Received: from [172.25.45.198] (192.168.16.4) by mail.thefacebook.com (192.168.16.23) with Microsoft SMTP Server (TLS) id 14.3.174.1; Thu, 19 Jun 2014 14:45:20 -0700 Message-ID: <53A35B31.4060203@fb.com> Date: Thu, 19 Jun 2014 17:50:41 -0400 From: Chris Mason User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: Waiman Long CC: Marc Dionne , Josef Bacik , , Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected References: <53A20FFF.3010807@hp.com> <53A2125B.3050701@fb.com> <53A21702.8090109@hp.com> <53A21C78.1040809@fb.com> <53A21E84.2050103@hp.com> <53A22064.7080400@fb.com> <53A2212E.7090907@hp.com> <53A2268F.3080807@fb.com> <53A22A01.7080505@hp.com> <53A246C6.5050408@fb.com> <53A2573B.1060901@hp.com> <53A31514.8030308@fb.com> <53A32353.5000104@hp.com> <53A343B9.3000900@fb.com> In-Reply-To: <53A343B9.3000900@fb.com> X-Enigmail-Version: 1.6 X-Originating-IP: [192.168.16.4] X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.12.52, 1.0.14, 0.0.0000 definitions=2014-06-19_05:2014-06-19, 2014-06-19, 1970-01-01 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 kscore.is_bulkscore=2.89993312696524e-09 kscore.compositescore=0 circleOfTrustscore=267.601810951001 compositescore=0.533125042697551 urlsuspect_oldscore=0.533125042697551 suspectscore=0 recipient_domain_to_sender_totalscore=0 phishscore=0 bulkscore=0 kscore.is_spamscore=0 recipient_to_sender_totalscore=0 recipient_domain_to_sender_domain_totalscore=1996008 rbsscore=0.533125042697551 spamscore=0 recipient_to_sender_domain_totalscore=0 urlsuspectscore=0.9 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1406190251 X-FB-Internal: deliver Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 06/19/2014 04:10 PM, Chris Mason wrote: > On 06/19/2014 01:52 PM, Waiman Long wrote: >> On 06/19/2014 12:51 PM, Chris Mason wrote: >>> On 06/18/2014 11:21 PM, Waiman Long wrote: >>>> On 06/18/2014 10:11 PM, Chris Mason wrote: >>>>> On 06/18/2014 10:03 PM, Marc Dionne wrote: >>>>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc >>>>>> Dionne wrote: >>>>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long >>>>>>> wrote: >>>>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote: >>>>>> And for an additional data point, just removing those >>>>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent >>>>>> the symptoms when lockdep is not enabled. >>>>> Ok, somehow we've added a lock inversion here that wasn't here before. >>>>> Thanks for confirming, I'll nail it down. >>>>> >>>>> -chris >>>>> >>>> I am pretty sure that the hangup is caused by the following kind of code >>>> fragment in the locking.c file: >>>> >>>> if (eb->lock_nested) { >>>> read_lock(&eb->lock); >>>> if (eb->lock_nested&& current->pid == >>>> eb->lock_owner) { >>>> >>>> Is it possible to do the check without taking the read_lock? >>> I think you're right, we haven't added any new recursive takers of the >>> lock. The path where we are deadlocking has an extent buffer that isn't >>> in the path yet locked. I think we're taking the read lock while that >>> one is write locked. >>> >>> Reworking the nesting a big here. >>> >>> -chris >> >> I would like to take back my comments. I took out the read_lock, but the >> process still hang while doing file activities on btrfs filesystem. So >> the problem is trickier than I thought. Below are the stack backtraces >> of some of the relevant processes. >> > > You weren't wrong, but it was also the tree trylock code. Our trylocks > only back off if the blocking lock is held. btrfs_next_leaf needs it to > be a true trylock. The confusing part is this hasn't really changed, > but one of the callers must be a spinner where we used to have a blocker. This is what I have queued up, it's working here. -chris commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e Author: Chris Mason Date: Thu Jun 19 14:16:52 2014 -0700 Btrfs: fix deadlocks with trylock on tree nodes The Btrfs tree trylock function is poorly named. It always takes the spinlock and backs off if the blocking lock is held. This can lead to surprising lockups because people expect it to really be a trylock. This commit makes it a pure trylock, both for the spinlock and the blocking lock. It also reworks the nested lock handling slightly to avoid taking the read lock while a spinning write lock might be held. Signed-off-by: Chris Mason --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c index 01277b8..5665d21 100644 --- a/fs/btrfs/locking.c +++ b/fs/btrfs/locking.c @@ -33,14 +33,14 @@ static void btrfs_assert_tree_read_locked(struct extent_buffer *eb); */ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); - } + /* + * no lock is required. The lock owner may change if + * we have a read lock, but it won't change to or away + * from us. If we have the write lock, we are the owner + * and it'll never change. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) + return; if (rw == BTRFS_WRITE_LOCK) { if (atomic_read(&eb->blocking_writers) == 0) { WARN_ON(atomic_read(&eb->spinning_writers) != 1); @@ -65,14 +65,15 @@ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw) */ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); - } + /* + * no lock is required. The lock owner may change if + * we have a read lock, but it won't change to or away + * from us. If we have the write lock, we are the owner + * and it'll never change. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) + return; + if (rw == BTRFS_WRITE_LOCK_BLOCKING) { BUG_ON(atomic_read(&eb->blocking_writers) != 1); write_lock(&eb->lock); @@ -99,6 +100,9 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw) void btrfs_tree_read_lock(struct extent_buffer *eb) { again: + BUG_ON(!atomic_read(&eb->blocking_writers) && + current->pid == eb->lock_owner); + read_lock(&eb->lock); if (atomic_read(&eb->blocking_writers) && current->pid == eb->lock_owner) { @@ -132,7 +136,9 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb) if (atomic_read(&eb->blocking_writers)) return 0; - read_lock(&eb->lock); + if (!read_trylock(&eb->lock)) + return 0; + if (atomic_read(&eb->blocking_writers)) { read_unlock(&eb->lock); return 0; @@ -151,7 +157,10 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb) if (atomic_read(&eb->blocking_writers) || atomic_read(&eb->blocking_readers)) return 0; - write_lock(&eb->lock); + + if (!write_trylock(&eb->lock)) + return 0; + if (atomic_read(&eb->blocking_writers) || atomic_read(&eb->blocking_readers)) { write_unlock(&eb->lock); @@ -168,14 +177,15 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb) */ void btrfs_tree_read_unlock(struct extent_buffer *eb) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - eb->lock_nested = 0; - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); + /* + * if we're nested, we have the write lock. No new locking + * is needed as long as we are the lock owner. + * The write unlock will do a barrier for us, and the lock_nested + * field only matters to the lock owner. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) { + eb->lock_nested = 0; + return; } btrfs_assert_tree_read_locked(eb); WARN_ON(atomic_read(&eb->spinning_readers) == 0); @@ -189,14 +199,15 @@ void btrfs_tree_read_unlock(struct extent_buffer *eb) */ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - eb->lock_nested = 0; - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); + /* + * if we're nested, we have the write lock. No new locking + * is needed as long as we are the lock owner. + * The write unlock will do a barrier for us, and the lock_nested + * field only matters to the lock owner. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) { + eb->lock_nested = 0; + return; } btrfs_assert_tree_read_locked(eb); WARN_ON(atomic_read(&eb->blocking_readers) == 0); @@ -244,6 +255,7 @@ void btrfs_tree_unlock(struct extent_buffer *eb) BUG_ON(blockers > 1); btrfs_assert_tree_locked(eb); + eb->lock_owner = 0; atomic_dec(&eb->write_locks); if (blockers) {