From patchwork Mon Nov 3 13:56:50 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 5216811 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 39EC99F295 for ; Mon, 3 Nov 2014 13:57:01 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 424B320142 for ; Mon, 3 Nov 2014 13:57:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 11C232013A for ; Mon, 3 Nov 2014 13:56:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751437AbaKCN4z (ORCPT ); Mon, 3 Nov 2014 08:56:55 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:27859 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751130AbaKCN4y (ORCPT ); Mon, 3 Nov 2014 08:56:54 -0500 Received: from pps.filterd (m0004347 [127.0.0.1]) by m0004347.ppops.net (8.14.5/8.14.5) with SMTP id sA3Dq3dp005857 for ; Mon, 3 Nov 2014 05:56:53 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fb.com; h=from : to : subject : date : message-id : mime-version : content-type; s=facebook; bh=je7C2X329h/Ask3vDk0IokotvvfHzQATDntyxi7OzSo=; b=W4b5n/SHkNcGIdDkDso3T8TnGWHuCVdm2XvJh+Zf9GR9DyAVbfYlr/3jPRkmChS7JQQm fe3unOVeCH5o2U0q4e1yPs1REGY9qOXD/+NPC0v+wPoqNrXwmtM0PG/Ue75EKPEMCUyC nKVAEmh5z4nzR+Lcs7xKYeW4YaRbILb79cw= Received: from mail.thefacebook.com ([199.201.64.23]) by m0004347.ppops.net with ESMTP id 1qe19ws3tn-1 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=OK) for ; Mon, 03 Nov 2014 05:56:53 -0800 Received: from localhost (192.168.57.29) by mail.thefacebook.com (192.168.16.23) with Microsoft SMTP Server (TLS) id 14.3.195.1; Mon, 3 Nov 2014 05:56:52 -0800 From: Josef Bacik To: Subject: [PATCH] Btrfs: don't take the chunk_mutex/dev_list mutex in statfs V2 Date: Mon, 3 Nov 2014 08:56:50 -0500 Message-ID: <1415023010-12988-1-git-send-email-jbacik@fb.com> X-Mailer: git-send-email 1.8.3.1 MIME-Version: 1.0 X-Originating-IP: [192.168.57.29] X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.12.52, 1.0.28, 0.0.0000 definitions=2014-11-03_02:2014-11-03, 2014-11-03, 1970-01-01 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 kscore.is_bulkscore=4.9960036108132e-16 kscore.compositescore=0 circleOfTrustscore=514.84 compositescore=0.996321851895651 urlsuspect_oldscore=0.996321851895651 suspectscore=3 recipient_domain_to_sender_totalscore=0 phishscore=0 bulkscore=0 kscore.is_spamscore=0 recipient_to_sender_totalscore=0 recipient_domain_to_sender_domain_totalscore=64355 rbsscore=0.996321851895651 spamscore=0 recipient_to_sender_domain_totalscore=0 urlsuspectscore=0.9 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1411030127 X-FB-Internal: deliver Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Our gluster boxes get several thousand statfs() calls per second, which begins to suck hardcore with all of the lock contention on the chunk mutex and dev list mutex. We don't really need to hold these things, if we have transient weirdness with statfs() because of the chunk allocator we don't care, so remove this locking. We still need the dev_list lock if you mount with -o alloc_start however, which is a good argument for nuking that thing from orbit, but that's a patch for another day. Thanks, Signed-off-by: Josef Bacik --- V1->V2: make sure ->alloc_start is set before doing the dev extent lookup logic. fs/btrfs/super.c | 72 ++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 47 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 54bd91e..dc337d1 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1644,8 +1644,20 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes) int i = 0, nr_devices; int ret; + /* + * We aren't under the device list lock, so this is racey-ish, but good + * enough for our purposes. + */ nr_devices = fs_info->fs_devices->open_devices; - BUG_ON(!nr_devices); + if (!nr_devices) { + smp_mb(); + nr_devices = fs_info->fs_devices->open_devices; + ASSERT(nr_devices); + if (!nr_devices) { + *free_bytes = 0; + return 0; + } + } devices_info = kmalloc_array(nr_devices, sizeof(*devices_info), GFP_NOFS); @@ -1670,11 +1682,17 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes) else min_stripe_size = BTRFS_STRIPE_LEN; - list_for_each_entry(device, &fs_devices->devices, dev_list) { + if (fs_info->alloc_start) + mutex_lock(&fs_devices->device_list_mutex); + rcu_read_lock(); + list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) { if (!device->in_fs_metadata || !device->bdev || device->is_tgtdev_for_dev_replace) continue; + if (i >= nr_devices) + break; + avail_space = device->total_bytes - device->bytes_used; /* align with stripe_len */ @@ -1689,24 +1707,32 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes) skip_space = 1024 * 1024; /* user can set the offset in fs_info->alloc_start. */ - if (fs_info->alloc_start + BTRFS_STRIPE_LEN <= - device->total_bytes) + if (fs_info->alloc_start && + fs_info->alloc_start + BTRFS_STRIPE_LEN <= + device->total_bytes) { + rcu_read_unlock(); skip_space = max(fs_info->alloc_start, skip_space); - /* - * btrfs can not use the free space in [0, skip_space - 1], - * we must subtract it from the total. In order to implement - * it, we account the used space in this range first. - */ - ret = btrfs_account_dev_extents_size(device, 0, skip_space - 1, - &used_space); - if (ret) { - kfree(devices_info); - return ret; - } + /* + * btrfs can not use the free space in + * [0, skip_space - 1], we must subtract it from the + * total. In order to implement it, we account the used + * space in this range first. + */ + ret = btrfs_account_dev_extents_size(device, 0, + skip_space - 1, + &used_space); + if (ret) { + kfree(devices_info); + mutex_unlock(&fs_devices->device_list_mutex); + return ret; + } - /* calc the free space in [0, skip_space - 1] */ - skip_space -= used_space; + rcu_read_lock(); + + /* calc the free space in [0, skip_space - 1] */ + skip_space -= used_space; + } /* * we can use the free space in [0, skip_space - 1], subtract @@ -1725,6 +1751,9 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes) i++; } + rcu_read_unlock(); + if (fs_info->alloc_start) + mutex_unlock(&fs_devices->device_list_mutex); nr_devices = i; @@ -1787,8 +1816,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) * holding chunk_muext to avoid allocating new chunks, holding * device_list_mutex to avoid the device being removed */ - mutex_lock(&fs_info->fs_devices->device_list_mutex); - mutex_lock(&fs_info->chunk_mutex); rcu_read_lock(); list_for_each_entry_rcu(found, head, list) { if (found->flags & BTRFS_BLOCK_GROUP_DATA) { @@ -1826,15 +1853,10 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_bavail = total_free_data; ret = btrfs_calc_avail_data_space(fs_info->tree_root, &total_free_data); - if (ret) { - mutex_unlock(&fs_info->chunk_mutex); - mutex_unlock(&fs_info->fs_devices->device_list_mutex); + if (ret) return ret; - } buf->f_bavail += div_u64(total_free_data, factor); buf->f_bavail = buf->f_bavail >> bits; - mutex_unlock(&fs_info->chunk_mutex); - mutex_unlock(&fs_info->fs_devices->device_list_mutex); buf->f_type = BTRFS_SUPER_MAGIC; buf->f_bsize = dentry->d_sb->s_blocksize;