From patchwork Wed Apr 3 19:38:47 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13616612
uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 3 Apr 2024 15:37:08 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 1/6] btrfs: report reclaim stats in sysfs Date: Wed, 3 Apr 2024 12:38:47 -0700 Message-ID: X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 When evaluating various reclaim strategies/thresholds against each other, it is useful to collect data about the amount of reclaim happening. Expose a count and byte count via sysfs per space_info. 

Signed-off-by: Boris Burkov
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/block-group.c | 10 ++++++++++
 fs/btrfs/space-info.h  | 12 ++++++++++++
 fs/btrfs/sysfs.c       |  4 ++++
 3 files changed, 26 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1e09aeea69c2..fd10e3b3f4f2 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1821,6 +1821,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	list_sort(NULL, &fs_info->reclaim_bgs, reclaim_bgs_cmp);
 	while (!list_empty(&fs_info->reclaim_bgs)) {
 		u64 zone_unusable;
+		u64 reclaimed;
 		int ret = 0;
 
 		bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1913,11 +1914,20 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 				div64_u64(bg->used * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
+		reclaimed = bg->used;
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
 		if (ret) {
 			btrfs_dec_block_group_ro(bg);
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
+			spin_lock(&space_info->lock);
+			space_info->reclaim_count++;
+			spin_unlock(&space_info->lock);
+		} else {
+			spin_lock(&space_info->lock);
+			space_info->reclaim_count++;
+			space_info->reclaim_bytes += reclaimed;
+			spin_unlock(&space_info->lock);
 		}
 
 next:
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index a733458fd13b..b42db020eba6 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -165,6 +165,18 @@ struct btrfs_space_info {
 
 	struct kobject kobj;
 	struct kobject *block_group_kobjs[BTRFS_NR_RAID_TYPES];
+
+	/*
+	 * Monotonically increasing counter of block group reclaim attempts
+	 * Exposed in /sys/fs//allocation//reclaim_count
+	 */
+	u64 reclaim_count;
+
+	/*
+	 * Monotonically increasing counter of reclaimed bytes
+	 * Exposed in /sys/fs//allocation//reclaim_bytes
+	 */
+	u64 reclaim_bytes;
 };
 
 struct reserve_ticket {
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c6387a8ddb94..0f3675c0f64f 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -894,6 +894,8 @@ SPACE_INFO_ATTR(bytes_readonly);
 SPACE_INFO_ATTR(bytes_zone_unusable);
 SPACE_INFO_ATTR(disk_used);
 SPACE_INFO_ATTR(disk_total);
+SPACE_INFO_ATTR(reclaim_count);
+SPACE_INFO_ATTR(reclaim_bytes);
 BTRFS_ATTR_RW(space_info, chunk_size, btrfs_chunk_size_show, btrfs_chunk_size_store);
 BTRFS_ATTR(space_info, size_classes, btrfs_size_classes_show);
 
@@ -949,6 +951,8 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
 	BTRFS_ATTR_PTR(space_info, chunk_size),
 	BTRFS_ATTR_PTR(space_info, size_classes),
+	BTRFS_ATTR_PTR(space_info, reclaim_count),
+	BTRFS_ATTR_PTR(space_info, reclaim_bytes),
 #ifdef CONFIG_BTRFS_DEBUG
 	BTRFS_ATTR_PTR(space_info, force_chunk_alloc),
 #endif

From patchwork Wed Apr 3 19:38:48 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13616613
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/6] btrfs: store fs_info on space_info
Date: Wed, 3 Apr 2024 12:38:48 -0700
Message-ID: <6f56853ca8437b8dd7d343adff2982ad9b099543.1712168477.git.boris@bur.io>

This is handy when computing space_info dynamic reclaim thresholds where
we do not have access to a block group. We could add it to the various
functions as a parameter, but it seems reasonable for space_info to have
an fs_info pointer.
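The design choice here is a plain back-pointer that is set once, when the space_info is created. The toy model below illustrates the pattern. Its `toy_` structs and functions are hypothetical and heavily simplified stand-ins for the real btrfs types. It shows a helper that receives only a space_info and can still read filesystem-wide state, such as the unallocated byte count, without an extra parameter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins; the real btrfs structures have many more fields. */
struct toy_fs_info {
	uint64_t free_chunk_space;	/* unallocated bytes, fs-wide */
};

struct toy_space_info {
	struct toy_fs_info *fs_info;	/* back-pointer, set at creation */
	uint64_t total_bytes;
};

/* Allocate a space_info and wire up the back-pointer once. */
static struct toy_space_info *toy_create_space_info(struct toy_fs_info *info)
{
	struct toy_space_info *s;

	assert(info != NULL);
	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->fs_info = info;
	return s;
}

/* Needs fs-wide state, yet only takes a space_info: no extra parameter. */
static uint64_t toy_unallocated(const struct toy_space_info *s)
{
	return s->fs_info->free_chunk_space;
}
```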

Signed-off-by: Boris Burkov
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/space-info.c | 1 +
 fs/btrfs/space-info.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d620323d08ea..d20a27f293e9 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -232,6 +232,7 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 	if (!space_info)
 		return -ENOMEM;
 
+	space_info->fs_info = info;
 	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
 		INIT_LIST_HEAD(&space_info->block_groups[i]);
 	init_rwsem(&space_info->groups_sem);
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index b42db020eba6..2eb7f74070b2 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -94,6 +94,7 @@ enum btrfs_flush_state {
 };
 
 struct btrfs_space_info {
+	struct btrfs_fs_info *fs_info;
 	spinlock_t lock;
 
 	u64 total_bytes;	/* total bytes in the space,

From patchwork Wed Apr 3 19:38:49 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13616614
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 3/6] btrfs: dynamic block_group reclaim threshold
Date: Wed, 3 Apr 2024 12:38:49 -0700
Message-ID: <198348dc7188360fe33083ce06421b1e7f7874d9.1712168477.git.boris@bur.io>

We can currently recover allocated block_groups by:
- explicitly starting balance operations
- "auto reclaim" via bg_reclaim_threshold

The latter works by checking against a fixed threshold on frees. If we
pass from above the threshold to below, relocation triggers and the
block group will get reclaimed by the cleaner thread (assuming it is
still eligible).

Picking a threshold is challenging. Too high, and you end up trying to
reclaim very full block_groups, which is quite costly, and you never
reclaim block_groups that don't get quite THAT full but could still be
quite fragmented and stranding a lot of space. Too low, and you
similarly miss out on reclaim even when you badly need it to avoid
running out of unallocated space, if you have heavily fragmented block
groups living above the threshold. No matter the threshold, it suffers
from workloads that happen to bounce around that threshold, which can
introduce arbitrary amounts of reclaim waste.

To improve this situation, introduce a dynamic threshold. The basic idea
behind this threshold is that it should be very lax when there is plenty
of unallocated space, and increasingly aggressive as we approach zero
unallocated space. To that end, it sets a target for unallocated space
(10 chunks) and then increases the threshold linearly as the shortfall
of unallocated space below that target grows. The formula is:

  (target - unalloc) / target

I tested this by running it on three interesting workloads:
1. bounce allocations around X% full.
2. fill up all the way and introduce full fragmentation.
3. write in a fragmented way until the filesystem is just about full.

1. and 2. attack the weaknesses of a fixed threshold; fixed either works
perfectly or fully falls apart, depending on the threshold. Dynamic
always handles these cases well.

3. attacks dynamic by checking whether it is too zealous to reclaim in
conditions with low unallocated and low unused. It tends to claw back
1GiB of unallocated fairly aggressively, but not much more. Early
versions of dynamic threshold struggled on this test.
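The formula is easier to follow with concrete numbers. Below is a simplified userspace model of calc_dynamic_reclaim_threshold(), written as a sketch under stated assumptions. Sizes are plain byte counts supplied by the caller. The 10-chunk target mirrors BTRFS_UNALLOC_BLOCK_GROUP_TARGET. The overflow fallback of calc_pct_ratio() is left out because `want * 100` cannot overflow for realistic sizes. `dynamic_reclaim_threshold()` is a hypothetical name. With a 1GiB chunk size the target is 10GiB, so 7GiB of unallocated space gives a threshold of 30% and 0 bytes gives 100%.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors BTRFS_UNALLOC_BLOCK_GROUP_TARGET: keep 10 chunks unallocated. */
#define UNALLOC_TARGET_CHUNKS 10ULL

/*
 * Dynamic reclaim threshold in percent:
 *   100 * (target - unalloc) / target, where target = 10 * chunk_size,
 * clamped to 0 once unalloc reaches the target, and forced to 0 when the
 * space_info has less than one chunk of unused space to relocate into.
 */
static int dynamic_reclaim_threshold(uint64_t unalloc, uint64_t unused,
				     uint64_t chunk_size)
{
	uint64_t target;
	uint64_t want;

	assert(chunk_size > 0);
	target = UNALLOC_TARGET_CHUNKS * chunk_size;
	want = target > unalloc ? target - unalloc : 0;

	if (unused < chunk_size)
		return 0;
	/* want <= target, so the result is in [0, 100] and fits in int. */
	return (int)(want * 100 / target);
}
```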
Graphs of the data and results from those experiments can be found at:
https://bur.io/dyn-rec/

I did not put in extra effort to "fully" exploit 1. and 2. to make fixed
threshold look REALLY bad, vs merely worse, but it would not be hard to
do.

Additional work could be done to intelligently ratchet up the urgency of
reclaim in very low unallocated conditions or to make the allocator try
to break up allocations rather than use critical last block groups. Some
of this space is currently being explored by Johannes for zoned
filesystems. That is mostly independent of the choice of reclaim
strategy.

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c |  18 ++++---
 fs/btrfs/space-info.c  | 115 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/space-info.h  |   8 +++
 fs/btrfs/sysfs.c       |  43 ++++++++++++++-
 4 files changed, 164 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index fd10e3b3f4f2..d6f8364ac598 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1757,24 +1757,21 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
 
 static bool should_reclaim_block_group(struct btrfs_block_group *bg, u64 bytes_freed)
 {
-	const struct btrfs_space_info *space_info = bg->space_info;
-	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
+	const int thresh_pct = btrfs_calc_reclaim_threshold(bg->space_info);
+	u64 thresh_bytes = mult_perc(bg->length, thresh_pct);
 	const u64 new_val = bg->used;
 	const u64 old_val = new_val + bytes_freed;
-	u64 thresh;
 
-	if (reclaim_thresh == 0)
+	if (thresh_bytes == 0)
 		return false;
 
-	thresh = mult_perc(bg->length, reclaim_thresh);
-
 	/*
 	 * If we were below the threshold before don't reclaim, we are likely a
 	 * brand new block group and we don't want to relocate new block groups.
 	 */
-	if (old_val < thresh)
+	if (old_val < thresh_bytes)
 		return false;
-	if (new_val >= thresh)
+	if (new_val >= thresh_bytes)
 		return false;
 	return true;
 }
@@ -1835,6 +1832,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 
 		/* Don't race with allocators so take the groups_sem */
 		down_write(&space_info->groups_sem);
+		spin_lock(&space_info->lock);
 		spin_lock(&bg->lock);
 		if (bg->reserved || bg->pinned || bg->ro) {
 			/*
@@ -1844,6 +1842,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			 * this block group.
 			 */
 			spin_unlock(&bg->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
@@ -1862,6 +1861,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			if (!btrfs_test_opt(fs_info, DISCARD_ASYNC))
 				btrfs_mark_bg_unused(bg);
 			spin_unlock(&bg->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 
@@ -1878,10 +1878,12 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		 */
 		if (!should_reclaim_block_group(bg, bg->length)) {
 			spin_unlock(&bg->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
 		spin_unlock(&bg->lock);
+		spin_unlock(&space_info->lock);
 
 		/*
 		 * Get out fast, in case we're read-only or unmounting the
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d20a27f293e9..90e472a49784 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include 
 #include "misc.h"
 #include "ctree.h"
 #include "space-info.h"
@@ -190,6 +191,8 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
  */
 #define BTRFS_DEFAULT_ZONED_RECLAIM_THRESH			(75)
 
+#define BTRFS_UNALLOC_BLOCK_GROUP_TARGET			(10ULL)
+
 /*
  * Calculate chunk size depending on volume type (regular or zoned).
  */
@@ -341,11 +344,27 @@ struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
 	return NULL;
 }
 
+static u64 calc_effective_data_chunk_size(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_space_info *data_sinfo;
+	u64 data_chunk_size;
+	/*
+	 * Calculate the data_chunk_size, space_info->chunk_size is the
+	 * "optimal" chunk size based on the fs size.  However when we actually
+	 * allocate the chunk we will strip this down further, making it no more
+	 * than 10% of the disk or 1G, whichever is smaller.
+	 */
+	data_sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA);
+	data_chunk_size = min(data_sinfo->chunk_size,
+			      mult_perc(fs_info->fs_devices->total_rw_bytes, 10));
+	return min_t(u64, data_chunk_size, SZ_1G);
+
+}
+
 static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
 			  struct btrfs_space_info *space_info,
 			  enum btrfs_reserve_flush_enum flush)
 {
-	struct btrfs_space_info *data_sinfo;
 	u64 profile;
 	u64 avail;
 	u64 data_chunk_size;
@@ -369,16 +388,7 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
 	if (avail == 0)
 		return 0;
 
-	/*
-	 * Calculate the data_chunk_size, space_info->chunk_size is the
-	 * "optimal" chunk size based on the fs size.  However when we actually
-	 * allocate the chunk we will strip this down further, making it no more
-	 * than 10% of the disk or 1G, whichever is smaller.
-	 */
-	data_sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA);
-	data_chunk_size = min(data_sinfo->chunk_size,
-			      mult_perc(fs_info->fs_devices->total_rw_bytes, 10));
-	data_chunk_size = min_t(u64, data_chunk_size, SZ_1G);
+	data_chunk_size = calc_effective_data_chunk_size(fs_info);
 
 	/*
 	 * Since data allocations immediately use block groups as part of the
@@ -1869,3 +1879,86 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
 
 	return free_bytes;
 }
+
+static u64 calc_pct_ratio(u64 x, u64 y)
+{
+	int err;
+
+	if (!y)
+		return 0;
+again:
+	err = check_mul_overflow(100, x, &x);
+	if (err)
+		goto lose_precision;
+	return div64_u64(x, y);
+lose_precision:
+	x >>= 10;
+	y >>= 10;
+	if (!y)
+		y = 1;
+	goto again;
+}
+
+/*
+ * A reasonable buffer for unallocated space is 10 data block_groups.
+ * If we claw this back repeatedly, we can still achieve efficient
+ * utilization when near full, and not do too much reclaim while
+ * always maintaining a solid buffer for workloads that quickly
+ * allocate and pressure the unallocated space.
+ */
+static u64 calc_unalloc_target(struct btrfs_fs_info *fs_info)
+{
+	return BTRFS_UNALLOC_BLOCK_GROUP_TARGET * calc_effective_data_chunk_size(fs_info);
+}
+
+/*
+ * The fundamental goal of automatic reclaim is to protect the filesystem's
+ * unallocated space and thus minimize the probability of the filesystem going
+ * read only when a metadata allocation failure causes a transaction abort.
+ *
+ * However, relocations happen into the space_info's unused space, therefore
+ * automatic reclaim must also back off as that space runs low. There is no
+ * value in doing trivial "relocations" of re-writing the same block group
+ * into a fresh one.
+ *
+ * Furthermore, we want to avoid doing too much reclaim even if there are good
+ * candidates. This is because the allocator is pretty good at filling up the
+ * holes with writes. So we want to do just enough reclaim to try and stay
+ * safe from running out of unallocated space but not be wasteful about it.
+ *
+ * Therefore, the dynamic reclaim threshold is calculated as follows:
+ * - calculate a target unallocated amount of 10 block group sized chunks
+ * - ratchet up the intensity of reclaim depending on how far we are from
+ *   that target by using a formula of (target - unalloc) / target to set the threshold.
+ *
+ * Typically with 10 block groups as the target, the discrete values this comes
+ * out to are 0, 10, 20, ... , 80, 90, and 99.
+ */
+static int calc_dynamic_reclaim_threshold(struct btrfs_space_info *space_info)
+{
+	struct btrfs_fs_info *fs_info = space_info->fs_info;
+	u64 unalloc = atomic64_read(&fs_info->free_chunk_space);
+	u64 target = calc_unalloc_target(fs_info);
+	u64 alloc = space_info->total_bytes;
+	u64 used = btrfs_space_info_used(space_info, false);
+	u64 unused = alloc - used;
+	u64 want = target > unalloc ? target - unalloc : 0;
+	u64 data_chunk_size = calc_effective_data_chunk_size(fs_info);
+	/* Cast to int is OK because want <= target */
+	int ratio = calc_pct_ratio(want, target);
+
+	/* If we have no unused space, don't bother, it won't work anyway */
+	if (unused < data_chunk_size)
+		return 0;
+
+	return ratio;
+}
+
+int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
+{
+	lockdep_assert_held(&space_info->lock);
+
+	if (READ_ONCE(space_info->dynamic_reclaim))
+		return calc_dynamic_reclaim_threshold(space_info);
+	return READ_ONCE(space_info->bg_reclaim_threshold);
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 2eb7f74070b2..6879c68a0e63 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -178,6 +178,12 @@ struct btrfs_space_info {
 	 * Exposed in /sys/fs//allocation//reclaim_bytes
 	 */
 	u64 reclaim_bytes;
+
+	/*
+	 * If true, use the dynamic relocation threshold, instead of the
+	 * fixed bg_reclaim_threshold.
+	 */
+	bool dynamic_reclaim;
 };
 
 struct reserve_ticket {
@@ -260,4 +266,6 @@ void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
+int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
+
 #endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 0f3675c0f64f..4bce08ea08ab 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -904,8 +904,12 @@ static ssize_t btrfs_sinfo_bg_reclaim_threshold_show(struct kobject *kobj,
 						     char *buf)
 {
 	struct btrfs_space_info *space_info = to_space_info(kobj);
+	ssize_t ret;
 
-	return sysfs_emit(buf, "%d\n", READ_ONCE(space_info->bg_reclaim_threshold));
+	spin_lock(&space_info->lock);
+	ret = sysfs_emit(buf, "%d\n", btrfs_calc_reclaim_threshold(space_info));
+	spin_unlock(&space_info->lock);
+	return ret;
 }
 
 static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
@@ -916,6 +920,9 @@ static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
 	int thresh;
 	int ret;
 
+	if (READ_ONCE(space_info->dynamic_reclaim))
+		return -EINVAL;
+
 	ret = kstrtoint(buf, 10, &thresh);
 	if (ret)
 		return ret;
@@ -932,6 +939,39 @@ BTRFS_ATTR_RW(space_info, bg_reclaim_threshold,
 	      btrfs_sinfo_bg_reclaim_threshold_show,
 	      btrfs_sinfo_bg_reclaim_threshold_store);
 
+static ssize_t btrfs_sinfo_dynamic_reclaim_show(struct kobject *kobj,
+						struct kobj_attribute *a,
+						char *buf)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+
+	return sysfs_emit(buf, "%d\n", READ_ONCE(space_info->dynamic_reclaim));
+}
+
+static ssize_t btrfs_sinfo_dynamic_reclaim_store(struct kobject *kobj,
+						 struct kobj_attribute *a,
+						 const char *buf, size_t len)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+	int dynamic_reclaim;
+	int ret;
+
+	ret = kstrtoint(buf, 10, &dynamic_reclaim);
+	if (ret)
+		return ret;
+
+	if (dynamic_reclaim < 0)
+		return -EINVAL;
+
+	WRITE_ONCE(space_info->dynamic_reclaim, dynamic_reclaim != 0);
+
+	return len;
+}
+
+BTRFS_ATTR_RW(space_info, dynamic_reclaim,
+	      btrfs_sinfo_dynamic_reclaim_show,
+	      btrfs_sinfo_dynamic_reclaim_store);
+
 /*
  * Allocation information about block group types.
  *
@@ -949,6 +989,7 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, disk_used),
 	BTRFS_ATTR_PTR(space_info, disk_total),
 	BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
+	BTRFS_ATTR_PTR(space_info, dynamic_reclaim),
 	BTRFS_ATTR_PTR(space_info, chunk_size),
 	BTRFS_ATTR_PTR(space_info, size_classes),
 	BTRFS_ATTR_PTR(space_info, reclaim_count),

From patchwork Wed Apr 3 19:38:50 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13616615
dmarc=none (p=none dis=none) header.from=bur.io; spf=pass smtp.mailfrom=bur.io; dkim=pass (2048-bit key) header.d=bur.io header.i=@bur.io header.b=cGZO9Zth; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=OboABfaQ; arc=none smtp.client-ip=64.147.123.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=bur.io Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bur.io Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bur.io header.i=@bur.io header.b="cGZO9Zth"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="OboABfaQ" Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 46EEB3200A57; Wed, 3 Apr 2024 15:37:18 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Wed, 03 Apr 2024 15:37:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm3; t=1712173037; x= 1712259437; bh=cgbkBsw6NnaVznfNtwSdyiCYE5lRXssIJ5uAkkzktmo=; b=c GZO9ZthxxyctMjFkp8/Q/RsyjS7Esy0goFI0TP/KUVVBA/mrsdzRy3gLY+7eK0S8 nnZ/3nrOqLecSLE5Ihn4sLLUQUIh017hHbzpftS1UID/8wEXjVGCOsZP8Pr2AAOT V3y31H5z2hH4/bBHoWaSMaHMYd5gQLz3T4z3iyMqYG9j3L82lqt9GuFIYEozK2nd h4K6wCljO9pcJ5u/D8kRpCPBk7v9XoDKAP4DMBpfXqs5iDGEukAkyd0s69zJtMhp fu+bKbYS8jOLZRSHC1CR+RHRYoB7huhxh3VohrMl8P4m7IA5zj9e576B5jKXtMAi Od6Is+O/NS9CSUmZHjDkQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm2; t=1712173037; 
x=1712259437; bh=cgbkBsw6NnaVz nfNtwSdyiCYE5lRXssIJ5uAkkzktmo=; b=OboABfaQ+TDVFLgPpYa8jrRoCz3lM LXe9k5r9dGZm/s2Di8swNqTZebyy8Bxw0CstMl/GqXCTdi54FHwJeMF+R7iZYdZ+ OjKyKEP7eKYA8WCeEO4Rt6gyJG9yvI5yxlzcdMGIXGZcKFDwEovxn0YeMzfvopqi cIzItIbHslM/b25Y1H2923hKxRswVp0IKJKQJ1HI1xvIwiP1SwByOE4wHZzfeHV+ 0sFLQhXUd+2plTmB4FgAwK8l4vsFR+eq19iUwfndSMJLeCdSaIYYXmFl+VV8G2HU j+omlT/Lmaby43CuyKmMZWljFofmmYFi0yslgyzMtcvg4h6zDDtG0Vidw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrudefiedgleeiucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedunecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 3 Apr 2024 15:37:17 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 4/6] btrfs: periodic block_group reclaim Date: Wed, 3 Apr 2024 12:38:50 -0700 Message-ID: X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 We currently employ a edge-triggered block group reclaim strategy which marks block groups for reclaim as they free down past a threshold. With a dynamic threshold, this is worse than doing it in a level-triggered fashion periodically. That is because the reclaim itself happens periodically, so the threshold at that point in time is what really matters, not the threshold at freeing time. 
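To make the edge- vs level-triggered distinction concrete, here is a minimal userspace sketch. The types and functions (toy_bg, toy_mult_perc(), toy_free_edge(), toy_sweep_level()) are hypothetical and heavily simplified, with no locking and a flat array instead of per-RAID lists; it illustrates the reasoning in the paragraph above rather than reproducing btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy block group; not struct btrfs_block_group. */
struct toy_bg {
	uint64_t length;
	uint64_t used;
	bool marked; /* stands in for "queued for reclaim" */
};

/* length * pct / 100, in the spirit of the kernel's mult_perc() helper. */
static uint64_t toy_mult_perc(uint64_t num, uint32_t pct)
{
	return num * pct / 100;
}

/*
 * Edge-triggered: the group is only examined when bytes are freed, and only
 * against the threshold in effect at that instant. It fires when this free
 * moves usage from at-or-above the threshold to below it.
 */
static void toy_free_edge(struct toy_bg *bg, uint64_t bytes, uint32_t thresh_pct)
{
	uint64_t thresh = toy_mult_perc(bg->length, thresh_pct);
	uint64_t old_used = bg->used;

	assert(bytes <= bg->used);
	bg->used -= bytes;
	if (old_used >= thresh && bg->used < thresh)
		bg->marked = true;
}

/*
 * Level-triggered: a periodic sweep checks every group against the threshold
 * in effect at sweep time, regardless of when its bytes were freed.
 * Returns how many groups this sweep newly marked.
 */
static size_t toy_sweep_level(struct toy_bg *bgs, size_t n, uint32_t thresh_pct)
{
	size_t newly_marked = 0;

	for (size_t i = 0; i < n; i++) {
		if (!bgs[i].marked &&
		    bgs[i].used < toy_mult_perc(bgs[i].length, thresh_pct)) {
			bgs[i].marked = true;
			newly_marked++;
		}
	}
	return newly_marked;
}
```

For example, a 100-byte group with 80 bytes used that frees 50 bytes while the threshold is 10% stays unmarked, since 30 is not below 10. If a dynamic threshold later rises to 50%, the edge-triggered path never looks at the group again unless more bytes are freed, whereas toy_sweep_level() marks it because 30 < 50.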
If we mark the reclaim in a big pass, then sort by usage and do reclaim, we also benefit from a negative feedback loop preventing unnecessary reclaims as we crunch through the "best" candidates.

Since this is quite a different model, it requires some additional support. The edge-triggered reclaim has a good heuristic for not reclaiming fresh block groups, so we need to replace that with a typical GC sweep mark which skips block groups that have seen an allocation since the last sweep.

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/space-info.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/space-info.h  |  7 ++++++
 fs/btrfs/sysfs.c       | 34 ++++++++++++++++++++++++++++
 5 files changed, 95 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index d6f8364ac598..2a42edc3476b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1960,6 +1960,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 
 void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
 {
+	btrfs_reclaim_sweep(fs_info);
 	spin_lock(&fs_info->unused_bgs_lock);
 	if (!list_empty(&fs_info->reclaim_bgs))
 		queue_work(system_unbound_wq, &fs_info->reclaim_bgs_work);
@@ -3658,6 +3659,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		old_val += num_bytes;
 		cache->used = old_val;
 		cache->reserved -= num_bytes;
+		cache->reclaim_mark = 0;
 		space_info->bytes_reserved -= num_bytes;
 		space_info->bytes_used += num_bytes;
 		space_info->disk_used += num_bytes * factor;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 85e2d4cd12dc..8656b38f1fa5 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -263,6 +263,7 @@ struct btrfs_block_group {
 	struct work_struct zone_finish_work;
 	struct extent_buffer *last_eb;
 	enum btrfs_block_group_size_class size_class;
+	u64 reclaim_mark;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 90e472a49784..422fb7d4b4e1 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1962,3 +1962,54 @@ int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
 		return calc_dynamic_reclaim_threshold(space_info);
 	return READ_ONCE(space_info->bg_reclaim_threshold);
 }
+
+static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
+			    struct btrfs_space_info *space_info, int raid)
+{
+	struct btrfs_block_group *bg;
+	int thresh_pct;
+
+	spin_lock(&space_info->lock);
+	thresh_pct = btrfs_calc_reclaim_threshold(space_info);
+	spin_unlock(&space_info->lock);
+
+	down_read(&space_info->groups_sem);
+	list_for_each_entry(bg, &space_info->block_groups[raid], list) {
+		u64 thresh;
+		bool reclaim = false;
+
+		btrfs_get_block_group(bg);
+		spin_lock(&bg->lock);
+		thresh = mult_perc(bg->length, thresh_pct);
+		if (bg->used < thresh && bg->reclaim_mark)
+			reclaim = true;
+		bg->reclaim_mark++;
+		spin_unlock(&bg->lock);
+		if (reclaim)
+			btrfs_mark_bg_to_reclaim(bg);
+		btrfs_put_block_group(bg);
+	}
+	up_read(&space_info->groups_sem);
+	return 0;
+}
+
+int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+	int raid;
+	struct btrfs_space_info *space_info;
+
+	list_for_each_entry(space_info, &fs_info->space_info, list) {
+		if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+			continue;
+		if (!READ_ONCE(space_info->periodic_reclaim))
+			continue;
+		for (raid = 0; raid < BTRFS_NR_RAID_TYPES; raid++) {
+			ret = do_reclaim_sweep(fs_info, space_info, raid);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return ret;
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 6879c68a0e63..6f1f530d9c3b 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -184,6 +184,12 @@ struct btrfs_space_info {
 	 * fixed bg_reclaim_threshold.
 	 */
 	bool dynamic_reclaim;
+
+	/*
+	 * Periodically check all block groups against the reclaim
+	 * threshold in the cleaner thread.
+	 */
+	bool periodic_reclaim;
 };
 
 struct reserve_ticket {
@@ -267,5 +273,6 @@ void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
 int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
+int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info);
 
 #endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4bce08ea08ab..7c2e31e40435 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -972,6 +972,39 @@ BTRFS_ATTR_RW(space_info, dynamic_reclaim,
 	      btrfs_sinfo_dynamic_reclaim_show,
 	      btrfs_sinfo_dynamic_reclaim_store);
 
+static ssize_t btrfs_sinfo_periodic_reclaim_show(struct kobject *kobj,
+						 struct kobj_attribute *a,
+						 char *buf)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+
+	return sysfs_emit(buf, "%d\n", READ_ONCE(space_info->periodic_reclaim));
+}
+
+static ssize_t btrfs_sinfo_periodic_reclaim_store(struct kobject *kobj,
+						  struct kobj_attribute *a,
+						  const char *buf, size_t len)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+	int periodic_reclaim;
+	int ret;
+
+	ret = kstrtoint(buf, 10, &periodic_reclaim);
+	if (ret)
+		return ret;
+
+	if (periodic_reclaim < 0)
+		return -EINVAL;
+
+	WRITE_ONCE(space_info->periodic_reclaim, periodic_reclaim != 0);
+
+	return len;
+}
+
+BTRFS_ATTR_RW(space_info, periodic_reclaim,
+	      btrfs_sinfo_periodic_reclaim_show,
+	      btrfs_sinfo_periodic_reclaim_store);
+
 /*
  * Allocation information about block group types.
  *
@@ -994,6 +1027,7 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, size_classes),
 	BTRFS_ATTR_PTR(space_info, reclaim_count),
 	BTRFS_ATTR_PTR(space_info, reclaim_bytes),
+	BTRFS_ATTR_PTR(space_info, periodic_reclaim),
 #ifdef CONFIG_BTRFS_DEBUG
 	BTRFS_ATTR_PTR(space_info, force_chunk_alloc),
 #endif

From patchwork Wed Apr 3 19:38:51 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13616616
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 5/6] btrfs: prevent pathological periodic reclaim loops
Date: Wed, 3 Apr 2024 12:38:51 -0700
X-Mailer: git-send-email 2.44.0

Periodic reclaim runs the risk of getting stuck in a state where it keeps reclaiming the same block group over and over. This can happen if:
1. reclaiming that block_group fails, or
2. reclaiming that block_group fails to move any extents into existing block_groups and just allocates a fresh chunk and moves everything.

Currently, case 1 is retried in a very tight loop inside the reclaim worker: a block group whose reclaim fails is immediately marked for reclaim again. That is critical for edge-triggered reclaim, or else we risk forgetting about a reclaimable group. With level-triggered reclaim, on the other hand, we can break out of that loop and pick the group up again on a later sweep.

With that fixed, case 2 applies to both failures and "successes" with no progress. If we have done a periodic reclaim on a space_info and nothing has changed in that space_info, there is not much point in trying again, so don't try again until enough space has been freed, which we capture with a heuristic of needing to net free one chunk.
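The "net free one chunk" gate can be modeled as a signed running balance plus a one-shot ready flag. The sketch below is a hypothetical userspace approximation (toy_space_info and the toy_* functions are invented names; the chunk size is assumed constant and there is no locking), loosely following the shape of btrfs_space_info_update_reclaimable(), btrfs_set_periodic_reclaim_ready() and btrfs_should_periodic_reclaim() in this patch. It casts chunk_size to a signed type before comparing, because when a signed 64-bit value is compared directly with an unsigned 64-bit one, C's usual arithmetic conversions turn a negative balance into a very large unsigned number; that is worth keeping in mind when reading the s64 vs u64 comparison in btrfs_space_info_update_reclaimable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-space_info fields this patch adds. */
struct toy_space_info {
	bool periodic_reclaim;       /* feature enabled (the sysfs knob) */
	bool periodic_reclaim_ready; /* armed: enough freed since last pass */
	int64_t reclaimable_bytes;   /* net bytes freed (+) or allocated (-) */
	uint64_t chunk_size;         /* assumed constant for the sketch */
};

static void toy_set_ready(struct toy_space_info *si, bool ready)
{
	if (!si->periodic_reclaim)
		return;
	if (ready != si->periodic_reclaim_ready) {
		si->periodic_reclaim_ready = ready;
		/* Disarming opens a fresh accounting window. */
		if (!ready)
			si->reclaimable_bytes = 0;
	}
}

/* Called with +bytes on free and -bytes on allocation. */
static void toy_update_reclaimable(struct toy_space_info *si, int64_t bytes)
{
	/* The signed cast below is only meaningful for sizes that fit. */
	assert(si->chunk_size > 0 && si->chunk_size <= (uint64_t)INT64_MAX);
	si->reclaimable_bytes += bytes;
	/*
	 * Compare as signed so that a negative balance (net allocation)
	 * never looks like a huge unsigned value.
	 */
	if (si->reclaimable_bytes >= (int64_t)si->chunk_size)
		toy_set_ready(si, true);
}

/* Test-and-clear: at most one sweep per armed window. */
static bool toy_should_periodic_reclaim(struct toy_space_info *si)
{
	bool ret;

	if (!si->periodic_reclaim)
		return false;
	ret = si->periodic_reclaim_ready;
	toy_set_ready(si, false);
	return ret;
}
```

With a 1024-byte chunk, freeing 512 bytes twice arms the gate, the next check returns true exactly once and resets the balance, and a later net allocation of 1024 bytes leaves the gate disarmed.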
Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c | 12 ++++++---
 fs/btrfs/space-info.c  | 56 ++++++++++++++++++++++++++++++++++++------
 fs/btrfs/space-info.h  | 14 +++++++++++
 3 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 2a42edc3476b..a84454670d6a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1924,6 +1924,8 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 				  bg->start);
 			spin_lock(&space_info->lock);
 			space_info->reclaim_count++;
+			if (READ_ONCE(space_info->periodic_reclaim))
+				space_info->periodic_reclaim_ready = false;
 			spin_unlock(&space_info->lock);
 		} else {
 			spin_lock(&space_info->lock);
@@ -1933,7 +1935,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 
 next:
-		if (ret)
+		if (ret && !READ_ONCE(space_info->periodic_reclaim))
 			btrfs_mark_bg_to_reclaim(bg);
 		btrfs_put_block_group(bg);
 
@@ -3663,6 +3665,8 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		space_info->bytes_reserved -= num_bytes;
 		space_info->bytes_used += num_bytes;
 		space_info->disk_used += num_bytes * factor;
+		if (READ_ONCE(space_info->periodic_reclaim))
+			btrfs_space_info_update_reclaimable(space_info, -num_bytes);
 		spin_unlock(&cache->lock);
 		spin_unlock(&space_info->lock);
 	} else {
@@ -3672,8 +3676,10 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		btrfs_space_info_update_bytes_pinned(info, space_info, num_bytes);
 		space_info->bytes_used -= num_bytes;
 		space_info->disk_used -= num_bytes * factor;
-
-		reclaim = should_reclaim_block_group(cache, num_bytes);
+		if (READ_ONCE(space_info->periodic_reclaim))
+			btrfs_space_info_update_reclaimable(space_info, num_bytes);
+		else
+			reclaim = should_reclaim_block_group(cache, num_bytes);
 		spin_unlock(&cache->lock);
 		spin_unlock(&space_info->lock);
 
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 422fb7d4b4e1..149623cbd2d4 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include "linux/spinlock.h"
 #include
 #include "misc.h"
 #include "ctree.h"
@@ -1908,7 +1909,9 @@ static u64 calc_pct_ratio(u64 x, u64 y)
  */
 static u64 calc_unalloc_target(struct btrfs_fs_info *fs_info)
 {
-	return BTRFS_UNALLOC_BLOCK_GROUP_TARGET * calc_effective_data_chunk_size(fs_info);
+	u64 chunk_sz = calc_effective_data_chunk_size(fs_info);
+
+	return BTRFS_UNALLOC_BLOCK_GROUP_TARGET * chunk_sz;
 }
 
 /*
@@ -1944,14 +1947,13 @@ static int calc_dynamic_reclaim_threshold(struct btrfs_space_info *space_info)
 	u64 unused = alloc - used;
 	u64 want = target > unalloc ? target - unalloc : 0;
 	u64 data_chunk_size = calc_effective_data_chunk_size(fs_info);
-	/* Cast to int is OK because want <= target */
-	int ratio = calc_pct_ratio(want, target);
 
-	/* If we have no unused space, don't bother, it won't work anyway */
+	/* If we have no unused space, don't bother, it won't work anyway. */
 	if (unused < data_chunk_size)
 		return 0;
 
-	return ratio;
+	/* Cast to int is OK because want <= target. */
+	return calc_pct_ratio(want, target);
 }
 
 int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
@@ -1993,6 +1995,46 @@ static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+void btrfs_space_info_update_reclaimable(struct btrfs_space_info *space_info, s64 bytes)
+{
+	u64 chunk_sz = calc_effective_data_chunk_size(space_info->fs_info);
+
+	assert_spin_locked(&space_info->lock);
+	space_info->reclaimable_bytes += bytes;
+
+	if (space_info->reclaimable_bytes >= chunk_sz)
+		btrfs_set_periodic_reclaim_ready(space_info, true);
+}
+
+void btrfs_set_periodic_reclaim_ready(struct btrfs_space_info *space_info, bool ready)
+{
+	assert_spin_locked(&space_info->lock);
+	if (!READ_ONCE(space_info->periodic_reclaim))
+		return;
+	if (ready != space_info->periodic_reclaim_ready) {
+		space_info->periodic_reclaim_ready = ready;
+		if (!ready)
+			space_info->reclaimable_bytes = 0;
+	}
+}
+
+bool btrfs_should_periodic_reclaim(struct btrfs_space_info *space_info)
+{
+	bool ret;
+
+	if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+		return false;
+	if (!READ_ONCE(space_info->periodic_reclaim))
+		return false;
+
+	spin_lock(&space_info->lock);
+	ret = space_info->periodic_reclaim_ready;
+	btrfs_set_periodic_reclaim_ready(space_info, false);
+	spin_unlock(&space_info->lock);
+
+	return ret;
+}
+
 int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info)
 {
 	int ret;
@@ -2000,9 +2042,7 @@ int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info)
 	struct btrfs_space_info *space_info;
 
 	list_for_each_entry(space_info, &fs_info->space_info, list) {
-		if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
-			continue;
-		if (!READ_ONCE(space_info->periodic_reclaim))
+		if (!btrfs_should_periodic_reclaim(space_info))
 			continue;
 		for (raid = 0; raid < BTRFS_NR_RAID_TYPES; raid++) {
 			ret = do_reclaim_sweep(fs_info, space_info, raid);
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 6f1f530d9c3b..8637be8de44f 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -190,6 +190,17 @@ struct btrfs_space_info {
 	 * threshold in the cleaner thread.
 	 */
 	bool periodic_reclaim;
+
+	/*
+	 * Periodic reclaim should be a no-op if a space_info hasn't
+	 * freed any space since the last time we tried.
+	 */
+	bool periodic_reclaim_ready;
+
+	/*
+	 * Net bytes freed or allocated since the last reclaim pass.
+	 */
+	s64 reclaimable_bytes;
 };
 
 struct reserve_ticket {
@@ -272,6 +283,9 @@ void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
+void btrfs_space_info_update_reclaimable(struct btrfs_space_info *space_info, s64 bytes);
+void btrfs_set_periodic_reclaim_ready(struct btrfs_space_info *space_info, bool ready);
+bool btrfs_should_periodic_reclaim(struct btrfs_space_info *space_info);
 int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
 int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info);
 

From patchwork Wed Apr 3 19:38:52 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13616617
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 6/6] btrfs: urgent periodic reclaim pass
Date: Wed, 3 Apr 2024 12:38:52 -0700
Message-ID: <7d32872b06daf6f9d8b79acde2e762bd5840e94b.1712168477.git.boris@bur.io>
X-Mailer: git-send-email 2.44.0

Periodic reclaim tries to avoid reclaiming block_groups that are seeing active use by means of a sweep mark that is cleared on allocation and set on a sweep.
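The sweep-mark lifecycle described above can be sketched in a few lines of userspace C. This is a hypothetical model (toy_bg, toy_alloc() and toy_sweep_one() are invented names, with no locking), mirroring the order of operations in do_reclaim_sweep() from patch 4/6: check the mark, then bump it unconditionally, while the allocation path in btrfs_update_block_group() resets it to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified block group: only what the sweep looks at. */
struct toy_bg {
	uint64_t length;
	uint64_t used;
	uint64_t reclaim_mark; /* sweeps survived since the last allocation */
};

/* Allocation path: any successful allocation makes the group fresh again. */
static void toy_alloc(struct toy_bg *bg, uint64_t bytes)
{
	assert(bg->used + bytes <= bg->length);
	bg->used += bytes;
	bg->reclaim_mark = 0;
}

/*
 * One sweep over one group: reclaim only if it is under the threshold AND
 * it has already survived a previous sweep with no allocation in between.
 * The mark is bumped unconditionally afterwards.
 */
static bool toy_sweep_one(struct toy_bg *bg, uint32_t thresh_pct)
{
	uint64_t thresh = bg->length * thresh_pct / 100;
	bool reclaim = bg->used < thresh && bg->reclaim_mark != 0;

	bg->reclaim_mark++;
	return reclaim;
}
```

A group that was just allocated from is skipped on the next sweep even if it is nearly empty; only if it goes a full sweep interval without an allocation does the following sweep pick it.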
In urgent conditions, where we have very little unallocated space (less than one chunk, sized by calc_effective_data_chunk_size() just as in the threshold calculation's unallocated target), we want to be able to override this mechanism.

Introduce a second pass that runs only if reclaim is urgent and the first pass failed to find a reclaim candidate. In that second pass, all block groups are eligible.

Signed-off-by: Boris Burkov
---
 fs/btrfs/space-info.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 149623cbd2d4..fa42668d3fc3 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1965,17 +1965,35 @@ int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
 	return READ_ONCE(space_info->bg_reclaim_threshold);
 }
 
+/*
+ * Under "urgent" reclaim, we will reclaim even fresh block groups that have
+ * recently seen successful allocations, as we are desperate to reclaim
+ * whatever we can to avoid ENOSPC in a transaction leading to a readonly fs.
+ */
+static bool is_reclaim_urgent(struct btrfs_space_info *space_info)
+{
+	struct btrfs_fs_info *fs_info = space_info->fs_info;
+	u64 unalloc = atomic64_read(&fs_info->free_chunk_space);
+	u64 data_chunk_size = calc_effective_data_chunk_size(fs_info);
+
+	return unalloc < data_chunk_size;
+}
+
 static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
 			    struct btrfs_space_info *space_info, int raid)
 {
 	struct btrfs_block_group *bg;
 	int thresh_pct;
+	bool try_again = true;
+	bool urgent;
 
 	spin_lock(&space_info->lock);
+	urgent = is_reclaim_urgent(space_info);
 	thresh_pct = btrfs_calc_reclaim_threshold(space_info);
 	spin_unlock(&space_info->lock);
 
 	down_read(&space_info->groups_sem);
+again:
 	list_for_each_entry(bg, &space_info->block_groups[raid], list) {
 		u64 thresh;
 		bool reclaim = false;
@@ -1983,14 +2001,29 @@ static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
 		btrfs_get_block_group(bg);
 		spin_lock(&bg->lock);
 		thresh = mult_perc(bg->length, thresh_pct);
-		if (bg->used < thresh && bg->reclaim_mark)
+		if (bg->used < thresh && bg->reclaim_mark) {
+			try_again = false;
 			reclaim = true;
+		}
 		bg->reclaim_mark++;
 		spin_unlock(&bg->lock);
 		if (reclaim)
 			btrfs_mark_bg_to_reclaim(bg);
 		btrfs_put_block_group(bg);
 	}
+
+	/*
+	 * In situations where we are very motivated to reclaim (low unalloc)
+	 * use two passes to make the reclaim mark check best effort.
+	 *
+	 * If we have any staler groups, we don't touch the fresher ones, but if we
+	 * really need a block group, do take a fresh one.
+	 */
+	if (try_again && urgent) {
+		try_again = false;
+		goto again;
+	}
+
 	up_read(&space_info->groups_sem);
 	return 0;
 }
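A minimal userspace sketch of the two-pass sweep above, assuming a flat array instead of the per-RAID block group list and no locking. The names (toy_bg, toy_sweep()) are hypothetical, and the queued flag models btrfs_mark_bg_to_reclaim() as idempotent, which is an assumption of the sketch. One detail it makes visible: because the first pass increments every group's reclaim_mark, the retried loop sees a nonzero mark on every group, which is how "all block groups are eligible" falls out of the unchanged check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block group for the sketch. */
struct toy_bg {
	uint64_t length;
	uint64_t used;
	uint64_t reclaim_mark;
	bool queued; /* stands in for "on the reclaim list" */
};

/*
 * First pass honours the freshness mark. If it found no candidate and
 * reclaim is urgent, loop once more: every mark is now nonzero, so any
 * group under the threshold qualifies. Returns the number of groups queued.
 */
static size_t toy_sweep(struct toy_bg *bgs, size_t n, uint32_t thresh_pct,
			bool urgent)
{
	bool try_again = true;
	size_t queued = 0;

again:
	for (size_t i = 0; i < n; i++) {
		struct toy_bg *bg = &bgs[i];
		uint64_t thresh = bg->length * thresh_pct / 100;

		if (bg->used < thresh && bg->reclaim_mark) {
			try_again = false;
			if (!bg->queued) {
				bg->queued = true;
				queued++;
			}
		}
		bg->reclaim_mark++;
	}

	/* At most one retry: try_again is cleared before jumping back. */
	if (try_again && urgent) {
		try_again = false;
		goto again;
	}
	return queued;
}
```

With two fresh groups at 10% and 90% usage and a 50% threshold, a non-urgent sweep queues nothing, while an urgent sweep queues the 10% group on the second pass. If a stale group under the threshold exists, the first pass finds it and fresher groups are left alone, matching the "staler groups first" comment in the patch.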