From patchwork Tue Feb 20 12:24:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13563982 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B44E6994F for ; Tue, 20 Feb 2024 12:24:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708431882; cv=none; b=LREL6RQ6U9Pk+CrqObjmtA22jTx2J1Tncib4rhwy99Gm0xrGAiOGJjtxzUMYQINXAm6PiDBpdOEuY85ZbltPoIQ1bEbXaMOHiKbbpjeuuDvJpaJKd+oXuhHjq0HM/Ft7GLc638VqwHaNa0KXIYcSbw7kBXH1YI270mKfxCJ5dWw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708431882; c=relaxed/simple; bh=ISK9fgpSrdFcxqKx9Eh5QAlkcR5uAvwWFDaZfXGYQto=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=bKQ4BM5kbAZMSZPXVkXMQVfcD/THrBDQ6XA9viDwu+2/u9zxySTe/c4lm/+Lw/9Jm318wU0nE0uoekvijQysWAaORpqvhE0GaV1+8A0yL/eOagAOD8E8Sac275bAkmo7S2TDXUKP8FU0VgEzrZ3by37rtFgU/WtXCQk89Mpf2Ck= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ip3MPX+2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ip3MPX+2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD940C433F1 for ; Tue, 20 Feb 2024 12:24:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1708431882; bh=ISK9fgpSrdFcxqKx9Eh5QAlkcR5uAvwWFDaZfXGYQto=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ip3MPX+2aKL0pQ+j6iPlOr8K9Xz8g+0itVsr1sQ9kT1XUpPBDZ1XQvnMNm7XDCp4s bDeuRYcv5OnxV2+Eu2X8LDtKl4f824fRZhHypJtfkFsXODDFH/wIfIdeqgYogeXQC7 xYNIQH6cA2lybQ0jJlyGbRURWSOkNUXmcx+XL9gGKMN/qLOGdo23iA4Uw4p9cMIhKb lTmLqCcJv3za0IXtt+5D9FD3PORbRx+A0Ov7ydx6xZMnzL35lGuAuYnwz8jmtQfYf/ Ew76e244O+1HN28tbLq5qoiuOlx1p497ysCJJFHqujL8ekS85V+Rl5+1p/BblGGtcq tocUr3UdoFNnQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/2] btrfs: fix data races when accessing the reserved amount of block reserves Date: Tue, 20 Feb 2024 12:24:33 +0000 Message-Id: <5ff1a68f4289d5bb870a499b248d329893d417ae.1708429856.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Filipe Manana At space_info.c we have several places where we access the ->reserved field of a block reserve without taking the block reserve's spinlock first, which makes KCSAN warn about a data race since that field is always updated while holding the spinlock. The reports from KCSAN are like the following: [ 117.193526] BUG: KCSAN: data-race in btrfs_block_rsv_release [btrfs] / need_preemptive_reclaim [btrfs] [ 117.195148] read to 0x000000017f587190 of 8 bytes by task 6303 on cpu 3: [ 117.195172] need_preemptive_reclaim+0x222/0x2f0 [btrfs] [ 117.195992] __reserve_bytes+0xbb0/0xdc8 [btrfs] [ 117.196807] btrfs_reserve_metadata_bytes+0x4c/0x120 [btrfs] [ 117.197620] btrfs_block_rsv_add+0x78/0xa8 [btrfs] [ 117.198434] btrfs_delayed_update_inode+0x154/0x368 [btrfs] [ 117.199300] btrfs_update_inode+0x108/0x1c8 [btrfs] [ 117.200122] btrfs_dirty_inode+0xb4/0x140 [btrfs] [ 117.200937] btrfs_update_time+0x8c/0xb0 [btrfs] [ 117.201754] touch_atime+0x16c/0x1e0 [ 117.201789] filemap_read+0x674/0x728 [ 117.201823] btrfs_file_read_iter+0xf8/0x410 [btrfs] [ 117.202653] vfs_read+0x2b6/0x498 [ 117.203454] ksys_read+0xa2/0x150 [ 117.203473] __s390x_sys_read+0x68/0x88 [ 117.203495] do_syscall+0x1c6/0x210 [ 117.203517] __do_syscall+0xc8/0xf0 [ 117.203539] system_call+0x70/0x98 [ 117.203579] write to 0x000000017f587190 of 8 bytes by task 11 on cpu 0: [ 117.203604] btrfs_block_rsv_release+0x2e8/0x578 [btrfs] [ 117.204432] btrfs_delayed_inode_release_metadata+0x7c/0x1d0 [btrfs] [ 117.205259] __btrfs_update_delayed_inode+0x37c/0x5e0 [btrfs] [ 117.206093] btrfs_async_run_delayed_root+0x356/0x498 [btrfs] [ 117.206917] btrfs_work_helper+0x160/0x7a0 [btrfs] [ 117.207738] process_one_work+0x3b6/0x838 [ 117.207768] worker_thread+0x75e/0xb10 [ 117.207797] kthread+0x21a/0x230 [ 117.207830] __ret_from_fork+0x6c/0xb8 [ 117.207861] ret_from_fork+0xa/0x30 So add a helper to get the reserved amount of a block reserve while holding the lock. The value may be not be up to date anymore when used by need_preemptive_reclaim() and btrfs_preempt_reclaim_metadata_space(), but that's ok since the worst it can do is cause more reclaim work do be done sooner rather than later. Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing. Signed-off-by: Filipe Manana --- fs/btrfs/block-rsv.h | 16 ++++++++++++++++ fs/btrfs/space-info.c | 26 +++++++++++++------------- 2 files changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h index b621199b0130..89c99f4e9f16 100644 --- a/fs/btrfs/block-rsv.h +++ b/fs/btrfs/block-rsv.h @@ -108,4 +108,20 @@ static inline bool btrfs_block_rsv_full(const struct btrfs_block_rsv *rsv) return data_race(rsv->full); } +/* + * Get the reserved mount of a block reserve in a context where getting a stale + * value is acceptable, instead of accessing it directly and trigger data race + * warning from KCSAN. + */ +static inline u64 btrfs_block_rsv_reserved(struct btrfs_block_rsv *rsv) +{ + u64 ret; + + spin_lock(&rsv->lock); + ret = rsv->reserved; + spin_unlock(&rsv->lock); + + return ret; +} + #endif /* BTRFS_BLOCK_RSV_H */ diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index a5b652c1650a..d620323d08ea 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -855,7 +855,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info, static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info) { - u64 global_rsv_size = fs_info->global_block_rsv.reserved; + const u64 global_rsv_size = btrfs_block_rsv_reserved(&fs_info->global_block_rsv); u64 ordered, delalloc; u64 thresh; u64 used; @@ -955,8 +955,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info, ordered = percpu_counter_read_positive(&fs_info->ordered_bytes) >> 1; delalloc = percpu_counter_read_positive(&fs_info->delalloc_bytes); if (ordered >= delalloc) - used += fs_info->delayed_refs_rsv.reserved + - fs_info->delayed_block_rsv.reserved; + used += btrfs_block_rsv_reserved(&fs_info->delayed_refs_rsv) + + btrfs_block_rsv_reserved(&fs_info->delayed_block_rsv); else used += space_info->bytes_may_use - global_rsv_size; @@ -1172,7 +1172,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work) enum btrfs_flush_state flush; u64 delalloc_size = 0; u64 to_reclaim, block_rsv_size; - u64 global_rsv_size = global_rsv->reserved; + const u64 global_rsv_size = btrfs_block_rsv_reserved(global_rsv); loops++; @@ -1184,9 +1184,9 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work) * assume it's tied up in delalloc reservations. */ block_rsv_size = global_rsv_size + - delayed_block_rsv->reserved + - delayed_refs_rsv->reserved + - trans_rsv->reserved; + btrfs_block_rsv_reserved(delayed_block_rsv) + + btrfs_block_rsv_reserved(delayed_refs_rsv) + + btrfs_block_rsv_reserved(trans_rsv); if (block_rsv_size < space_info->bytes_may_use) delalloc_size = space_info->bytes_may_use - block_rsv_size; @@ -1206,16 +1206,16 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work) to_reclaim = delalloc_size; flush = FLUSH_DELALLOC; } else if (space_info->bytes_pinned > - (delayed_block_rsv->reserved + - delayed_refs_rsv->reserved)) { + (btrfs_block_rsv_reserved(delayed_block_rsv) + + btrfs_block_rsv_reserved(delayed_refs_rsv))) { to_reclaim = space_info->bytes_pinned; flush = COMMIT_TRANS; - } else if (delayed_block_rsv->reserved > - delayed_refs_rsv->reserved) { - to_reclaim = delayed_block_rsv->reserved; + } else if (btrfs_block_rsv_reserved(delayed_block_rsv) > + btrfs_block_rsv_reserved(delayed_refs_rsv)) { + to_reclaim = btrfs_block_rsv_reserved(delayed_block_rsv); flush = FLUSH_DELAYED_ITEMS_NR; } else { - to_reclaim = delayed_refs_rsv->reserved; + to_reclaim = btrfs_block_rsv_reserved(delayed_refs_rsv); flush = FLUSH_DELAYED_REFS_NR; } From patchwork Tue Feb 20 12:24:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13563983 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D70DE67C7D for ; Tue, 20 Feb 2024 12:24:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708431883; cv=none; b=RyTuj8heySdV0jjTIl6dShm/6JlnliE73yNkOZTEO0/P37hO3E2UhO2ACSXc1+ZeWcUMdc+YMpQqWAVk9rElO+XKbe+RsEpUd70s1mSVXcEQkIOS5SeznVy5XBocPKeKHlV1qgpqWIT01L7IUV85ZLVblcFP24SUrqlq9Hw9e4A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708431883; c=relaxed/simple; bh=Db2zZfagtc5SUyxU9Vw51XrWEwAxgPwVxq9qaOlY20U=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=kWsl2v0KmRkFaX9Um572N9fzMbon616wv63n1G/XQDK9Z/nJhlBdW0DiCVE1kyFvHeEDgepiPAqIyFH8DUh869BunFTJ3KR6opHZXz0PeqtzlLm9uractlo7qpy/tlDvkwzedwlLkld3GyeUouhJiGqOzA2+GKN98IZRJXsBZEw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UEs3KDkF; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UEs3KDkF" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CC85BC433C7 for ; Tue, 20 Feb 2024 12:24:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1708431883; bh=Db2zZfagtc5SUyxU9Vw51XrWEwAxgPwVxq9qaOlY20U=; h=From:To:Subject:Date:In-Reply-To:References:From; b=UEs3KDkFlro8o5WjGY6/P/55xcLgF3/+tuyF+8Y6k78JRGyMzTqmTi93nHkL9QFsB xmGRm/UXPoqD6IFn3KQRsCfeZe/h83K5iVXqU4yVK4GTDwbOUHGyIeLO8yRe6SKgM8 CLXxNUUi1nD+cbZ2MM1jR3qhhhy/02rRgzUP3fRziQVPJ0oMQJnoSHAdtsMX64BIqE GAztGwiUc0K2wBVY9H+Z3FQce0zxUtWmij11dNWO0IOme6jTPGbRnUioWvXVjK3dAH ku3JTBApW/nC1G+rSRvruitkvJqdJ4Trg+nLHneJ/QfcWe+w6OF7BbIMV/gWzkaRcg 8+XZVllSzMsWA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/2] btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve Date: Tue, 20 Feb 2024 12:24:34 +0000 Message-Id: <2fb327ae5cb4c9d401a477de7c38918dd00aa538.1708429856.git.fdmanana@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Filipe Manana At btrfs_use_block_rsv() we read the size of a block reserve without locking its spinlock, which makes KCSAN complain because the size of a block reserve is always updated while holding its spinlock. The report from KCSAN is the following: [ 653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs] [ 653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0: [ 653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs] [ 653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs] [ 653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs] [ 653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs] [ 653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs] [ 653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs] [ 653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs] [ 653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs] [ 653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs] [ 653.322140] flush_space+0x5e4/0x718 [btrfs] [ 653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs] [ 653.323781] process_one_work+0x3b6/0x838 [ 653.323800] worker_thread+0x75e/0xb10 [ 653.323817] kthread+0x21a/0x230 [ 653.323836] __ret_from_fork+0x6c/0xb8 [ 653.323855] ret_from_fork+0xa/0x30 [ 653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3: [ 653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs] [ 653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs] [ 653.325494] btrfs_free_extent+0x76/0x120 [btrfs] [ 653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs] [ 653.327064] btrfs_dec_ref+0x50/0x70 [btrfs] [ 653.327849] walk_up_proc+0x236/0xa50 [btrfs] [ 653.328633] walk_up_tree+0x21c/0x448 [btrfs] [ 653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs] [ 653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs] [ 653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs] [ 653.331781] kthread+0x21a/0x230 [ 653.331800] __ret_from_fork+0x6c/0xb8 [ 653.331818] ret_from_fork+0xa/0x30 So add a helper to get the size of a block reserve while holding the lock. Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing. Signed-off-by: Filipe Manana --- fs/btrfs/block-rsv.c | 2 +- fs/btrfs/block-rsv.h | 16 ++++++++++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c index 27207dad27c2..95c174f9fd4f 100644 --- a/fs/btrfs/block-rsv.c +++ b/fs/btrfs/block-rsv.c @@ -493,7 +493,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans, block_rsv = get_block_rsv(trans, root); - if (unlikely(block_rsv->size == 0)) + if (unlikely(btrfs_block_rsv_size(block_rsv) == 0)) goto try_reserve; again: ret = btrfs_block_rsv_use_bytes(block_rsv, blocksize); diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h index 89c99f4e9f16..1f53b967d069 100644 --- a/fs/btrfs/block-rsv.h +++ b/fs/btrfs/block-rsv.h @@ -124,4 +124,20 @@ static inline u64 btrfs_block_rsv_reserved(struct btrfs_block_rsv *rsv) return ret; } +/* + * Get the size of a block reserve in a context where getting a stale value is + * acceptable, instead of accessing it directly and trigger data race warning + * from KCSAN. + */ +static inline u64 btrfs_block_rsv_size(struct btrfs_block_rsv *rsv) +{ + u64 ret; + + spin_lock(&rsv->lock); + ret = rsv->size; + spin_unlock(&rsv->lock); + + return ret; +} + #endif /* BTRFS_BLOCK_RSV_H */