From patchwork Thu Mar 27 22:31:02 2025
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 14031487
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 1/4] btrfs: subpage: fix a bug that blocks large folios
Date: Fri, 28 Mar 2025 09:01:02 +1030
Message-ID: <428b3c09f6df2820865640e2cf91a7cc0c1b4119.1743113694.git.wqu@suse.com>

Inside the macro subpage_calc_start_bit(), we need to calculate the
offset of @start relative to the beginning of the folio.

But we're using offset_in_page(). On systems with a 4K page size and a
4K fs block size, a block-aligned @start then always gives offset 0,
even inside a large folio. Every block therefore maps to the first bit
of the target bitmap, causing all kinds of errors.

Fix it by using offset_in_folio() instead.
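A small userspace sketch can check the effect of the fix. It assumes 4K pages, a 4K block size (sectorsize_bits = 12) and a naturally aligned 16K large folio. The functions sketch_offset_in_page(), sketch_offset_in_folio(), sketch_start_bit(), start_bit_old() and start_bit_fixed() are hypothetical stand-ins. They only mimic the masking done by offset_in_page(), offset_in_folio() and subpage_calc_start_bit(), and are not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed geometry for this sketch: 4K pages, 4K fs blocks
 * (sectorsize_bits == 12) and one naturally aligned 16K large folio.
 */
#define SKETCH_PAGE_SIZE        4096UL
#define SKETCH_SECTORSIZE_BITS  12
#define SKETCH_FOLIO_SIZE       16384UL

/* Hypothetical stand-in for offset_in_page(): offset inside one 4K page. */
static unsigned long sketch_offset_in_page(uint64_t pos)
{
	return (unsigned long)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* Hypothetical stand-in for offset_in_folio(): offset inside the whole folio. */
static unsigned long sketch_offset_in_folio(uint64_t pos, unsigned long folio_size)
{
	/* The mask only works for power-of-two, naturally aligned folios. */
	assert((folio_size & (folio_size - 1)) == 0);
	return (unsigned long)(pos & (folio_size - 1));
}

/* Same shape as subpage_calc_start_bit(): block index plus bitmap base. */
static unsigned int sketch_start_bit(unsigned long offset, unsigned int bitmap_nr)
{
	const unsigned int blocks_per_folio =
		(unsigned int)(SKETCH_FOLIO_SIZE >> SKETCH_SECTORSIZE_BITS);

	return (unsigned int)(offset >> SKETCH_SECTORSIZE_BITS) +
	       blocks_per_folio * bitmap_nr;
}

/* Old behaviour: every block-aligned @start yields offset 0. */
static unsigned int start_bit_old(uint64_t start, unsigned int bitmap_nr)
{
	return sketch_start_bit(sketch_offset_in_page(start), bitmap_nr);
}

/* Fixed behaviour: the offset is taken relative to the folio start. */
static unsigned int start_bit_fixed(uint64_t start, unsigned int bitmap_nr)
{
	return sketch_start_bit(sketch_offset_in_folio(start, SKETCH_FOLIO_SIZE),
				bitmap_nr);
}
```

Under these assumptions, the fixed variant gives start bits 0, 1, 2 and 3 to the four blocks of the folio for bitmap index 0. The old variant returns 0 for all of them. That collision is what the patch removes.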
Signed-off-by: Qu Wenruo
Reviewed-by: Filipe Manana
---
 fs/btrfs/subpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 5b69c447fec9..5fbdd977121e 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -202,7 +202,7 @@ static void btrfs_subpage_assert(const struct btrfs_fs_info *fs_info,
 			btrfs_blocks_per_folio(fs_info, folio);			\
 									\
 	btrfs_subpage_assert(fs_info, folio, start, len);		\
-	__start_bit = offset_in_page(start) >> fs_info->sectorsize_bits; \
+	__start_bit = offset_in_folio(folio, start) >> fs_info->sectorsize_bits; \
 	__start_bit += blocks_per_folio * btrfs_bitmap_nr_##name;	\
 	__start_bit;							\
 })

From patchwork Thu Mar 27 22:31:03 2025
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 14031486
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/4] btrfs: refactor how we handle reserved space inside copy_one_range()
Date: Fri, 28 Mar 2025 09:01:03 +1030
Message-ID: <4baa663dcabe3a542e035ec725586118a78b0971.1743113694.git.wqu@suse.com>

There are several things that are not ideal inside copy_one_range():

- Unnecessary temporary variables
  * block_offset
  * reserve_bytes
  * dirty_blocks
  * num_blocks
  * release_bytes
  These are used to handle short-copy cases.

- Inconsistent handling of btrfs_delalloc_release_extents()
  There is a hidden rule: after reserving metadata for X bytes of a data
  write, we must call btrfs_delalloc_release_extents() with X exactly
  once.

  Calling btrfs_delalloc_release_extents(X - 4K) and then
  btrfs_delalloc_release_extents(4K) breaks the outstanding extents
  accounting. Each call subtracts count_max_extents() of its own
  length. For an X larger than 4K but no larger than the maximum extent
  size, the two calls drop two outstanding extents, while the
  reservation only added one.

  This is because the outstanding extents mechanism is not designed to
  handle shrinking of reserved space.

Improve the above situations with the following changes:

- Use a single @reserved_start and @reserved_len pair
  We now reserve space for the initial range. If a short copy happens
  and we need to shrink the reserved space, we can easily calculate the
  new length and update @reserved_len.

- Introduce helpers to shrink reserved data and metadata space
  This is done by two new helpers, shrink_reserved_space() and
  btrfs_delalloc_shrink_extents(). The latter calculates more precisely
  whether the outstanding extents count needs to change. The former is
  used inside copy_one_range().
- Manually unlock, release reserved space and return if no byte is copied

Signed-off-by: Qu Wenruo
Reviewed-by: Filipe Manana
---
 fs/btrfs/delalloc-space.c |  25 +++++++++
 fs/btrfs/delalloc-space.h |   3 +-
 fs/btrfs/file.c           | 104 ++++++++++++++++++++++----------------
 3 files changed, 88 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 88e900e5a43d..916b62221dde 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -439,6 +439,31 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
 	btrfs_inode_rsv_release(inode, true);
 }
 
+/* Shrink a previously reserved extent to a new length. */
+void btrfs_delalloc_shrink_extents(struct btrfs_inode *inode, u64 reserved_len,
+				   u64 new_len)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	const u32 reserved_num_extents = count_max_extents(fs_info, reserved_len);
+	const u32 new_num_extents = count_max_extents(fs_info, new_len);
+	u32 diff_num_extents;
+
+	ASSERT(new_len <= reserved_len);
+	if (new_num_extents == reserved_num_extents)
+		return;
+
+	spin_lock(&inode->lock);
+	diff_num_extents = reserved_num_extents - new_num_extents;
+	btrfs_mod_outstanding_extents(inode, -diff_num_extents);
+	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
+	spin_unlock(&inode->lock);
+
+	if (btrfs_is_testing(fs_info))
+		return;
+
+	btrfs_inode_rsv_release(inode, true);
+}
+
 /*
  * Reserve data and metadata space for delalloc
  *
diff --git a/fs/btrfs/delalloc-space.h b/fs/btrfs/delalloc-space.h
index 3f32953c0a80..c61580c63caf 100644
--- a/fs/btrfs/delalloc-space.h
+++ b/fs/btrfs/delalloc-space.h
@@ -27,5 +27,6 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
 				    u64 disk_num_bytes, bool noflush);
 void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
-
+void btrfs_delalloc_shrink_extents(struct btrfs_inode *inode, u64 reserved_len,
+				   u64 new_len);
 #endif /* BTRFS_DELALLOC_SPACE_H */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b72fc00bc2f6..63c7a3294eb2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1151,6 +1151,24 @@ static ssize_t reserve_space(struct btrfs_inode *inode,
 	return reserve_bytes;
 }
 
+/* Shrink the reserved data and metadata space from @reserved_len to @new_len. */
+static void shrink_reserved_space(struct btrfs_inode *inode,
+				  struct extent_changeset *data_reserved,
+				  u64 reserved_start, u64 reserved_len,
+				  u64 new_len, bool only_release_metadata)
+{
+	u64 diff = reserved_len - new_len;
+
+	ASSERT(new_len <= reserved_len);
+	btrfs_delalloc_shrink_extents(inode, reserved_len, new_len);
+	if (only_release_metadata)
+		btrfs_delalloc_release_metadata(inode, diff, true);
+	else
+		btrfs_delalloc_release_space(inode, data_reserved,
+					     reserved_start + new_len,
+					     diff, true);
+}
+
 /*
  * Do the heavy-lifting work to copy one range into one folio of the page cache.
  *
@@ -1164,14 +1182,11 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct extent_state *cached_state = NULL;
-	const size_t block_offset = start & (fs_info->sectorsize - 1);
 	size_t write_bytes = min(iov_iter_count(i), PAGE_SIZE - offset_in_page(start));
-	size_t reserve_bytes;
 	size_t copied;
-	size_t dirty_blocks;
-	size_t num_blocks;
+	const u64 reserved_start = round_down(start, fs_info->sectorsize);
+	u64 reserved_len;
 	struct folio *folio = NULL;
-	u64 release_bytes;
 	int extents_locked;
 	u64 lockstart;
 	u64 lockend;
@@ -1190,23 +1205,25 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 			    &only_release_metadata);
 	if (ret < 0)
 		return ret;
-	reserve_bytes = ret;
-	release_bytes = reserve_bytes;
+	reserved_len = ret;
+	/* Write range must be inside the reserved range. */
+	ASSERT(reserved_start <= start);
+	ASSERT(start + write_bytes <= reserved_start + reserved_len);
 
 again:
 	ret = balance_dirty_pages_ratelimited_flags(inode->vfs_inode.i_mapping,
 						    bdp_flags);
 	if (ret) {
-		btrfs_delalloc_release_extents(inode, reserve_bytes);
-		release_space(inode, *data_reserved, start, release_bytes,
+		btrfs_delalloc_release_extents(inode, reserved_len);
+		release_space(inode, *data_reserved, reserved_start, reserved_len,
 			      only_release_metadata);
 		return ret;
 	}
 
 	ret = prepare_one_folio(&inode->vfs_inode, &folio, start, write_bytes, false);
 	if (ret) {
-		btrfs_delalloc_release_extents(inode, reserve_bytes);
-		release_space(inode, *data_reserved, start, release_bytes,
+		btrfs_delalloc_release_extents(inode, reserved_len);
+		release_space(inode, *data_reserved, reserved_start, reserved_len,
 			      only_release_metadata);
 		return ret;
 	}
@@ -1217,8 +1234,8 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 		if (!nowait && extents_locked == -EAGAIN)
 			goto again;
 
-		btrfs_delalloc_release_extents(inode, reserve_bytes);
-		release_space(inode, *data_reserved, start, release_bytes,
+		btrfs_delalloc_release_extents(inode, reserved_len);
+		release_space(inode, *data_reserved, reserved_start, reserved_len,
 			      only_release_metadata);
 		ret = extents_locked;
 		return ret;
@@ -1228,42 +1245,43 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 					     offset_in_folio(folio, start), write_bytes, i);
 	flush_dcache_folio(folio);
 
-	/*
-	 * If we get a partial write, we can end up with partially uptodate
-	 * page. Although if sector size < page size we can handle it, but if
-	 * it's not sector aligned it can cause a lot of complexity, so make
-	 * sure they don't happen by forcing retry this copy.
-	 */
 	if (unlikely(copied < write_bytes)) {
+		u64 last_block;
+
+		/*
+		 * The original write range doesn't need an uptodate folio as
+		 * the range is block aligned. But now a short copy happened.
+		 * We can not handle it without an uptodate folio.
+		 *
+		 * So just revert the range and we will retry.
+		 */
 		if (!folio_test_uptodate(folio)) {
 			iov_iter_revert(i, copied);
 			copied = 0;
 		}
-	}
-	num_blocks = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
-	dirty_blocks = round_up(copied + block_offset, fs_info->sectorsize);
-	dirty_blocks = BTRFS_BYTES_TO_BLKS(fs_info, dirty_blocks);
-
-	if (copied == 0)
-		dirty_blocks = 0;
-
-	if (num_blocks > dirty_blocks) {
-		/* Release everything except the sectors we dirtied. */
-		release_bytes -= dirty_blocks << fs_info->sectorsize_bits;
-		if (only_release_metadata) {
-			btrfs_delalloc_release_metadata(inode,
-						release_bytes, true);
-		} else {
-			const u64 release_start = round_up(start + copied,
-							   fs_info->sectorsize);
-
-			btrfs_delalloc_release_space(inode,
-					*data_reserved, release_start,
-					release_bytes, true);
+		/* No copied byte, unlock, release reserved space and exit. */
+		if (copied == 0) {
+			if (extents_locked)
+				unlock_extent(&inode->io_tree, lockstart, lockend,
+					      &cached_state);
+			else
+				free_extent_state(cached_state);
+			btrfs_delalloc_release_extents(inode, reserved_len);
+			release_space(inode, *data_reserved, reserved_start, reserved_len,
+				      only_release_metadata);
+			btrfs_drop_folio(fs_info, folio, start, copied);
+			return 0;
 		}
+
+		/* Release the reserved space beyond the last block. */
+		last_block = round_up(start + copied, fs_info->sectorsize);
+
+		shrink_reserved_space(inode, *data_reserved, reserved_start,
+				      reserved_len, last_block - reserved_start,
+				      only_release_metadata);
+		reserved_len = last_block - reserved_start;
 	}
-	release_bytes = round_up(copied + block_offset, fs_info->sectorsize);
 
 	ret = btrfs_dirty_folio(inode, folio, start, copied,
 				&cached_state, only_release_metadata);
@@ -1280,10 +1298,10 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 	else
 		free_extent_state(cached_state);
 
-	btrfs_delalloc_release_extents(inode, reserve_bytes);
+	btrfs_delalloc_release_extents(inode, reserved_len);
 	if (ret) {
 		btrfs_drop_folio(fs_info, folio, start, copied);
-		release_space(inode, *data_reserved, start, release_bytes,
+		release_space(inode, *data_reserved, reserved_start, reserved_len,
 			      only_release_metadata);
 		return ret;
 	}

From patchwork Thu Mar 27 22:31:04 2025
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 14031488
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 3/4] btrfs: prepare btrfs_buffered_write() for large data folios
Date: Fri, 28 Mar 2025 09:01:04 +1030
Message-ID: <285fe66e1d13bd9b1aa9b316da12cbaa8cb12c95.1743113694.git.wqu@suse.com>

This involves the following modifications:

- Set the order flags for __filemap_get_folio() inside prepare_one_folio()
  This allows __filemap_get_folio() to create a large folio if the
  address space supports it.

- Limit the initial @write_bytes inside copy_one_range()
  If the initial write range crosses a boundary aligned to the largest
  folio size, we cannot write past that boundary into a single folio.
  Clamp @write_bytes to that boundary.

  This is done by a simple helper function, calc_write_bytes().

- Release the excess reserved space if the folio is smaller than expected
  This is the same handling as when a short copy happens.

None of these preparations should change the behavior when the largest
folio order is 0.

Signed-off-by: Qu Wenruo
Reviewed-by: Filipe Manana
---
 fs/btrfs/file.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 63c7a3294eb2..5d10ae321687 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -861,7 +861,8 @@ static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_
 {
 	unsigned long index = pos >> PAGE_SHIFT;
 	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
-	fgf_t fgp_flags = (nowait ? FGP_WRITEBEGIN | FGP_NOWAIT : FGP_WRITEBEGIN);
+	fgf_t fgp_flags = (nowait ? FGP_WRITEBEGIN | FGP_NOWAIT : FGP_WRITEBEGIN) |
+			  fgf_set_order(write_bytes);
 	struct folio *folio;
 	int ret = 0;
 
@@ -1169,6 +1170,16 @@ static void shrink_reserved_space(struct btrfs_inode *inode,
 					     diff, true);
 }
 
+/* Calculate the maximum amount of bytes we can write into one folio. */
+static size_t calc_write_bytes(const struct btrfs_inode *inode,
+			       const struct iov_iter *i, u64 start)
+{
+	size_t max_folio_size = mapping_max_folio_size(inode->vfs_inode.i_mapping);
+
+	return min(max_folio_size - (start & (max_folio_size - 1)),
+		   iov_iter_count(i));
+}
+
 /*
  * Do the heavy-lifting work to copy one range into one folio of the page cache.
  *
@@ -1182,7 +1193,7 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct extent_state *cached_state = NULL;
-	size_t write_bytes = min(iov_iter_count(i), PAGE_SIZE - offset_in_page(start));
+	size_t write_bytes = calc_write_bytes(inode, i, start);
 	size_t copied;
 	const u64 reserved_start = round_down(start, fs_info->sectorsize);
 	u64 reserved_len;
@@ -1227,6 +1238,20 @@ static int copy_one_range(struct btrfs_inode *inode, struct iov_iter *i,
 			      only_release_metadata);
 		return ret;
 	}
 
+	/*
+	 * The reserved range goes beyond the current folio, shrink the reserved
+	 * space to the folio boundary.
+	 */
+	if (reserved_start + reserved_len > folio_pos(folio) + folio_size(folio)) {
+		const u64 last_block = folio_pos(folio) + folio_size(folio);
+
+		shrink_reserved_space(inode, *data_reserved, reserved_start,
+				      reserved_len, last_block - reserved_start,
+				      only_release_metadata);
+		write_bytes = folio_pos(folio) + folio_size(folio) - start;
+		reserved_len = last_block - reserved_start;
+	}
+
 	extents_locked = lock_and_cleanup_extent_if_need(inode, folio, start, write_bytes,
 							 &lockstart, &lockend, nowait, &cached_state);

From patchwork Thu Mar 27 22:31:05 2025
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 14031489
DMIJN9JIhkDoHeq1HHugaQlHh9FOjIE= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 6C64E139D4 for ; Thu, 27 Mar 2025 22:31:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id GH3hC8TR5WfMagAAD6G6ig (envelope-from ) for ; Thu, 27 Mar 2025 22:31:32 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/4] btrfs: prepare btrfs_punch_hole_lock_range() for large data folios Date: Fri, 28 Mar 2025 09:01:05 +1030 Message-ID: <64d8a34bed1360c4771ead6a66e3c6df0ab86a7f.1743113694.git.wqu@suse.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: 31328211BA X-Spam-Level: X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; MIME_TRACE(0.00)[0:+]; TO_MATCH_ENVRCPT_ALL(0.00)[]; FUZZY_BLOCKED(0.00)[rspamd.com]; RBL_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:104:10:150:64:97:from]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; RCVD_TLS_ALL(0.00)[]; DKIM_TRACE(0.00)[suse.com:+]; RCVD_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; SPAMHAUS_XBL(0.00)[2a07:de40:b281:104:10:150:64:97:from]; TO_DN_NONE(0.00)[]; RECEIVED_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:106:10:150:64:167:received]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_VIA_SMTP_AUTH(0.00)[]; 
DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:dkim,suse.com:mid] X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Rspamd-Action: no action X-Spam-Score: -3.01 X-Spam-Flag: NO The function btrfs_punch_hole_lock_range() needs to make sure there is no other folio in the range, thus it goes with filemap_range_has_page(), which works pretty fine. But if we have large folios, under the following case filemap_range_has_page() will always return true, forcing btrfs_punch_hole_lock_range() to do a very time consuming busy loop: start end | | |//|//|//|//| | | | | | | | |//|//| \ / \ / Folio A Folio B In above case, folio A and B contains our start/end index, and there is no other folios in the range. Thus there is no other folios and we do not need to retry inside btrfs_punch_hole_lock_range(). To prepare for large data folios, introduce a helper, check_range_has_page(), which will: - Grab all the folios inside the range - Skip any large folios that covers the start and end index - If any other folios is found return true - Otherwise return false This new helper is going to handle both large folios and regular ones. Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/file.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 49 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 5d10ae321687..417c90ffc6fa 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2157,6 +2157,54 @@ static int find_first_non_hole(struct btrfs_inode *inode, u64 *start, u64 *len) return ret; } +/* + * The helper to check if there is no folio in the range. + * + * We can not utilized filemap_range_has_page() in a filemap with large folios + * as we can hit the following false postive: + * + * start end + * | | + * |//|//|//|//| | | | | | | | |//|//| + * \ / \ / + * Folio A Folio B + * + * That large folio A and B covers the start and end index. 
+ * In that case filemap_range_has_page() will always return true, but the
+ * above case is fine for btrfs_punch_hole_lock_range().
+ *
+ * So here we only ensure that no other folio is in the range, excluding the
+ * head/tail large folios.
+ */
+static bool check_range_has_page(struct inode *inode, u64 start, u64 end)
+{
+	struct folio_batch fbatch;
+	bool ret = false;
+	const pgoff_t start_index = start >> PAGE_SHIFT;
+	const pgoff_t end_index = end >> PAGE_SHIFT;
+	pgoff_t tmp = start_index;
+	int found_folios;
+
+	folio_batch_init(&fbatch);
+	found_folios = filemap_get_folios(inode->i_mapping, &tmp, end_index,
+					  &fbatch);
+	for (int i = 0; i < found_folios; i++) {
+		struct folio *folio = fbatch.folios[i];
+
+		/* A large folio begins before the start. Not a target. */
+		if (folio->index < start_index)
+			continue;
+		/* A large folio extends beyond the (inclusive) end. Not a target. */
+		if (folio->index + folio_nr_pages(folio) - 1 > end_index)
+			continue;
+		/* The folio lies fully inside the range. Found a target. */
+		ret = true;
+		break;
+	}
+	folio_batch_release(&fbatch);
+	return ret;
+}
+
 static void btrfs_punch_hole_lock_range(struct inode *inode,
 					const u64 lockstart,
 					const u64 lockend,
@@ -2188,8 +2236,7 @@ static void btrfs_punch_hole_lock_range(struct inode *inode,
 		 * locking the range check if we have pages in the range, and if
 		 * we do, unlock the range and retry.
 		 */
-		if (!filemap_range_has_page(inode->i_mapping, page_lockstart,
-					    page_lockend))
+		if (!check_range_has_page(inode, page_lockstart, page_lockend))
 			break;
 
 		unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,