From patchwork Mon Mar 3 08:35:09 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13998304 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 565421E98FF for ; Mon, 3 Mar 2025 08:35:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740990952; cv=none; b=XOR3ntUDHGWk18L1sHj6L7gFnEQAKLG9fsrZy+rIu4qo8uoI8yk47p+wVR7bzvSjCPG/zsvkRHt3PZN8VBmwrmYj3ZRVD5gp9wjF8IdaxM/f2DsMWaRsX+8DLTaaBOfzwQ3EmiBpyfvHC9vJjIUIXvRhFHj9hCkQMatrcXv4RAw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740990952; c=relaxed/simple; bh=hnJZxVQ+z/9RbxkilA3nPjiZeAvUQkXVzFiyR2Ge3E4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SkZ+omXh+ZhmxEG9zSTBC1rxIVtdnTq5Y0ApD1hbe7+aqYL2iDWbczIMp2BJbyfEYsaXJInB3gV3kbLxAZw6kP2Yts+N8ofOoAihNPnDmaqs5pLX6hjcEYPulanFFvxRqK0xH5YZ0dE5CKDaH2y22QEKu6KJW5UN4Fh7EUkVZp8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=YRPohDAb; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=YRPohDAb; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="YRPohDAb"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="YRPohDAb" 
Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 6FD061F393; Mon, 3 Mar 2025 08:35:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990936; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QKM+GmBZboMw9eZIFFMNWU+xJrIPOAfSaDxGooy9JlU=; b=YRPohDAbIc1nOGstQ7JLeHkAlLdX7M0njLkXjS9LHFr4Qn7W92xo0rCvwxzK85ojuyhT1l UKsPwIMYqK/iKgFg459iBzd3lclrVt+umQY4VGPAv9S0b9tse+F17BWVuUvJnXX2t8gpTP tHBthfPTMF5zpwKIt2SQ5TuxgKdjQ3Y= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=YRPohDAb DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990936; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QKM+GmBZboMw9eZIFFMNWU+xJrIPOAfSaDxGooy9JlU=; b=YRPohDAbIc1nOGstQ7JLeHkAlLdX7M0njLkXjS9LHFr4Qn7W92xo0rCvwxzK85ojuyhT1l UKsPwIMYqK/iKgFg459iBzd3lclrVt+umQY4VGPAv9S0b9tse+F17BWVuUvJnXX2t8gpTP tHBthfPTMF5zpwKIt2SQ5TuxgKdjQ3Y= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 332EA13939; Mon, 3 Mar 2025 08:35:34 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id EI2NOdZpxWdybwAAD6G6ig 
(envelope-from ); Mon, 03 Mar 2025 08:35:34 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Jens Axboe, Matthew Wilcox
Subject: [PATCH v3 1/8] btrfs: disable uncached writes for now
Date: Mon, 3 Mar 2025 19:05:09 +1030
Message-ID: <25f0ab13b113ff37ae66cab26be7e458321db74f.1740990125.git.wqu@suse.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-Rspamd-Queue-Id: 6FD061F393
X-Spam-Score: -3.01
X-Rspamd-Action: no action
X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:dkim,suse.com:mid,suse.com:email]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; MIME_TRACE(0.00)[0:+]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RECEIVED_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:106:10:150:64:167:received]; FROM_EQ_ENVFROM(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; DKIM_TRACE(0.00)[suse.com:+]
X-Rspamd-Server: rspamd1.dmz-prg2.suse.org
X-Spam-Flag: NO
X-Spam-Level:

Currently btrfs calls folio_end_writeback() inside a spinlock, to prevent
races between async extents for the same folio.

The async extent mechanism is used for compressed writes. It allows a
folio (or part of a folio) to be kept locked while the range is queued
into a workqueue for compression.
After the compression is done, the compressed data is submitted, the
involved blocks are set writeback, and the range is unlocked.

Such an async extent mechanism disrupts the regular writeback behavior,
where normally we submit all the involved blocks inside the same folio in
one go.

Now with async extents, parts of the same folio can be marked writeback
at any time, by different threads.

Thus for fs block size < page size cases, btrfs always holds a spinlock
when setting/clearing the folio writeback flag, to prevent async extents
from racing on the same folio.

But now with the dropbehind folio flag, folio_end_writeback() is no longer
safe to be called inside a spinlock:

  folio_end_writeback()
  |- folio_end_dropbehind_write()
     |- if (in_task() && folio_trylock())
     |  Since all btrfs endio functions happen inside a workqueue,
     |  it will always pass in_task() check.
     |
     |- folio_unmap_invalidate()
        |- folio_launder()
           !! MAY SLEEP !!

And there is no gfp flag to let the fs avoid sleeping.

Furthermore, for fs block size < page size cases, we can even deadlock on
the same subpage spinlock:

  btrfs_subpage_clear_writeback()
  |- spin_lock_irqsave(&subpage->lock)
  |- folio_end_writeback()
     |- folio_end_dropbehind_write()
        |- folio_unmap_invalidate()
           |- filemap_release_folio()
              |- __btrfs_release_folio()
                 |- wait_subpage_spinlock()
                    |- spin_lock_irq(&subpage->lock);
                       !! DEADLOCK !!

So for now let's disable uncached writes for btrfs, until we solve all
problems with the planned solutions:

- Use atomic to track writeback status
  Need to remove the COW fixup (handling of out-of-band dirty folios)
  routine first, and align the member to the iomap_folio_state structure.

- Better async extent handling
  Instead of leaving the folios locked and setting the writeback flag
  after compression, set the writeback flags first, then start
  compression.

Fixes: fcc7e3306e11 ("btrfs: add support for uncached writes (RWF_DONTCACHE)")
Cc: Jens Axboe
Suggested-by: Matthew Wilcox
Signed-off-by: Qu Wenruo
---
Thankfully the btrfs uncached writes patch is not yet pushed upstream, so
I can remove it from the for-next branch if this patch gets enough reviews.

But I prefer not to, as that commit still contains some good cleanup of
the FGP flags.
---
 fs/btrfs/file.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e87d4a37c929..fe9e98f916f4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1099,8 +1099,17 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 		fgp_flags |= FGP_NOWAIT;
 	}
 
+	/*
+	 * DONTCACHE will make folio reclaim happen immediately at
+	 * folio_end_writeback(), for fs block size < page size cases it will
+	 * happen inside a spinlock (due to possible async extents races),
+	 * and such folio_end_writeback() may cause sleep inside a spinlock.
+	 *
+	 * So disable DONTCACHE until we either reworked async extent, or find
+	 * a better way to handle per-block writeback tracking.
+	 */
 	if (iocb->ki_flags & IOCB_DONTCACHE)
-		fgp_flags |= FGP_DONTCACHE;
+		return -EOPNOTSUPP;
 
 	ret = btrfs_inode_lock(BTRFS_I(inode), ilock_flags);
 	if (ret < 0)
@@ -3679,7 +3688,7 @@ const struct file_operations btrfs_file_operations = {
 #endif
 	.remap_file_range = btrfs_remap_file_range,
 	.uring_cmd	= btrfs_uring_cmd,
-	.fop_flags	= FOP_BUFFER_RASYNC | FOP_BUFFER_WASYNC | FOP_DONTCACHE,
+	.fop_flags	= FOP_BUFFER_RASYNC | FOP_BUFFER_WASYNC,
 };
 
 int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end)

From patchwork Mon Mar 3 08:35:10 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998306
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4857E1EC011 for ; Mon, 3 Mar 2025 08:35:57 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
t=1740990959; cv=none; b=eHCmwuQL/ldT6iytCBBu7lRgZKZ+lmdhKsL1Rq8kIE4oVbcGNnVyTJG/RCqiHJRo4eSeNM4KHxTNbRfAEIWFolvg9eeFuW35nLoz6vrNxLVAaeosQ0JGrE5DDK/4A8KxzSY7sYqKF5hmIkjeT4KP+O+g4L6NJCM9Tag+ZQPn/Lo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740990959; c=relaxed/simple; bh=8bV2lqIXiPv0EEs2Zt4quK7AhPmww+vYsUbUAzMnxO0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ATEsCa2U1m5cnEkLqzMteblNflOX+pwA125yXPPWo2+dvnN3coj0mZm+Dv5nQCRg5Wx2TZTWVNPLL70IG3EjawoYbNMCr0qOnv8DnZAUctfSh7JX96JN37c3rPs/Yr471Nslx4vOcimmHt3CVXCpNA46GrYIj2/dBF/vZWyH5C4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=aZVtpv1w; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=aZVtpv1w; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="aZVtpv1w"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="aZVtpv1w" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E0FF81F443; Mon, 3 Mar 2025 08:35:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990937; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=0dpc9Q9J+Dw5x1lxYSi5q5HxjUJnIWTQCouuvCgRk5A=; b=aZVtpv1w2XjfWhP4w8mTFuJ+iFWlQPcB92EpgJCzKP3UtyKpmY8kwGX42cMaAi51kSqPR7 rglliDkbEbsPV7fK+xvdexFQH+VlPWSSd9x5NUCpUauHnMRveoQkdTDWBzEhAzCwQk63TV Hsolq/REvOoSeIDU6GjqWD54ZQDLaUU= Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990937; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0dpc9Q9J+Dw5x1lxYSi5q5HxjUJnIWTQCouuvCgRk5A=; b=aZVtpv1w2XjfWhP4w8mTFuJ+iFWlQPcB92EpgJCzKP3UtyKpmY8kwGX42cMaAi51kSqPR7 rglliDkbEbsPV7fK+xvdexFQH+VlPWSSd9x5NUCpUauHnMRveoQkdTDWBzEhAzCwQk63TV Hsolq/REvOoSeIDU6GjqWD54ZQDLaUU= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id E199813939; Mon, 3 Mar 2025 08:35:36 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 4FGWKNhpxWdybwAAD6G6ig (envelope-from ); Mon, 03 Mar 2025 08:35:36 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Filipe Manana Subject: [PATCH v3 2/8] btrfs: prevent inline data extents read from touching blocks beyond its range Date: Mon, 3 Mar 2025 19:05:10 +1030 Message-ID: <432a8c1f69c0d54e445e91abed056cc99591f89b.1740990125.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-0.996]; 
MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_TWO(0.00)[2]; RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:mid]; RCVD_TLS_ALL(0.00)[]
X-Spam-Score: -2.80
X-Spam-Flag: NO

Currently reading an inline data extent will zero out all the remaining
range in the page.

This is not yet causing problems even for block size < page size
(subpage) cases because:

1) An inline data extent always starts at file offset 0
   Meaning at page read, we always read the inline extent first, before
   any other blocks in the page. Then later blocks are properly read out
   and refill the zeroed out ranges.

2) Currently btrfs will read out the whole page if a buffered write is
   not page aligned
   So a page is either fully uptodate at buffered write time (covers the
   whole page), or we will read out the whole page first.
   Meaning there is nothing to lose for such an inline extent read.

But it's still not ideal:

- We're zeroing out the page twice
  Once done by read_inline_extent()/uncompress_inline(), once done by
  btrfs_do_readpage() for ranges beyond i_size.

- We're touching blocks that don't belong to the inline extent
  In the incoming patches, we can have a partially uptodate folio, where
  some dirty blocks can exist while the page is not fully uptodate:

  The page size is 16K and block size is 4K:

  0         4K        8K        12K       16K
  |         |         |/////////|         |

  And range [8K, 12K) is dirtied by a buffered write, the remaining
  blocks are not uptodate.

  If range [0, 4K) contains an inline data extent, and we try to read
  the whole page, the current behavior will overwrite range [8K, 12K)
  with zero and cause data loss.

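The data loss scenario above can be illustrated with a small user-space
sketch. This is only a model and not kernel code: the names
model_read_inline() and dirty_block_survives(), the 100-byte inline
payload and the 0xaa dirty pattern are all made up for the illustration,
and the 16K/4K sizes are taken from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ   16384u  /* assumed 16K page, as in the example above */
#define BLOCK_SZ   4096u  /* assumed 4K fs block */

/*
 * Hypothetical user-space model of an uncompressed inline extent read:
 * copy @inline_len bytes to the start of @page, then zero the tail up to
 * @zero_limit.  @zero_limit == PAGE_SZ models the old behavior,
 * @zero_limit == BLOCK_SZ models the behavior after this patch.
 */
static void model_read_inline(unsigned char *page,
			      const unsigned char *inline_data,
			      size_t inline_len, size_t zero_limit)
{
	size_t copy_size = inline_len < zero_limit ? inline_len : zero_limit;

	memcpy(page, inline_data, copy_size);
	if (copy_size < zero_limit)
		memset(page + copy_size, 0, zero_limit - copy_size);
}

/* Return 1 if the dirty block [8K, 12K) still holds the 0xaa pattern. */
static int dirty_block_survives(size_t zero_limit)
{
	static unsigned char page[PAGE_SZ];
	static const unsigned char inline_data[100] = { 1 };

	memset(page, 0, sizeof(page));
	/* A buffered write dirtied [8K, 12K) with 0xaa before the read. */
	memset(page + 2 * BLOCK_SZ, 0xaa, BLOCK_SZ);

	model_read_inline(page, inline_data, sizeof(inline_data), zero_limit);

	for (size_t i = 2 * BLOCK_SZ; i < 3 * BLOCK_SZ; i++) {
		if (page[i] != 0xaa)
			return 0;
	}
	return 1;
}
```

In this model dirty_block_survives(PAGE_SZ) reports the dirty block as
lost, while dirty_block_survives(BLOCK_SZ) reports it intact, which is the
difference the patch is after.
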
So to make the behavior more consistent and in preparation for future
changes, limit the inline data extents read to only zero out the range
inside the first block, not the whole page.

Reviewed-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ae1405b49a9f..3652c3485e19 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6793,6 +6793,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 {
 	int ret;
 	struct extent_buffer *leaf = path->nodes[0];
+	const u32 sectorsize = leaf->fs_info->sectorsize;
 	char *tmp;
 	size_t max_size;
 	unsigned long inline_size;
@@ -6809,7 +6810,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 
 	read_extent_buffer(leaf, tmp, ptr, inline_size);
 
-	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
+	max_size = min_t(unsigned long, sectorsize, max_size);
 	ret = btrfs_decompress(compress_type, tmp, folio, 0, inline_size,
 			       max_size);
 
@@ -6821,14 +6822,15 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 	 * cover that region here.
 	 */
 
-	if (max_size < PAGE_SIZE)
-		folio_zero_range(folio, max_size, PAGE_SIZE - max_size);
+	if (max_size < sectorsize)
+		folio_zero_range(folio, max_size, sectorsize - max_size);
 	kfree(tmp);
 	return ret;
 }
 
 static int read_inline_extent(struct btrfs_path *path, struct folio *folio)
 {
+	const u32 sectorsize = path->nodes[0]->fs_info->sectorsize;
 	struct btrfs_file_extent_item *fi;
 	void *kaddr;
 	size_t copy_size;
@@ -6843,14 +6845,14 @@ static int read_inline_extent(struct btrfs_path *path, struct folio *folio)
 	if (btrfs_file_extent_compression(path->nodes[0], fi) != BTRFS_COMPRESS_NONE)
 		return uncompress_inline(path, folio, fi);
 
-	copy_size = min_t(u64, PAGE_SIZE,
+	copy_size = min_t(u64, sectorsize,
 			  btrfs_file_extent_ram_bytes(path->nodes[0], fi));
 	kaddr = kmap_local_folio(folio, 0);
 	read_extent_buffer(path->nodes[0], kaddr,
 			   btrfs_file_extent_inline_start(fi), copy_size);
 	kunmap_local(kaddr);
-	if (copy_size < PAGE_SIZE)
-		folio_zero_range(folio, copy_size, PAGE_SIZE - copy_size);
+	if (copy_size < sectorsize)
+		folio_zero_range(folio, copy_size, sectorsize - copy_size);
 	return 0;
 }
 

From patchwork Mon Mar 3 08:35:11 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998307
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 242661E98FF for ; Mon, 3 Mar 2025 08:36:03 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740990965; cv=none; b=fmrnXiPQtZp/q8KdOaucuB1+HlZ4kqa/5Otr9BqFgLWLU4u/xbgDpBBZf6hqk6gsTwqZ24Z5w9uIjypRUlv3d7ca5Vkr+tDyFpHy7IQm72fNcAClM8DC+fvwyOFiQUqaSnBLcwZDxqXjF+5QG895AvR1oJgjA+ORouIIMRUqCXo=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
s=arc-20240116; t=1740990965; c=relaxed/simple; bh=amig9pL839CNVUFaLN33wLZrMKHpu/ZM/Qtn+uoYJyo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MdCJYUuPUnr8eBW55Fq8tSlob4XpBYLuDAmC45fUtTHmb+0zjQx8XcSJUIgkSYKxEwlAQtap/RbhiC8aeCDR2drfKEYuSJuRcaYXVXdG0GB0CNx/2/avrnZc3FRvXmg7tH6rV9f0SXkNZ0atzKgYG0W0ICNeeyPT/S0YeUk1dWg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=ToH3LS0h; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=ToH3LS0h; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="ToH3LS0h"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="ToH3LS0h" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 5F3D01F444; Mon, 3 Mar 2025 08:35:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990939; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WIcU4NXOQab0ESjlHFcPq43JEzZ39O00AK7oQs902nQ=; b=ToH3LS0hXwno2/0DmnC56Mu8f0FEVZA7AzcJ6RSTZRluM4yIoF1KwKh2FK85jlT89qoEgb aVeLjTkC6M8jnUaYhqAYkAzAXBmommFigYFV0rPYrvzY2Xi0PMt/z2Z3t9Bn/HqoKb2+JB iv/EiYiXdmzmVU8Mxkb4OQo9jM1cNmM= 
Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=ToH3LS0h DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990939; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WIcU4NXOQab0ESjlHFcPq43JEzZ39O00AK7oQs902nQ=; b=ToH3LS0hXwno2/0DmnC56Mu8f0FEVZA7AzcJ6RSTZRluM4yIoF1KwKh2FK85jlT89qoEgb aVeLjTkC6M8jnUaYhqAYkAzAXBmommFigYFV0rPYrvzY2Xi0PMt/z2Z3t9Bn/HqoKb2+JB iv/EiYiXdmzmVU8Mxkb4OQo9jM1cNmM= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 5F4A013939; Mon, 3 Mar 2025 08:35:38 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id cAa4CNppxWdybwAAD6G6ig (envelope-from ); Mon, 03 Mar 2025 08:35:38 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Filipe Manana Subject: [PATCH v3 3/8] btrfs: fix the qgroup data free range for inline data extents Date: Mon, 3 Mar 2025 19:05:11 +1030 Message-ID: <67cb189930e47272a89cb159f9a070f7ccd12700.1740990125.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: 5F3D01F444 X-Spam-Level: X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; ARC_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; 
FUZZY_BLOCKED(0.00)[rspamd.com]; RECEIVED_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:106:10:150:64:167:received]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:dkim,suse.com:mid]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_TRACE(0.00)[suse.com:+]
X-Rspamd-Server: rspamd2.dmz-prg2.suse.org
X-Rspamd-Action: no action
X-Spam-Score: -3.01
X-Spam-Flag: NO

Inside function __cow_file_range_inline(), since the inlined data no
longer takes any data space, we need to free up the reserved space.

However the code is still using the old page size == sector size
assumption, and will not handle the subpage case well.

Thankfully it is not going to cause any problems because we have two
extra safety nets:

- Inline data extent creation is disabled for sector size < page size
  cases for now
  But it won't stay that way for long.

- btrfs_qgroup_free_data() will only clear ranges which are already
  reserved
  So even if we pass a range larger than what we need, it should still
  be fine, especially since there is only reserved space for a single
  block at file offset 0 for an inline data extent.

But just for the sake of consistency, fix the call site to use
sectorsize instead of page size.

Reviewed-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3652c3485e19..c7b0f1173722 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -672,7 +672,7 @@ static noinline int __cow_file_range_inline(struct btrfs_inode *inode,
 	 * And at reserve time, it's always aligned to page size, so
 	 * just free one page here.
 	 */
-	btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE, NULL);
+	btrfs_qgroup_free_data(inode, NULL, 0, fs_info->sectorsize, NULL);
 	btrfs_free_path(path);
 	btrfs_end_transaction(trans);
 	return ret;

From patchwork Mon Mar 3 08:35:12 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998308
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6FC501E98FF for ; Mon, 3 Mar 2025 08:36:11 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740990974; cv=none; b=cb/i/DXAPHWv+MoB58hwwXKVHTYCT+9iGEL/nxSrUz1SzZMRfX3wqiiYF4PZBEgSnFc3BgkFr3S0L0idv792xt00UikIVOwCLNGXcR+dGqcbIzQ+wLmBZ34hD2i6+rIPRGWx6dUnCSgpsQ+alYEAm3FQPWLm6aOF6AWeXGUum5w=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740990974; c=relaxed/simple; bh=GMgIuVBwA39kOqTTyC1imkbrE/j4wah2dP1DXBWLwK4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ht0zgUueViFit5K6ROTPgrKYwMsZ+czCBXRjVVMXZX9+OtScpdSJmE6vXhJHI8h0uobiGtomcjSaI6g8cAKQCGSj8f+X6XXz8tf5b38Q+4JVcT6N7kgx+8EFax4tz0ZEMY/PvdQ0f5hPMDb5JBVslCdX944uHoU97nC6c7Qaczg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=I7LYYjqw; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=I7LYYjqw; arc=none smtp.client-ip=195.135.223.131
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com
Authentication-Results:
smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="I7LYYjqw"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="I7LYYjqw" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id D2C041F449; Mon, 3 Mar 2025 08:35:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990940; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vb80gJayQH+3mXij7SnQXCEHTGr/QWj0VpokSPXYWe4=; b=I7LYYjqw3ZcUkrdr4l3vHkX5SE770zb0KvwaesJDoxi0VbwupakT6MmKiDDJW71wH9lo9H TaZq1meJbN5v1Yo8ov9oC12kaxQ3JUw8C5b9d7FIi0xGtWkof6cQr0Zs8OmJ1uoVaR7JJ2 ZKEoHOft+IEwH7euUA/8A8TRw7Qk+R0= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=I7LYYjqw DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740990940; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vb80gJayQH+3mXij7SnQXCEHTGr/QWj0VpokSPXYWe4=; b=I7LYYjqw3ZcUkrdr4l3vHkX5SE770zb0KvwaesJDoxi0VbwupakT6MmKiDDJW71wH9lo9H TaZq1meJbN5v1Yo8ov9oC12kaxQ3JUw8C5b9d7FIi0xGtWkof6cQr0Zs8OmJ1uoVaR7JJ2 ZKEoHOft+IEwH7euUA/8A8TRw7Qk+R0= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id D1BEB13939; 
Mon, 3 Mar 2025 08:35:39 +0000 (UTC)
Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 2IisJNtpxWdybwAAD6G6ig (envelope-from ); Mon, 03 Mar 2025 08:35:39 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Filipe Manana
Subject: [PATCH v3 4/8] btrfs: introduce a read path dedicated extent lock helper
Date: Mon, 3 Mar 2025 19:05:12 +1030
Message-ID:
X-Mailer: git-send-email 2.48.1
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-Rspamd-Queue-Id: D2C041F449
X-Spam-Score: -3.01
X-Rspamd-Action: no action
X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; ARC_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; FUZZY_BLOCKED(0.00)[rspamd.com]; RECEIVED_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:106:10:150:64:167:received]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:dkim,suse.com:mid,suse.com:email]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_TRACE(0.00)[suse.com:+]
X-Rspamd-Server: rspamd1.dmz-prg2.suse.org
X-Spam-Flag: NO
X-Spam-Level:

Currently we're using btrfs_lock_and_flush_ordered_range() for both
btrfs_read_folio() and btrfs_readahead(), but it has one critical problem
for future subpage optimizations:

- It will call btrfs_start_ordered_extent() to writeback the involved
  folios

  But remember we're calling btrfs_lock_and_flush_ordered_range() at
  read paths, meaning the folio is already locked by the read path.

  If we really trigger writeback for those already locked folios, this
  will lead to a deadlock, as writeback can not get the folio lock.

  Such a deadlock is currently prevented by the fact that btrfs always
  keeps a dirty folio also uptodate, by either dirtying all blocks of
  the folio, or reading the whole folio before dirtying.

To prepare for the incoming patch which allows btrfs to skip the full
folio read if the buffered write is block aligned, we have to start by
solving the possible deadlock first.

Instead of blindly calling btrfs_start_ordered_extent(), introduce a new
helper, which is smarter in the following ways:

- Only wait and flush the ordered extent if

  * The folio doesn't even have private set
  * Part of the blocks of the ordered extent are not uptodate

  This can happen by:

  * The folio writeback finished, then got invalidated.
    There are a lot of reasons that a folio can get invalidated, from
    memory pressure to direct IO (which invalidates all folios of the
    range).
    But the OE is not yet finished.

  We have to wait for the ordered extent, as the OE may contain
  to-be-inserted data checksums.
  Without waiting, our read can fail due to the missing csum.

  But either way, the OE should not need any extra flush inside the
  locked folio range.

- Skip the ordered extent completely if

  * All the blocks are dirty
    This happens when OE creation is caused by a folio writeback whose
    file offset is before our folio.

    E.g. 16K page size and 4K block size

    0       8K      16K     24K     32K
    |//////////////||///////|       |

    The writeback of folio 0 created an OE for range [0, 24K), but since
    folio 16K is not fully uptodate, a read is triggered for folio 16K.

    The writeback will never happen (we're holding the folio lock for
    read), nor will the OE finish.

    Thus we must skip the range.

  * All the blocks are uptodate
    This happens when the writeback finished, but the OE is not yet
    finished.

    Since the blocks are already uptodate, we can skip the OE range.

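The skip/wait rules listed above can be summarized with a small
user-space sketch. This is only a simplified model written for this
description and is not the helper added by the patch: struct folio_model,
block_mask() and model_can_skip_oe() are hypothetical names, and the
4 blocks per folio assume the 16K page / 4K block example.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_FOLIO 4u  /* assumed: 16K folio, 4K fs block */

/* Hypothetical per-folio state; bit N describes block N of the folio. */
struct folio_model {
	bool has_private;        /* per-block tracking structure attached */
	unsigned int uptodate;   /* bitmap of uptodate blocks */
	unsigned int dirty;      /* bitmap of dirty blocks */
};

/* Bitmap with @nr bits set, starting at bit @first. */
static unsigned int block_mask(unsigned int first, unsigned int nr)
{
	assert(first + nr <= BLOCKS_PER_FOLIO);
	return ((1u << nr) - 1u) << first;
}

/*
 * Model of the rules above for the blocks [first, first + nr) that an
 * ordered extent covers inside one locked folio.  Returns true when the
 * read path may skip the ordered extent, false when it has to wait.
 */
static bool model_can_skip_oe(const struct folio_model *f,
			      unsigned int first, unsigned int nr)
{
	unsigned int mask = block_mask(first, nr);

	/* No private state at all: nothing is known, so wait. */
	if (!f->has_private)
		return false;
	/* Every block dirty: writeback can not run under our folio lock. */
	if ((f->dirty & mask) == mask)
		return true;
	/* Every block uptodate: writeback finished, only the OE lingers. */
	if ((f->uptodate & mask) == mask)
		return true;
	/* Some block is not uptodate (e.g. invalidated folio): wait. */
	return false;
}
```

For folio 16K in the example, blocks 0 and 1 (file range [16K, 24K)) are
dirty, so the model skips the OE covering them, while an invalidated
block that is neither dirty nor uptodate makes the model wait.
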
The new helper, lock_extents_for_read(), will do a loop for the target
range by:

1) Lock the full range

2) If there is no ordered extent in the remaining range, exit

3) If there is an ordered extent that we can skip
   Skip to the end of the OE, and continue checking
   We do not trigger writeback nor wait for the OE.

4) If there is an ordered extent that we can not skip
   Unlock the whole extent range and start the ordered extent.

And also update btrfs_start_ordered_extent() to add two more parameters:
@nowriteback_start and @nowriteback_len, to prevent triggering flush for
a certain range.

This will allow us to handle the following case properly in the future:

  16K page size, 4K btrfs block size:

  0       4K      8K      12K     16K     20K     24K     28K     32K
  |/////////////////////////////||////////////////|       |       |
  |<-------------------- OE 2 ------------------->|       |< OE 1 >|

  The folio has been written back before, thus we have an OE at
  [28K, 32K).
  Although the OE 1 finished its IO, the OE is not yet removed from IO
  tree.
  The folio got invalidated after writeback completed and before the
  ordered extent finished.

  And [16K, 24K) range is dirty and uptodate, caused by a block aligned
  buffered write (and future enhancements allowing btrfs to skip full
  folio read for such case).

  But writeback for folio 0 has begun, thus it generated OE 2, covering
  range [0, 24K).

  Since the full folio 16K is not uptodate, if we want to read the folio,
  the existing btrfs_lock_and_flush_ordered_range() will deadlock, by:

  btrfs_read_folio()
  | Folio 16K is already locked
  |- btrfs_lock_and_flush_ordered_range()
     |- btrfs_start_ordered_extent() for range [16K, 24K)
        |- filemap_fdatawrite_range() for range [16K, 24K)
           |- extent_write_cache_pages()
              folio_lock() on folio 16K, deadlock.

  But now we will have the following sequence:

  btrfs_read_folio()
  | Folio 16K is already locked
  |- lock_extents_for_read()
     |- can_skip_ordered_extent() for range [16K, 24K)
     |  Returned true, the range [16K, 24K) will be skipped.
     |- can_skip_ordered_extent() for range [28K, 32K)
     |  Returned false.
     |- btrfs_start_ordered_extent() for range [28K, 32K) with
        [16K, 32K) as no writeback range
        No writeback for folio 16K will be triggered.

  And there will be no more possible deadlock on the same folio.

Reviewed-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c    | 186 +++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/ordered-data.c |  23 +++--
 fs/btrfs/ordered-data.h |   8 +-
 3 files changed, 209 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ac771e06244e..9ca8172a39de 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1081,6 +1081,188 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 	return 0;
 }
 
+/*
+ * Check if we can skip waiting the @ordered extent covering the block
+ * at @fileoff.
+ *
+ * @fileoff:	Both input and output.
+ *		Input as the file offset where the check should start at.
+ *		Output as where the next check should start at,
+ *		if the function returns true.
+ *
+ * Return true if we can skip to @fileoff. The caller needs to check
+ * the new @fileoff value to make sure it covers the full range, before
+ * skipping the full OE.
+ *
+ * Return false if we must wait for the ordered extent.
+ */
+static bool can_skip_one_ordered_range(struct btrfs_inode *inode,
+				       struct btrfs_ordered_extent *ordered,
+				       u64 *fileoff)
+{
+	const struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct folio *folio;
+	const u32 blocksize = fs_info->sectorsize;
+	u64 cur = *fileoff;
+	bool ret;
+
+	folio = filemap_get_folio(inode->vfs_inode.i_mapping,
+				  cur >> PAGE_SHIFT);
+
+	/*
+	 * We should have locked the folio(s) for range [start, end], thus
+	 * there must be a folio and it must be locked.
+	 */
+	ASSERT(!IS_ERR(folio));
+	ASSERT(folio_test_locked(folio));
+
+	/*
+	 * We have several cases for the folio and OE combination:
+	 *
+	 * 1) Folio has no private flag
+	 *    The OE has all its IO done but not yet finished, and folio got
+	 *    invalidated.
+	 *
+	 *    Have to wait for the OE to finish, as it may contain the
+	 *    to-be-inserted data checksum.
+	 *    Without the data checksum inserted into the csum tree, read
+	 *    will just fail with missing csum.
+	 */
+	if (!folio_test_private(folio)) {
+		ret = false;
+		goto out;
+	}
+
+	/*
+	 * 2) The first block is DIRTY.
+	 *
+	 *    This means the OE is created by some other folios whose file pos is
+	 *    before us. And since we are holding the folio lock, the writeback of
+	 *    this folio can not start.
+	 *
+	 *    We must skip the whole OE, because it will never start until
+	 *    we finished our folio read and unlocked the folio.
+	 */
+	if (btrfs_folio_test_dirty(fs_info, folio, cur, blocksize)) {
+		u64 range_len = min(folio_pos(folio) + folio_size(folio),
+				    ordered->file_offset + ordered->num_bytes) - cur;
+
+		ret = true;
+		/*
+		 * At least inside the folio, all the remaining blocks should
+		 * also be dirty.
+		 */
+		ASSERT(btrfs_folio_test_dirty(fs_info, folio, cur, range_len));
+		*fileoff = ordered->file_offset + ordered->num_bytes;
+		goto out;
+	}
+
+	/*
+	 * 3) The first block is uptodate.
+	 *
+	 *    At least the first block can be skipped, but we are still
+	 *    not fully sure. E.g. if the OE has some other folios in
+	 *    the range that can not be skipped.
+	 *    So we return true and update @fileoff to the OE/folio boundary.
+	 */
+	if (btrfs_folio_test_uptodate(fs_info, folio, cur, blocksize)) {
+		u64 range_len = min(folio_pos(folio) + folio_size(folio),
+				    ordered->file_offset + ordered->num_bytes) - cur;
+
+		/*
+		 * The whole range to the OE end or folio boundary should also
+		 * be uptodate.
+		 */
+		ASSERT(btrfs_folio_test_uptodate(fs_info, folio, cur, range_len));
+		ret = true;
+		*fileoff = cur + range_len;
+		goto out;
+	}
+
+	/*
+	 * 4) The first block is not uptodate.
+	 *
+	 *    This means the folio is invalidated after the writeback is finished,
+	 *    but by some other operations (e.g. block aligned buffered write) the
+	 *    folio is inserted into filemap.
+	 *    Very much the same as case 1).
+	 */
+	ret = false;
+out:
+	folio_put(folio);
+	return ret;
+}
+
+static bool can_skip_ordered_extent(struct btrfs_inode *inode,
+				    struct btrfs_ordered_extent *ordered,
+				    u64 start, u64 end)
+{
+	const u64 range_end = min(end, ordered->file_offset + ordered->num_bytes - 1);
+	u64 cur = max(start, ordered->file_offset);
+
+	while (cur < range_end) {
+		bool can_skip;
+
+		can_skip = can_skip_one_ordered_range(inode, ordered, &cur);
+		if (!can_skip)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * To make sure we get a stable view of extent maps for the involved range.
+ * This is for folio read paths (read and readahead), thus involved range
+ * should have all the folios locked.
+ */
+static void lock_extents_for_read(struct btrfs_inode *inode, u64 start, u64 end,
+				  struct extent_state **cached_state)
+{
+	u64 cur_pos;
+
+	/* Caller must provide a valid @cached_state. */
+	ASSERT(cached_state);
+
+	/*
+	 * The range must at least be page aligned, as all read paths
+	 * are folio based.
+	 */
+	ASSERT(IS_ALIGNED(start, PAGE_SIZE));
+	ASSERT(IS_ALIGNED(end + 1, PAGE_SIZE));
+
+again:
+	lock_extent(&inode->io_tree, start, end, cached_state);
+	cur_pos = start;
+	while (cur_pos < end) {
+		struct btrfs_ordered_extent *ordered;
+
+		ordered = btrfs_lookup_ordered_range(inode, cur_pos,
+						     end - cur_pos + 1);
+		/*
+		 * No ordered extents in the range, and we hold the
+		 * extent lock, no one can modify the extent maps
+		 * in the range, we're safe to return.
+		 */
+		if (!ordered)
+			break;
+
+		/* Check if we can skip waiting for the whole OE. */
+		if (can_skip_ordered_extent(inode, ordered, start, end)) {
+			cur_pos = min(ordered->file_offset + ordered->num_bytes,
+				      end + 1);
+			btrfs_put_ordered_extent(ordered);
+			continue;
+		}
+
+		/* Now wait for the OE to finish. */
+		unlock_extent(&inode->io_tree, start, end, cached_state);
+		btrfs_start_ordered_extent_nowriteback(ordered, start, end + 1 - start);
+		btrfs_put_ordered_extent(ordered);
+		/* We have unlocked the whole range, restart from the beginning. */
+		goto again;
+	}
+}
+
 int btrfs_read_folio(struct file *file, struct folio *folio)
 {
 	struct btrfs_inode *inode = folio_to_inode(folio);
@@ -1091,7 +1273,7 @@ int btrfs_read_folio(struct file *file, struct folio *folio)
 	struct extent_map *em_cached = NULL;
 	int ret;
 
-	btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
+	lock_extents_for_read(inode, start, end, &cached_state);
 	ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL);
 	unlock_extent(&inode->io_tree, start, end, &cached_state);
 
@@ -2376,7 +2558,7 @@ void btrfs_readahead(struct readahead_control *rac)
 	struct extent_map *em_cached = NULL;
 	u64 prev_em_start = (u64)-1;
 
-	btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
+	lock_extents_for_read(inode, start, end, &cached_state);
 	while ((folio = readahead_folio(rac)) != NULL)
 		btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4aca7475fd82..fd33217e4b27 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -842,10 +842,12 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 /*
  * Start IO and wait for a given ordered extent to finish.
  *
- * Wait on page writeback for all the pages in the extent and the IO completion
- * code to insert metadata into the btree corresponding to the extent.
+ * Wait on page writeback for all the pages in the extent but not in
+ * [@nowriteback_start, @nowriteback_start + @nowriteback_len) and the
+ * IO completion code to insert metadata into the btree corresponding to the extent.
  */
-void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry)
+void btrfs_start_ordered_extent_nowriteback(struct btrfs_ordered_extent *entry,
+					    u64 nowriteback_start, u32 nowriteback_len)
 {
 	u64 start = entry->file_offset;
 	u64 end = start + entry->num_bytes - 1;
@@ -865,8 +867,19 @@ void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry)
 	 * start IO on any dirty ones so the wait doesn't stall waiting
 	 * for the flusher thread to find them
 	 */
-	if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
-		filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start, end);
+	if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags)) {
+		if (!nowriteback_len) {
+			filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start, end);
+		} else {
+			if (start < nowriteback_start)
+				filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start,
+							 nowriteback_start - 1);
+			if (nowriteback_start + nowriteback_len < end)
+				filemap_fdatawrite_range(inode->vfs_inode.i_mapping,
+							 nowriteback_start + nowriteback_len,
+							 end);
+		}
+	}
 
 	if (!freespace_inode)
 		btrfs_might_wait_for_event(inode->root->fs_info, btrfs_ordered_extent);
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index be36083297a7..1e6b0b182b29 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -192,7 +192,13 @@ void btrfs_add_ordered_sum(struct btrfs_ordered_extent *entry,
 			   struct btrfs_ordered_sum *sum);
 struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *inode,
 							 u64 file_offset);
-void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry);
+void btrfs_start_ordered_extent_nowriteback(struct btrfs_ordered_extent *entry,
+					    u64 nowriteback_start, u32 nowriteback_len);
+static inline void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry)
+{
+	return btrfs_start_ordered_extent_nowriteback(entry, 0, 0);
+}
+
 int btrfs_wait_ordered_range(struct btrfs_inode *inode, u64 start, u64 len);
 struct btrfs_ordered_extent *
 btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset);

From patchwork Mon Mar 3 08:35:13 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998309
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Filipe Manana
Subject: [PATCH v3 5/8] btrfs: make btrfs_do_readpage() to do block-by-block read
Date: Mon, 3 Mar 2025 19:05:13 +1030
Message-ID: <9383b8d5e361f0a09270266d0c1f74a9d12c336b.1740990125.git.wqu@suse.com>
X-Mailer: git-send-email 2.48.1
X-Mailing-List: linux-btrfs@vger.kernel.org

Currently if a btrfs filesystem has its block size (the older sector size)
smaller than the page size, btrfs_do_readpage() will handle the range
extent by extent. This is good for performance, as it doesn't need to
re-lookup the same extent map again and again.
(Although get_extent_map() already does an extra cached em check, thus
the optimization is not that significant.)

This is totally fine and is a valid optimization, but it assumes that
there is no partially uptodate range in the page.

Meanwhile there is an incoming feature, requiring btrfs to skip the full
page read if a buffered write range covers a full block but not a full
page.

In that case, we can have a page that is partially uptodate, and the
current per-extent lookup can not handle such a case.

So here we change btrfs_do_readpage() to do block-by-block read. This
simplifies the following things:

- Remove the need for @iosize variable
  Because we just use sectorsize as our increment.

- Remove @pg_offset, and calculate it inside the loop when needed
  It's just offset_in_folio().

- Use a for() loop instead of a while() loop

This will slightly reduce the read performance for subpage cases, but for
the future where we need to skip already uptodate blocks, it should still
be worth it.

For block size == page size, this brings no performance change.
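
As a rough illustration of the block-by-block walk described above, the following userspace sketch iterates one folio in block-sized steps and derives the in-folio offset inside the loop instead of carrying a running counter. The names `block_offset_in_folio()` and `count_blocks_visited()` are hypothetical; the real btrfs_do_readpage() also handles extent maps, holes and bio submission, all of which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for offset_in_folio(): offset of @cur inside its folio. */
static uint32_t block_offset_in_folio(uint64_t folio_start, uint64_t cur)
{
	assert(cur >= folio_start);
	return (uint32_t)(cur - folio_start);
}

/*
 * Hypothetical model of the new loop shape: visit every block of the folio
 * that starts at file offset @folio_start and return how many were visited.
 */
static size_t count_blocks_visited(uint64_t folio_start, uint32_t folio_size,
				   uint32_t blocksize)
{
	const uint64_t end = folio_start + folio_size - 1;
	size_t nr = 0;

	/* The folio must consist of a whole number of blocks. */
	assert(blocksize != 0 && folio_size != 0 && folio_size % blocksize == 0);

	for (uint64_t cur = folio_start; cur <= end; cur += blocksize) {
		/* Computed on demand, so no separate @iosize or running offset. */
		const uint32_t pg_offset = block_offset_in_folio(folio_start, cur);

		assert(pg_offset % blocksize == 0 && pg_offset < folio_size);
		nr++;
	}
	return nr;
}
```

With a 16K folio and 4K blocks the loop runs four times; with block size equal to page size it runs once, which matches the note that this case sees no performance change.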
Reviewed-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 38 ++++++++++++--------------------------
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9ca8172a39de..0b0af6e11196 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -942,14 +942,11 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 	u64 start = folio_pos(folio);
 	const u64 end = start + PAGE_SIZE - 1;
-	u64 cur = start;
 	u64 extent_offset;
 	u64 last_byte = i_size_read(inode);
 	struct extent_map *em;
 	int ret = 0;
-	size_t pg_offset = 0;
-	size_t iosize;
-	size_t blocksize = fs_info->sectorsize;
+	const size_t blocksize = fs_info->sectorsize;
 
 	ret = set_folio_extent_mapped(folio);
 	if (ret < 0) {
@@ -960,24 +957,23 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 	if (folio_contains(folio, last_byte >> PAGE_SHIFT)) {
 		size_t zero_offset = offset_in_folio(folio, last_byte);
 
-		if (zero_offset) {
-			iosize = folio_size(folio) - zero_offset;
-			folio_zero_range(folio, zero_offset, iosize);
-		}
+		if (zero_offset)
+			folio_zero_range(folio, zero_offset,
+					 folio_size(folio) - zero_offset);
 	}
 	bio_ctrl->end_io_func = end_bbio_data_read;
 	begin_folio_read(fs_info, folio);
-	while (cur <= end) {
+	for (u64 cur = start; cur <= end; cur += blocksize) {
 		enum btrfs_compression_type compress_type = BTRFS_COMPRESS_NONE;
+		unsigned long pg_offset = offset_in_folio(folio, cur);
 		bool force_bio_submit = false;
 		u64 disk_bytenr;
 		u64 block_start;
 
 		ASSERT(IS_ALIGNED(cur, fs_info->sectorsize));
 		if (cur >= last_byte) {
-			iosize = folio_size(folio) - pg_offset;
-			folio_zero_range(folio, pg_offset, iosize);
-			end_folio_read(folio, true, cur, iosize);
+			folio_zero_range(folio, pg_offset, end - cur + 1);
+			end_folio_read(folio, true, cur, end - cur + 1);
 			break;
 		}
 		em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
@@ -991,8 +987,6 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 
 		compress_type = extent_map_compression(em);
 
-		iosize = min(extent_map_end(em) - cur, end - cur + 1);
-		iosize = ALIGN(iosize, blocksize);
 		if (compress_type != BTRFS_COMPRESS_NONE)
 			disk_bytenr = em->disk_bytenr;
 		else
@@ -1050,18 +1044,13 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 
 		/* we've found a hole, just zero and go on */
 		if (block_start == EXTENT_MAP_HOLE) {
-			folio_zero_range(folio, pg_offset, iosize);
-
-			end_folio_read(folio, true, cur, iosize);
-			cur = cur + iosize;
-			pg_offset += iosize;
+			folio_zero_range(folio, pg_offset, blocksize);
+			end_folio_read(folio, true, cur, blocksize);
 			continue;
 		}
 		/* the get_extent function already copied into the folio */
 		if (block_start == EXTENT_MAP_INLINE) {
-			end_folio_read(folio, true, cur, iosize);
-			cur = cur + iosize;
-			pg_offset += iosize;
+			end_folio_read(folio, true, cur, blocksize);
 			continue;
 		}
 
@@ -1072,12 +1061,9 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 
 		if (force_bio_submit)
 			submit_one_bio(bio_ctrl);
-		submit_extent_folio(bio_ctrl, disk_bytenr, folio, iosize,
+		submit_extent_folio(bio_ctrl, disk_bytenr, folio, blocksize,
 				    pg_offset);
-		cur = cur + iosize;
-		pg_offset += iosize;
 	}
-
 	return 0;
 }
 

From patchwork Mon Mar 3 08:35:14 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998303
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Filipe Manana
Subject: [PATCH v3 6/8] btrfs: allow buffered write to avoid full page read if it's block aligned
Date: Mon, 3 Mar 2025 19:05:14 +1030
X-Mailer: git-send-email 2.48.1
X-Mailing-List: linux-btrfs@vger.kernel.org

[BUG]
Since the support of block size (sector size) < page size for btrfs,
test case generic/563 fails with 4K block size and 64K page size:

    --- tests/generic/563.out	2024-04-25 18:13:45.178550333 +0930
    +++ /home/adam/xfstests-dev/results//generic/563.out.bad	2024-09-30 09:09:16.155312379 +0930
    @@ -3,7 +3,8 @@
     read is in range
     write is in range
     write -> read/write
    -read is in range
    +read has value of 8388608
    +read is NOT in range -33792 .. 33792
     write is in range
    ...

[CAUSE]
The test case creates an 8MiB file, then does buffered writes into the
8MiB range using 4K block size, to overwrite the whole file.

On 4K page sized systems, since the write range covers the full block and
page, btrfs will not bother reading the page, just like what XFS and EXT4
do.

But on 64K page sized systems, although the 4K sized write is still block
aligned, it's not page aligned any more, thus btrfs will read the full
page, which will be accounted by cgroup and fail the test, as the test
case itself expects that such 4K block aligned writes should not trigger
any read.

Such expected behavior is an optimization to reduce folio reads when
possible, and unfortunately btrfs does not implement such optimization.
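
To make the cause concrete, here is a minimal userspace sketch of the alignment decision, assuming the folio is not yet uptodate. `is_aligned()` is a stand-in for the kernel's IS_ALIGNED(), and `write_needs_read()` is a hypothetical helper; the clamping of the write range to the folio boundaries done by the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for IS_ALIGNED(); @align must be a power of two. */
static bool is_aligned(uint64_t val, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (val & (align - 1)) == 0;
}

/*
 * Hypothetical model: does a buffered write of @len bytes at @pos force a
 * read of the folio first?
 *
 * @unit is the granularity being checked. Passing the page size models the
 * behavior described in [CAUSE]; passing the block size models a check that
 * only requires block alignment.
 */
static bool write_needs_read(uint64_t pos, uint64_t len, uint64_t unit)
{
	return !(is_aligned(pos, unit) && is_aligned(pos + len, unit));
}
```

For a 4K write at offset 4K, the check against a 64K page size reports that a read is needed, while the same check against a 4K block size does not, which is the difference generic/563 observes through cgroup accounting.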

[FIX]
To skip the full page read, we need to do the following modifications:

- Do not trigger full page read as long as the buffered write is block
  aligned
  This is pretty simple by modifying the check inside
  prepare_uptodate_folio().

- Skip already uptodate blocks during full page read
  Otherwise we can hit the following data corruption:

  0       32K     64K
  |///////|       |

  Where the file range [0, 32K) is dirtied by buffered write, the
  remaining range [32K, 64K) is not.

  When reading the full page, since [0, 32K) is only dirtied but not
  written back, there is no data extent map for it, but a hole covering
  [0, 64K).

  If we continue reading the full page range [0, 64K), the dirtied range
  will be filled with 0 (since there is only a hole covering the whole
  range).
  This causes the dirtied range to get lost.

With this optimization, btrfs can pass generic/563 even if the page size
is larger than fs block size.

Reviewed-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 4 ++++
 fs/btrfs/file.c      | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0b0af6e11196..57906c226220 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -976,6 +976,10 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 			end_folio_read(folio, true, cur, end - cur + 1);
 			break;
 		}
+		if (btrfs_folio_test_uptodate(fs_info, folio, cur, blocksize)) {
+			end_folio_read(folio, true, cur, blocksize);
+			continue;
+		}
 		em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
 		if (IS_ERR(em)) {
 			end_folio_read(folio, false, cur, end + 1 - cur);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fe9e98f916f4..23a8fabd390e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -804,14 +804,15 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
 {
 	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
 	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
+	const u32 blocksize = inode_to_fs_info(inode)->sectorsize;
 	int ret = 0;
 
 	if (folio_test_uptodate(folio))
 		return 0;
 
 	if (!force_uptodate &&
-	    IS_ALIGNED(clamp_start, PAGE_SIZE) &&
-	    IS_ALIGNED(clamp_end, PAGE_SIZE))
+	    IS_ALIGNED(clamp_start, blocksize) &&
+	    IS_ALIGNED(clamp_end, blocksize))
 		return 0;
 
 	ret = btrfs_read_folio(NULL, folio);

From patchwork Mon Mar 3 08:35:15 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998305
X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 4209613939; Mon, 3 Mar 2025 08:35:44 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id YOKiAeBpxWdybwAAD6G6ig (envelope-from ); Mon, 03 Mar 2025 08:35:44 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Filipe Manana Subject: [PATCH v3 7/8] btrfs: allow inline data extents creation if block size < page size Date: Mon, 3 Mar 2025 19:05:15 +1030 Message-ID: <0010fc6e27dbde67022e63e65f68bdfa78202472.1740990125.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: 40EFC2117A X-Spam-Level: X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; ARC_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; FUZZY_BLOCKED(0.00)[rspamd.com]; RECEIVED_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:106:10:150:64:167:received]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:dkim,suse.com:mid]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Rspamd-Action: no action X-Spam-Score: -3.01 X-Spam-Flag: NO Previously inline data extents creation is disable if the block size (previously called sector size) is smaller than the page size, for the following reasons: - Possible mixed inline and regular data extents 
  However, the same can happen when the block size matches the page
  size, so we do not treat mixed inline and regular extents as an error.

  The chance of getting mixed inline and regular data extents is not
  increased either, as the requirement is the same (a compressed inline
  data extent covering the whole first block, followed by regular
  extents).

- Unable to handle async/inline delalloc range for block size < page
  size cases

  This has been fixed since commit 1d2fbb7f1f9e ("btrfs: allow
  compression even if the range is not page aligned").

  This was the major technical blocker, and it no longer applies.

With the major technical blocker removed, we can enable inline data
extent creation regardless of the block size and the page size,
allowing btrfs to have the same capability for all block sizes.

Reviewed-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c7b0f1173722..a58505f037b5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -566,19 +566,6 @@ static bool can_cow_file_range_inline(struct btrfs_inode *inode,
 	if (offset != 0)
 		return false;

-	/*
-	 * Due to the page size limit, for subpage we can only trigger the
-	 * writeback for the dirty sectors of page, that means data writeback
-	 * is doing more writeback than what we want.
-	 *
-	 * This is especially unexpected for some call sites like fallocate,
-	 * where we only increase i_size after everything is done.
-	 * This means we can trigger inline extent even if we didn't want to.
-	 * So here we skip inline extent creation completely.
-	 */
-	if (fs_info->sectorsize != PAGE_SIZE)
-		return false;
-
 	/* Inline extents are limited to sectorsize. */
 	if (size > fs_info->sectorsize)
 		return false;

From patchwork Mon Mar 3 08:35:16 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13998310
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Filipe Manana
Subject: [PATCH v3 8/8] btrfs: remove the subpage related warning message
Date: Mon, 3 Mar 2025 19:05:16 +1030
Message-ID: <06a7864ebd56d0d3a4678f47b75287210d329d64.1740990125.git.wqu@suse.com>
X-Mailer: git-send-email 2.48.1

Since the initial enablement of block size < page size support for btrfs
in v5.15, we have hit several milestones for block size < page size
(subpage) support:

- RAID56 subpage support
  In v5.19

- Refactored scrub support to support subpage better
  In v6.4

- Block perfect (previously required page aligned ranges) compressed
  write
  In v6.13

- Various error handling fixes involving subpage
  In v6.14

Finally, the only missing feature was the pretty simple and harmless
inline data extent creation, which was just added in the previous
patches.

Now that btrfs has all of its features ready for both regular and
subpage cases, there is no reason to output a warning about the
experimental subpage support, and we can finally remove it.

Acked-by: Filipe Manana
Signed-off-by: Qu Wenruo
---
 fs/btrfs/disk-io.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 52c2335ef62f..0cb559448933 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3410,11 +3410,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	 */
 	fs_info->max_inline = min_t(u64, fs_info->max_inline, fs_info->sectorsize);

-	if (sectorsize < PAGE_SIZE)
-		btrfs_warn(fs_info,
-		"read-write for sector size %u with page size %lu is experimental",
-			   sectorsize, PAGE_SIZE);
-
 	ret = btrfs_init_workqueues(fs_info);
 	if (ret)
 		goto fail_sb_buffer;