From patchwork Sun Feb 23 23:46:16 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987253 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D1CC10E4 for ; Sun, 23 Feb 2025 23:46:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354407; cv=none; b=luEQwNZFMF3CD3CDHNyqaxCjb5ymCTggBjzVt0mCbaX8bUeG9nZHEqfk+0lj51NYJHKF91+vCfFoYAquFl3sp+aTaooqUuNLylLXXmoSADn4qiot2CA73+JBdswtEfimDviacbmh1XhqQ86Vj10Rjy7DNIYdozlZyqSqhAwMGEE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354407; c=relaxed/simple; bh=mTjmuqA5BArVsHcb8QLce6phzxvyC8W226UB8rVGJvk=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tsgs8wlLOuLSjSgsAsY0bKnOb65O58EPiEC/MIYtmm5RrcVwqylFUNQ7g3+GQEz50lBVJv6fPgd+c6IIabaz83c3TaqsulMAuI8mXWcytCKisSnrKgx6QPTulf8eMd8g0w2kBjhiK3/BBM5sCy+LJxfrfcjRkl8y9YmkMCXfXpY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=gQTYD8uV; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=gQTYD8uV; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="gQTYD8uV"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="gQTYD8uV" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 8E6FD1F383 for ; Sun, 23 Feb 2025 23:46:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354402; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3WUCmoGyGI+pXz/FPiyG9ovIsPNjQyoiLpNU/cHXNzM=; b=gQTYD8uVb24diBGHZuovrpUfJuRbIEvAdP9XxLmEPK6NtSfJEWl3XzafBpiIHHjWiLdNpz XQzQ2bJnb6Kxls7enK7kavsd1pJRt6K2kjiqVmiTn+8P80Pd/DcUFrki9s+lu96Z12crJa ky+bMVnUdQRzKg05E+mNWIMIwGRR2vs= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=gQTYD8uV DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354402; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3WUCmoGyGI+pXz/FPiyG9ovIsPNjQyoiLpNU/cHXNzM=; b=gQTYD8uVb24diBGHZuovrpUfJuRbIEvAdP9XxLmEPK6NtSfJEWl3XzafBpiIHHjWiLdNpz XQzQ2bJnb6Kxls7enK7kavsd1pJRt6K2kjiqVmiTn+8P80Pd/DcUFrki9s+lu96Z12crJa ky+bMVnUdQRzKg05E+mNWIMIwGRR2vs= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id C259613A42 for ; Sun, 23 Feb 2025 23:46:41 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id EG5DIGGzu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:41 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/7] btrfs: prevent inline data extents read from touching blocks beyond its range Date: Mon, 24 Feb 2025 10:16:16 +1030 Message-ID: <31a8b02ec99fd3bf6399b7f543ef6de786665fd2.1740354271.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: 8E6FD1F383 X-Spam-Level: X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_CONTAINS_FROM(1.00)[]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:dkim,suse.com:mid]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; MIME_TRACE(0.00)[0:+]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Rspamd-Action: no action X-Spam-Score: -3.01 X-Spam-Flag: NO Currently reading an inline data extent will zero out all the remaining range in the page. This is not yet causing problems even for block size < page size (subpage) cases because: 1) An inline data extent always starts at file offset 0 Meaning at page read, we always read the inline extent first, before any other blocks in the page. Then later blocks are properly read out and re-fill the zeroed out ranges. 2) Currently btrfs will read out the whole page if a buffered write is not page aligned So a page is either fully uptodate at buffered write time (covers the whole page), or we will read out the whole page first. Meaning there is nothing to lose for such an inline extent read. But it's still not ideal: - We're zeroing out the page twice One done by read_inline_extent()/uncompress_inline(), one done by btrfs_do_readpage() for ranges beyond i_size. - We're touching blocks that doesn't belong to the inline extent In the incoming patches, we can have a partial uptodate folio, that some dirty blocks can exist while the page is not fully uptodate: The page size is 16K and block size is 4K: 0 4K 8K 12K 16K | | |/////////| | And range [8K, 12K) is dirtied by a buffered write, the remaining blocks are not uptodate. If range [0, 4K) contains an inline data extent, and we try to read the whole page, the current behavior will overwrite range [8K, 12K) with zero and cause data loss. So to make the behavior more consistent and in preparation for future changes, limit the inline data extents read to only zero out the range inside the first block, not the whole page. Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/inode.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c432ccfba56e..fe2c6038064a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6788,6 +6788,7 @@ static noinline int uncompress_inline(struct btrfs_path *path, { int ret; struct extent_buffer *leaf = path->nodes[0]; + const u32 sectorsize = leaf->fs_info->sectorsize; char *tmp; size_t max_size; unsigned long inline_size; @@ -6816,17 +6817,19 @@ static noinline int uncompress_inline(struct btrfs_path *path, * cover that region here. */ - if (max_size < PAGE_SIZE) - folio_zero_range(folio, max_size, PAGE_SIZE - max_size); + if (max_size < sectorsize) + folio_zero_range(folio, max_size, sectorsize - max_size); kfree(tmp); return ret; } -static int read_inline_extent(struct btrfs_path *path, struct folio *folio) +static int read_inline_extent(struct btrfs_fs_info *fs_info, + struct btrfs_path *path, struct folio *folio) { struct btrfs_file_extent_item *fi; void *kaddr; size_t copy_size; + const u32 sectorsize = fs_info->sectorsize; if (!folio || folio_test_uptodate(folio)) return 0; @@ -6844,8 +6847,8 @@ static int read_inline_extent(struct btrfs_path *path, struct folio *folio) read_extent_buffer(path->nodes[0], kaddr, btrfs_file_extent_inline_start(fi), copy_size); kunmap_local(kaddr); - if (copy_size < PAGE_SIZE) - folio_zero_range(folio, copy_size, PAGE_SIZE - copy_size); + if (copy_size < sectorsize) + folio_zero_range(folio, copy_size, sectorsize - copy_size); return 0; } @@ -7020,7 +7023,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, ASSERT(em->disk_bytenr == EXTENT_MAP_INLINE); ASSERT(em->len == fs_info->sectorsize); - ret = read_inline_extent(path, folio); + ret = read_inline_extent(fs_info, path, folio); if (ret < 0) goto out; goto insert; From patchwork Sun Feb 23 23:46:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987256 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E493720013A for ; Sun, 23 Feb 2025 23:46:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354417; cv=none; b=BBeaEJ+JPQ4fTg2D+6ac1fm2OkQqWS2Rkdi0qOuOe+jTQiLQuiBql/wBdpJ4s4rVgoFYxAxvqccZh7Y1CS/Fwb8p5Mbd5BxXnozAjb04I4UvCneB/Q8ixF0Rl7wFWz8WQ3VQPMCBTpN/MDEj2oXSeNBYuOtZGIcqyLL8n/pW8Ws= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354417; c=relaxed/simple; bh=i1NREQEOqwedTH9zvqm/iJi62HhGcl4DAz/l318TYsM=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JKyoQarr3fjnPIU3M0jY1JP1VuskoL61Gn128ASGz2mHjUeASoKosbPsw/hBZkRBh0swB7jPfQzX26nDbUvEBGsTkmbs4g3HBz6j9JVkl/ufS82QQvitlmHcoG6KT2AOYnMa1XosQiQZhz4goOYyHEpPKHhdXE5VufOP+kA8E40= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=SM+oztBN; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=SM+oztBN; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="SM+oztBN"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="SM+oztBN" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id D6B9021172 for ; Sun, 23 Feb 2025 23:46:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354403; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mfgUKCIFdC4eL72syVHy8yBytO3KZFhikalpDq23VXY=; b=SM+oztBNJq+K4gQA9KXMF65gWPfE2uD4zNpnQVnpJWTRLUfhOftnWMC2LoLkgDyFhioLws 5HIay4YHndvuVvpOuu41z9MsyTRpr3n+JeWKs1nbTesittNahqLev2x++RY6YoWFUDtE0g IeQ5O5n/mmwY6j7ZbUoZQUpF0zkwkMo= Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354403; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mfgUKCIFdC4eL72syVHy8yBytO3KZFhikalpDq23VXY=; b=SM+oztBNJq+K4gQA9KXMF65gWPfE2uD4zNpnQVnpJWTRLUfhOftnWMC2LoLkgDyFhioLws 5HIay4YHndvuVvpOuu41z9MsyTRpr3n+JeWKs1nbTesittNahqLev2x++RY6YoWFUDtE0g IeQ5O5n/mmwY6j7ZbUoZQUpF0zkwkMo= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 11AD713A42 for ; Sun, 23 Feb 2025 23:46:42 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id GHq/MGKzu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:42 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/7] btrfs: fix the qgroup data free range for inline data extents Date: Mon, 24 Feb 2025 10:16:17 +1030 Message-ID: X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Score: -2.80 X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:mid,suse.com:email]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Flag: NO X-Spam-Level: Inside function __cow_file_range_inline() since the inlined data no longer takes any data space, we need to free up the reserved space. However the code is still using the old page size == sector size assumption, and will not handle subpage case well. Thankfully it is not going to cause any problems because we have two extra safe nets: - Inline data extents creation is disable for sector size < page size cases for now But it won't stay that for long. - btrfs_qgroup_free_data() will only clear ranges which are already reserved So even if we pass a range larger than what we need, it should still be fine, especially there is only reserved space for a single block at file offset 0 for an inline data extent. But just for the sake of consistentcy, fix the call site to use sectorsize instead of page size. Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index fe2c6038064a..0efe9f005149 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -672,7 +672,7 @@ static noinline int __cow_file_range_inline(struct btrfs_inode *inode, * And at reserve time, it's always aligned to page size, so * just free one page here. */ - btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE, NULL); + btrfs_qgroup_free_data(inode, NULL, 0, fs_info->sectorsize, NULL); btrfs_free_path(path); btrfs_end_transaction(trans); return ret; From patchwork Sun Feb 23 23:46:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987255 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 27E6E2054FD for ; Sun, 23 Feb 2025 23:46:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354414; cv=none; b=H5PVdPQLptFhguD9TeALKgjdW/I4d21TDzewpY8VD4nVHRU1fzGAPsoVt2wqyue2V0SzDkOusqdrtz52SF5cJCzWYU0V7jdu9quxfVmF3GW8IEUgvOZnAvGOh1M1k4WqSvz04yeMaMDKNrcVBTWo8+wbJFVGsMYHgxcsyZvDvco= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354414; c=relaxed/simple; bh=nXIVy55cwJFuS9Fxni81Rby/hoUQxMgWxpNQrCM9BF8=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LjqLoD9ohK3ByEFE2KPjcuCq2bPSODeNXXudyabhQ8gAbzRzrnyP3MfZ2mgDYMT28+6ZRu55jGSIakcZqflc5BOjUg48DvNfNQkdyFIfI++a3KeuA8wfxb4krd4LCtTlzGlWoradvC7ZA3BzLl7cLfOenif/vEWDiO+5fBrIH7M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=tnk2KOvX; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=tnk2KOvX; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="tnk2KOvX"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="tnk2KOvX" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 285F81F385 for ; Sun, 23 Feb 2025 23:46:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354405; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zxun1I+xww0GQUB5Klak+BBzxb2RUkEzrieBl7nOLgw=; b=tnk2KOvXnZMR3M15/HiINPQO8dI8E7lR9yHp7q0+StwXP7pLZMP4Wsv99wI7Xd3VTI5Se6 RW6tyTaLkaWgJ3ffKhKFvIbwaPs/JyUihZ7IbQnh/KI6RcP7AAj0AwdBBOD011qCdHBwjj k6a+hNY/QrlMLtSIFPxsBxnAvJylono= Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354405; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zxun1I+xww0GQUB5Klak+BBzxb2RUkEzrieBl7nOLgw=; b=tnk2KOvXnZMR3M15/HiINPQO8dI8E7lR9yHp7q0+StwXP7pLZMP4Wsv99wI7Xd3VTI5Se6 RW6tyTaLkaWgJ3ffKhKFvIbwaPs/JyUihZ7IbQnh/KI6RcP7AAj0AwdBBOD011qCdHBwjj k6a+hNY/QrlMLtSIFPxsBxnAvJylono= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 5AE9313A42 for ; Sun, 23 Feb 2025 23:46:44 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 8Fn/BmSzu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:44 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/7] btrfs: introduce a read path dedicated extent lock helper Date: Mon, 24 Feb 2025 10:16:18 +1030 Message-ID: X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:mid]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Score: -2.80 X-Spam-Flag: NO Currently we're using btrfs_lock_and_flush_ordered_range() for both btrfs_read_folio() and btrfs_readahead(), but it has one critical problem for future subpage enhancements: - It will call btrfs_start_ordered_extent() to writeback the involved folios But remember we're calling btrfs_lock_and_flush_ordered_range() at read paths, meaning the folio is already locked by read path. If we really trigger writeback, this will lead to a deadlock and writeback can not hold the folio lock. Such dead lock is prevented by the fact that btrfs always keeps a dirty folio also uptodate, by either dirtying all blocks of the folio, or read the folio before dirtying. But it's not the common behavior, as XFS/EXT4 both allow the folio to be dirty but not uptodate, by allowing buffered write to dirty a full block without reading the full folio. Instead of blindly calling btrfs_start_ordered_extent(), introduce a newer helper, which is smarter in the following ways: - Only wait and flush the ordered extent if * The folio doesn't even have private set * Part of the blocks of the ordered extent is not uptodate This can happen by: * Folio writeback finished, then get invalidated. But OE not yet finished * Direct IO We have to wait for the ordered extent, as it may contain to-be-inserted data checksum. Without waiting, our read can fail due to the missing csum. But either way, the OE should not need any extra flush inside the locked folio range. - Skip the ordered extent completely if * All the blocks are dirty This happens when OE creation is caused by previous folio. The writeback will never happen (we're holding the folio lock for read), nor will the OE finish. Thus we must skip the range. * All the blocks are uptodate This happens when the writeback finished, but OE not yet finished. Since the blocks are already uptodate, we can skip the OE range. The newer helper, lock_extents_for_read() will do a loop for the target range by: 1) Lock the full range 2) If there is no ordered extent in the remaining range, exit 3) If there is an ordered extent that we can skip Skip to the end of the OE, and continue checking We do not trigger writeback nor wait for the OE. 4) If there is an ordered extent that we can not skip Unlock the whole extent range and start the ordered extent. And also update btrfs_start_ordered_extent() to add two more parameters: @nowriteback_start and @nowriteback_len, to prevent triggering flush for a certain range. This will allow us to handle the following case properly in the future: 16K page size, 4K btrfs block size: 16K 20K 24K 28K 32K |/////////////////////////////| | | |<----------- OE 2----------->| |<--- OE 1 --->| The folio has been written back before, thus we have an OE at [28K, 32K). Although the OE 1 finished its IO, the OE is not yet removed from IO tree. Later the folio got invalidated, but OE still exists. And [16K, 24K) range is dirty and uptodate, caused by a block aligned buffered write (and future enhancements allowing btrfs to skip full folio read for such case). Furthermore, OE 2 is created covering range [16K, 24K) by the writeback of previous folio. Since the full folio is not uptodate, if we want to read the folio, the existing btrfs_lock_and_flush_ordered_range() will dead lock, by: btrfs_read_folio() | Folio 16K is already locked |- btrfs_lock_and_flush_ordered_range() |- btrfs_start_ordered_extent() for range [16K, 24K) |- filemap_fdatawrite_range() for range [16K, 24K) |- extent_write_cache_pages() folio_lock() on folio 16K, deadlock. But now we will have the following sequence: btrfs_read_folio() | Folio 16K is already locked |- lock_extents_for_read() |- can_skip_ordered_extent() for range [16K, 24K) | Returned true, the range [16K, 24K) will be skipped. |- can_skip_ordered_extent() for range [28K, 32K) | Returned false. |- btrfs_start_ordered_extent() for range [28K, 32K) with [16K, 32K) as no writeback range No writeback for folio 16K will be triggered. And there will be no more possible deadlock on the same folio. Signed-off-by: Qu Wenruo --- fs/btrfs/defrag.c | 2 +- fs/btrfs/direct-io.c | 2 +- fs/btrfs/extent_io.c | 183 +++++++++++++++++++++++++++++++++++++++- fs/btrfs/file.c | 4 +- fs/btrfs/inode.c | 4 +- fs/btrfs/ordered-data.c | 29 +++++-- fs/btrfs/ordered-data.h | 3 +- 7 files changed, 210 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index 18f0704263f3..4b89094da3de 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -902,7 +902,7 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t break; folio_unlock(folio); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); btrfs_put_ordered_extent(ordered); folio_lock(folio); /* diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c index e8548de319e7..fb6df17fb55c 100644 --- a/fs/btrfs/direct-io.c +++ b/fs/btrfs/direct-io.c @@ -103,7 +103,7 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend, */ if (writing || test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); else ret = nowait ? -EAGAIN : -ENOTBLK; btrfs_put_ordered_extent(ordered); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 7b0aa332aedc..a6ffee6f6fd9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1081,6 +1081,185 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, return 0; } +/* + * Check if we can skip waiting the @ordered extent covering the block + * at file pos @cur. + * + * Return true if we can skip to @next_ret. The caller needs to check + * the @next_ret value to make sure if covers the full range, before + * skipping the OE. + * + * Return false if we must wait for the ordered extent. + * + * @cur: The start file offset that we have locked folio for read. + * @next_ret: If we return true, this indiciates the next check start + * range. + */ +static bool can_skip_one_ordered_range(struct btrfs_inode *binode, + struct btrfs_ordered_extent *ordered, + u64 cur, u64 *next_ret) +{ + const struct btrfs_fs_info *fs_info = binode->root->fs_info; + struct folio *folio; + const u32 blocksize = fs_info->sectorsize; + u64 range_len; + bool ret; + + folio = filemap_get_folio(binode->vfs_inode.i_mapping, + cur >> PAGE_SHIFT); + + /* + * We should have locked the folio(s) for range [start, end], thus + * there must be a folio and it must be locked. + */ + ASSERT(!IS_ERR(folio)); + ASSERT(folio_test_locked(folio)); + + /* + * We several cases for the folio and OE combination: + * + * 0) Folio has no private flag + * The OE has all its IO done but not yet finished, and folio got + * invalidated. Or direct IO. + * + * Have to wait for the OE to finish, as it may contain the + * to-be-inserted data checksum. + * Without the data checksum inserted into csum tree, read + * will just fail with missing csum. + */ + if (!folio_test_private(folio)) { + ret = false; + goto out; + } + range_len = min(folio_pos(folio) + folio_size(folio), + ordered->file_offset + ordered->num_bytes) - cur; + + /* + * 1) The first block is DIRTY. + * + * This means the OE is created by some folio before us, but writeback + * has not started. + * We can and must skip the whole OE, because it will never start until + * we finished our folio read and unlocked the folio. + */ + if (btrfs_folio_test_dirty(fs_info, folio, cur, blocksize)) { + ret = true; + /* + * At least inside the folio, all the remaining blocks should + * also be dirty. + */ + ASSERT(btrfs_folio_test_dirty(fs_info, folio, cur, range_len)); + *next_ret = ordered->file_offset + ordered->num_bytes; + goto out; + } + + /* + * 2) The first block is uptodate. + * + * At least the first block can be skipped, but we are still + * not full sure. E.g. if the OE has some other folios in + * the range that can not be skipped. + * So we return true and update @next_ret to the OE/folio boundary. + */ + if (btrfs_folio_test_uptodate(fs_info, folio, cur, blocksize)) { + u64 range_len = min(folio_pos(folio) + folio_size(folio), + ordered->file_offset + ordered->num_bytes) - cur; + + /* + * The whole range to the OE end or folio boundary should also + * be uptodate. + */ + ASSERT(btrfs_folio_test_uptodate(fs_info, folio, cur, range_len)); + ret = true; + *next_ret = cur + range_len; + goto out; + } + + /* + * 3) The first block is not uptodate. + * + * This means the folio is invalidated after the OE finished, or direct IO. + * Very much the same as case 1), just with private flag set. + */ + ret = false; +out: + folio_put(folio); + return ret; +} + +static bool can_skip_ordered_extent(struct btrfs_inode *binode, + struct btrfs_ordered_extent *ordered, + u64 start, u64 end) +{ + u64 range_end = min(end, ordered->file_offset + ordered->num_bytes - 1); + u64 range_start = max(start, ordered->file_offset); + u64 cur = range_start; + + while (cur < range_end) { + bool can_skip; + u64 next_start; + + can_skip = can_skip_one_ordered_range(binode, ordered, cur, + &next_start); + if (!can_skip) + return false; + cur = next_start; + } + return true; +} + +/* + * To make sure we get a stable view of extent maps for the involved range. + * This is for folio read paths (read and readahead), thus involved range + * should have all the folios locked. + */ +static void lock_extents_for_read(struct btrfs_inode *binode, u64 start, u64 end, + struct extent_state **cached_state) +{ + struct btrfs_ordered_extent *ordered; + u64 cur_pos; + + /* Caller must provide a valid @cached_state. */ + ASSERT(cached_state); + + /* + * The range must at least be page aligned, as all read paths + * are folio based. + */ + ASSERT(IS_ALIGNED(start, PAGE_SIZE) && IS_ALIGNED(end + 1, PAGE_SIZE)); + +again: + lock_extent(&binode->io_tree, start, end, cached_state); + cur_pos = start; + while (cur_pos < end) { + ordered = btrfs_lookup_ordered_range(binode, cur_pos, + end - cur_pos + 1); + /* + * No ordered extents in the range, and we hold the + * extent lock, no one can modify the extent maps + * in the range, we're safe to return. + */ + if (!ordered) + break; + + /* Check if we can skip waiting for the whole OE. */ + if (can_skip_ordered_extent(binode, ordered, start, end)) { + cur_pos = min(ordered->file_offset + ordered->num_bytes, + end + 1); + btrfs_put_ordered_extent(ordered); + continue; + } + + /* Now wait for the OE to finish. */ + unlock_extent(&binode->io_tree, start, end, + cached_state); + btrfs_start_ordered_extent(ordered, start, end + 1 - start); + btrfs_put_ordered_extent(ordered); + /* We have unlocked the whole range, restart from the beginning. */ + goto again; + } +} + int btrfs_read_folio(struct file *file, struct folio *folio) { struct btrfs_inode *inode = folio_to_inode(folio); @@ -1091,7 +1270,7 @@ int btrfs_read_folio(struct file *file, struct folio *folio) struct extent_map *em_cached = NULL; int ret; - btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state); + lock_extents_for_read(inode, start, end, &cached_state); ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL); unlock_extent(&inode->io_tree, start, end, &cached_state); @@ -2380,7 +2559,7 @@ void btrfs_readahead(struct readahead_control *rac) struct extent_map *em_cached = NULL; u64 prev_em_start = (u64)-1; - btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state); + lock_extents_for_read(inode, start, end, &cached_state); while ((folio = readahead_folio(rac)) != NULL) btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index e87d4a37c929..00c68b7b2206 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -941,7 +941,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct folio *folio, cached_state); folio_unlock(folio); folio_put(folio); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); btrfs_put_ordered_extent(ordered); return -EAGAIN; } @@ -1855,7 +1855,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) unlock_extent(io_tree, page_start, page_end, &cached_state); folio_unlock(folio); up_read(&BTRFS_I(inode)->i_mmap_lock); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); btrfs_put_ordered_extent(ordered); goto again; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0efe9f005149..e99cb5109967 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2821,7 +2821,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) unlock_extent(&inode->io_tree, page_start, page_end, &cached_state); folio_unlock(folio); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); btrfs_put_ordered_extent(ordered); goto again; } @@ -4873,7 +4873,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len, unlock_extent(io_tree, block_start, block_end, &cached_state); folio_unlock(folio); folio_put(folio); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); btrfs_put_ordered_extent(ordered); goto again; } diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 4aca7475fd82..6075a6fa4817 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -729,7 +729,7 @@ static void btrfs_run_ordered_extent_work(struct btrfs_work *work) struct btrfs_ordered_extent *ordered; ordered = container_of(work, struct btrfs_ordered_extent, flush_work); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); complete(&ordered->completion); } @@ -842,10 +842,12 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr, /* * Start IO and wait for a given ordered extent to finish. * - * Wait on page writeback for all the pages in the extent and the IO completion - * code to insert metadata into the btree corresponding to the extent. + * Wait on page writeback for all the pages in the extent but not in + * [@nowriteback_start, @nowriteback_start + @nowriteback_len) and the + * IO completion code to insert metadata into the btree corresponding to the extent. */ -void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry) +void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry, + u64 nowriteback_start, u32 nowriteback_len) { u64 start = entry->file_offset; u64 end = start + entry->num_bytes - 1; @@ -865,8 +867,19 @@ void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry) * start IO on any dirty ones so the wait doesn't stall waiting * for the flusher thread to find them */ - if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags)) - filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start, end); + if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags)) { + if (!nowriteback_len) { + filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start, end); + } else { + if (start < nowriteback_start) + filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start, + nowriteback_start - 1); + if (nowriteback_start + nowriteback_len < end) + filemap_fdatawrite_range(inode->vfs_inode.i_mapping, + nowriteback_start + nowriteback_len, + end); + } + } if (!freespace_inode) btrfs_might_wait_for_event(inode->root->fs_info, btrfs_ordered_extent); @@ -921,7 +934,7 @@ int btrfs_wait_ordered_range(struct btrfs_inode *inode, u64 start, u64 len) btrfs_put_ordered_extent(ordered); break; } - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); end = ordered->file_offset; /* * If the ordered extent had an error save the error but don't @@ -1174,7 +1187,7 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, break; } unlock_extent(&inode->io_tree, start, end, cachedp); - btrfs_start_ordered_extent(ordered); + btrfs_start_ordered_extent(ordered, 0, 0); btrfs_put_ordered_extent(ordered); } } diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index be36083297a7..5e17f03d43c9 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -192,7 +192,8 @@ void btrfs_add_ordered_sum(struct btrfs_ordered_extent *entry, struct btrfs_ordered_sum *sum); struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *inode, u64 file_offset); -void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry); +void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry, + u64 nowriteback_start, u32 nowriteback_len); int btrfs_wait_ordered_range(struct btrfs_inode *inode, u64 start, u64 len); struct btrfs_ordered_extent * btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset); From patchwork Sun Feb 23 23:46:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987257 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FEC210E4 for ; Sun, 23 Feb 2025 23:46:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354422; cv=none; b=ZZiQptWgFeD/aIwnWRiay6nARPAXtjIctFJHRDoLydMHSN4z+/F95Tg47YZ/P97k4VtQitX7LQ+DUaeu28hejpmDV2fPxSu2YP+jn/r1WhrPXx00XadZtaND5tZslGC72xRGZrfTB8zO4TQ7txdLka4dUQFxEsfYvvnCUicxUK4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354422; c=relaxed/simple; bh=C3Q4LIZ5p8LgRDFY+3r4jIw7TaXtuJQo0L0jntRv5O0=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=R1dwjGpWJK0TQFN1lBXC6FB9jM3BVTt48EwIcgIlSShjdqafdhIUh2Adk+TdMIDRToO3T0e2dOVAv+/oaA51Dv3tklX9tnnh0iRlK7HK8lUs3D+nZNZKRbqLJ171D/RApxsjhFoVtZEmZTHhyTaA7EJ81msc7xjngXGY4dNhsMY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=GfVOgcnv; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=GfVOgcnv; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="GfVOgcnv"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="GfVOgcnv" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 6CCEB1F38C for ; Sun, 23 Feb 2025 23:46:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354406; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bnVOflwM3CrUfaIuDhMzsup2J16GmkpU4GQYiJNBsDo=; b=GfVOgcnvLoaoWjyhNMOE0ZXWu9rA9tXwwj2BZ6R92K3TNNIYV6tB23QYRm5JbYInah8jcb t2Td6LY72ZG/bZfbQRvWI72+NZHYacmJkQ37XQIg9LcwWppvrqHU0YSS5in38H6kP3bc/S 51PT40fG4dI10oMYHTqmTCZwTzAAyHI= Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354406; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bnVOflwM3CrUfaIuDhMzsup2J16GmkpU4GQYiJNBsDo=; b=GfVOgcnvLoaoWjyhNMOE0ZXWu9rA9tXwwj2BZ6R92K3TNNIYV6tB23QYRm5JbYInah8jcb t2Td6LY72ZG/bZfbQRvWI72+NZHYacmJkQ37XQIg9LcwWppvrqHU0YSS5in38H6kP3bc/S 51PT40fG4dI10oMYHTqmTCZwTzAAyHI= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id A091B13A42 for ; Sun, 23 Feb 2025 23:46:45 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id oM33F2Wzu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:45 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/7] btrfs: make btrfs_do_readpage() to do block-by-block read Date: Mon, 24 Feb 2025 10:16:19 +1030 Message-ID: <115f1044e9d341b8507cae894a7e2dece5f41445.1740354271.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Score: -2.80 X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:mid,suse.com:email]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Flag: NO X-Spam-Level: Currently if a btrfs has its block size (the older sector size) smaller than the page size, btrfs_do_readpage() will handle the range extent by extent, this is good for performance as it doesn't need to re-lookup the same extent map again and again. (Although __get_extent_map() already does extra cached em check, thus the optimization is not that obvious) This is totally fine and is a valid optimization, but it has an assumption that, there is no partial uptodate range in the page. Meanwhile there is an incoming feature, requiring btrfs to skip the full page read if a buffered write range covers a full block but not a full page. In that case, we can have a page that is partially uptodate, and the current per-extent lookup can not handle such case. So here we change btrfs_do_readpage() to do block-by-block read, this simplifies the following things: - Remove the need for @iosize variable Because we just use sectorsize as our increment. - Remove @pg_offset, and calculate it inside the loop when needed It's just offset_in_folio(). - Use a for() loop instead of a while() loop This will slightly reduce the read performance for subpage cases, but for the future where we need to skip already uptodate blocks, it should still be worthy. For block size == page size, this brings no performance change. Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/extent_io.c | 37 ++++++++++++------------------------- 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index a6ffee6f6fd9..b3a4a94212c9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -947,9 +947,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, u64 last_byte = i_size_read(inode); struct extent_map *em; int ret = 0; - size_t pg_offset = 0; - size_t iosize; - size_t blocksize = fs_info->sectorsize; + const size_t blocksize = fs_info->sectorsize; ret = set_folio_extent_mapped(folio); if (ret < 0) { @@ -960,24 +958,23 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, if (folio_contains(folio, last_byte >> PAGE_SHIFT)) { size_t zero_offset = offset_in_folio(folio, last_byte); - if (zero_offset) { - iosize = folio_size(folio) - zero_offset; - folio_zero_range(folio, zero_offset, iosize); - } + if (zero_offset) + folio_zero_range(folio, zero_offset, + folio_size(folio) - zero_offset); } bio_ctrl->end_io_func = end_bbio_data_read; begin_folio_read(fs_info, folio); - while (cur <= end) { + for (cur = start; cur <= end; cur += blocksize) { enum btrfs_compression_type compress_type = BTRFS_COMPRESS_NONE; + unsigned long pg_offset = offset_in_folio(folio, cur); bool force_bio_submit = false; u64 disk_bytenr; u64 block_start; ASSERT(IS_ALIGNED(cur, fs_info->sectorsize)); if (cur >= last_byte) { - iosize = folio_size(folio) - pg_offset; - folio_zero_range(folio, pg_offset, iosize); - end_folio_read(folio, true, cur, iosize); + folio_zero_range(folio, pg_offset, end - cur + 1); + end_folio_read(folio, true, cur, end - cur + 1); break; } em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached); @@ -991,8 +988,6 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, compress_type = extent_map_compression(em); - iosize = min(extent_map_end(em) - cur, end - cur + 1); - iosize = ALIGN(iosize, blocksize); if (compress_type != BTRFS_COMPRESS_NONE) disk_bytenr = em->disk_bytenr; else @@ -1050,18 +1045,13 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, /* we've found a hole, just zero and go on */ if (block_start == EXTENT_MAP_HOLE) { - folio_zero_range(folio, pg_offset, iosize); - - end_folio_read(folio, true, cur, iosize); - cur = cur + iosize; - pg_offset += iosize; + folio_zero_range(folio, pg_offset, blocksize); + end_folio_read(folio, true, cur, blocksize); continue; } /* the get_extent function already copied into the folio */ if (block_start == EXTENT_MAP_INLINE) { - end_folio_read(folio, true, cur, iosize); - cur = cur + iosize; - pg_offset += iosize; + end_folio_read(folio, true, cur, blocksize); continue; } @@ -1072,12 +1062,9 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, if (force_bio_submit) submit_one_bio(bio_ctrl); - submit_extent_folio(bio_ctrl, disk_bytenr, folio, iosize, + submit_extent_folio(bio_ctrl, disk_bytenr, folio, blocksize, pg_offset); - cur = cur + iosize; - pg_offset += iosize; } - return 0; } From patchwork Sun Feb 23 23:46:20 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987259 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E847204C0B for ; Sun, 23 Feb 2025 23:47:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354428; cv=none; b=iIB7nsoJrstrlGHAUREck9IYsqefTatsWTL1AdUTR0WT3j4zRauvSVxBibmWrI1n6rf8zduvqoYfdBMD9R+qWYgZpFMFWeYTbeLAlNsgzGgay3ejbafyJUeVom1m5ysQYuP6nVm2Ik/7AuvDl5Zmg5WbNwjwKQwhL1G8fXOycHE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354428; c=relaxed/simple; bh=KoidlOizMMZUYzGI34XkbVT3Cks9FcQEIsBLqxkJw3I=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ox0Z6HfDgT4m4RsGfYi2mxwOSfj7Nsqrn8hSZqpr0ag0+2bZ79v27tX7efBhl7kKof9wLud6HnxAvtgrbj/dAqHMe9jC5u4Mb0jXqGXgjNrQRVjmbBBYkt3A/3VXDF07F7jOi3sak9lBG7pQCGfSxRhLFeKAZEgnmTYvSCBfnmA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=VXqwoiDe; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=VXqwoiDe; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="VXqwoiDe"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="VXqwoiDe" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id AF5CE1F383 for ; Sun, 23 Feb 2025 23:46:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354407; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xwHM9RWzCGB99lXOJuhVuOqEI3pkRt4kvoH5aqhj1g0=; b=VXqwoiDeZB6HL6CgOd8HpnKIJJuvPIavww00B14BTv+SXUgZS8SJUmnSie35m+O1HVIfuw 2C3kCC1fg3+rPGH9r00NbjopNkM7lc4mHVzktVJ/4bEu734kCgqhtxFi/avHxsQcom9CRZ qeVcpmlL834llV8x/EoTJnzxMeHSD40= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=VXqwoiDe DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354407; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xwHM9RWzCGB99lXOJuhVuOqEI3pkRt4kvoH5aqhj1g0=; b=VXqwoiDeZB6HL6CgOd8HpnKIJJuvPIavww00B14BTv+SXUgZS8SJUmnSie35m+O1HVIfuw 2C3kCC1fg3+rPGH9r00NbjopNkM7lc4mHVzktVJ/4bEu734kCgqhtxFi/avHxsQcom9CRZ qeVcpmlL834llV8x/EoTJnzxMeHSD40= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id E419013A42 for ; Sun, 23 Feb 2025 23:46:46 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id COeDKGazu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:46 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 5/7] btrfs: allow buffered write to avoid full page read if it's block aligned Date: Mon, 24 Feb 2025 10:16:20 +1030 Message-ID: X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: AF5CE1F383 X-Spam-Level: X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_CONTAINS_FROM(1.00)[]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:dkim,suse.com:mid]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; MIME_TRACE(0.00)[0:+]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Rspamd-Action: no action X-Spam-Score: -3.01 X-Spam-Flag: NO [BUG] Since the support of block size (sector size) < page size for btrfs, test case generic/563 fails with 4K block size and 64K page size: --- tests/generic/563.out 2024-04-25 18:13:45.178550333 +0930 +++ /home/adam/xfstests-dev/results//generic/563.out.bad 2024-09-30 09:09:16.155312379 +0930 @@ -3,7 +3,8 @@ read is in range write is in range write -> read/write -read is in range +read has value of 8388608 +read is NOT in range -33792 .. 33792 write is in range ... [CAUSE] The test case creates a 8MiB file, then buffered write into the 8MiB using 4K block size, to overwrite the whole file. On 4K page sized systems, since the write range covers the full block and page, btrfs will no bother reading the page, just like what XFS and EXT4 do. But 64K page sized systems, although the 4K sized write is still block aligned, it's not page aligned any more, thus btrfs will read the full page, causing more read than expected and fail the test case. [FIX] To skip the full page read, we need to do the following modification: - Do not trigger full page read as long as the buffered write is block aligned This is pretty simple by modifying the check inside prepare_uptodate_page(). - Skip already uptodate blocks during full page read Or we can lead to the following data corruption: 0 32K 64K |///////| | Where the file range [0, 32K) is dirtied by buffered write, the remaining range [32K, 64K) is not. When reading the full page, since [0,32K) is only dirtied but not written back, there is no data extent map for it, but a hole covering [0, 64k). If we continue reading the full page range [0, 64K), the dirtied range will be filled with 0 (since there is only a hole covering the whole range). This causes the dirtied range to get lost. Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/extent_io.c | 4 ++++ fs/btrfs/file.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b3a4a94212c9..d7240e295bfc 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -977,6 +977,10 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, end_folio_read(folio, true, cur, end - cur + 1); break; } + if (btrfs_folio_test_uptodate(fs_info, folio, cur, blocksize)) { + end_folio_read(folio, true, cur, blocksize); + continue; + } em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached); if (IS_ERR(em)) { end_folio_read(folio, false, cur, end + 1 - cur); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 00c68b7b2206..e3d63192281d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -804,14 +804,15 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64 { u64 clamp_start = max_t(u64, pos, folio_pos(folio)); u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio)); + const u32 sectorsize = inode_to_fs_info(inode)->sectorsize; int ret = 0; if (folio_test_uptodate(folio)) return 0; if (!force_uptodate && - IS_ALIGNED(clamp_start, PAGE_SIZE) && - IS_ALIGNED(clamp_end, PAGE_SIZE)) + IS_ALIGNED(clamp_start, sectorsize) && + IS_ALIGNED(clamp_end, sectorsize)) return 0; ret = btrfs_read_folio(NULL, folio); From patchwork Sun Feb 23 23:46:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987260 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C92E10E4 for ; Sun, 23 Feb 2025 23:47:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354434; cv=none; b=HIw/0qmSmol6q738+lPgWQOo572YqEZn0cWW4gafK5AeGamnhoV/Mw8OrIAVRLnJi8Pja3H32AsVB19FNUJiPFW9NdDvBES0hoEuOOHfiEuqFc2wY/MZ95Y09Qn9Piy4QRawtfV0ftK2SAIDN+69zFUksamkI/2qEj3gQ1QdrWo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354434; c=relaxed/simple; bh=Guqkvc0ESluSBC6XHg/Vypb72x6G6eY2rRdOHx4MzLo=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=S/uFAkKjL5/mxn/e7cPeKopfyIgy6anNurQwtTbxtWVuKN0X60S6x1jwB9vRvNDOnwHnpTZr+eoFyR9BOnKTsT308MWwPlo8e5xmNJmYoGoH0Ht3fJyekMKGA7Enmw++wJS25kojTFuddF+yPKEm9nOCmwa8fN4RcxnKCfyMmdc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=GTcjMilh; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=GTcjMilh; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="GTcjMilh"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="GTcjMilh" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id F3B201F38E for ; Sun, 23 Feb 2025 23:46:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354409; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qCpJdceAISRtIdzBsAPLFMOeTR3iK36LOdioueDrdZI=; b=GTcjMilhLoZnLsYNZMbLC01TYGlNR/jCTeCY9rk15E4xVcpmvMG42If023APqcfeTw+9sb 3Ve4pDT3UK6Tzl0EoAWxh+eBPC1YIy4AHMpXY5WpaPFRA7QEYIRmnLgHEen4JIr+g8xy91 ZNp+uti+FTdLCqssqjXKRYAFk1SPMew= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=GTcjMilh DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354409; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qCpJdceAISRtIdzBsAPLFMOeTR3iK36LOdioueDrdZI=; b=GTcjMilhLoZnLsYNZMbLC01TYGlNR/jCTeCY9rk15E4xVcpmvMG42If023APqcfeTw+9sb 3Ve4pDT3UK6Tzl0EoAWxh+eBPC1YIy4AHMpXY5WpaPFRA7QEYIRmnLgHEen4JIr+g8xy91 ZNp+uti+FTdLCqssqjXKRYAFk1SPMew= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 332E213A42 for ; Sun, 23 Feb 2025 23:46:47 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id SLvrOGezu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:47 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/7] btrfs: allow inline data extents creation if block size < page size Date: Mon, 24 Feb 2025 10:16:21 +1030 Message-ID: <592f1b35af2c203e4bb6dd6431b6bb0c880e81aa.1740354271.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: F3B201F38E X-Spam-Score: -3.01 X-Rspamd-Action: no action X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_CONTAINS_FROM(1.00)[]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:dkim,suse.com:mid,suse.com:email]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; MIME_TRACE(0.00)[0:+]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd1.dmz-prg2.suse.org X-Spam-Flag: NO X-Spam-Level: Previously inline data extents creation is disable if the block size (previously called sector size) is smaller than the page size, for the following reasons: - Possible mixed inline and regular data extents However this is also the same if the block size matches the page size, thus we do not treat mixed inline and regular extents as an error. And the chance to cause mixed inline and regular data extents are not even increased, it has the same requirement (compressed inline data extent covering the whole first block, followed by regular extents). - Unable to handle async/inline delalloc range for block size < page size cases This is already fixed since commit 1d2fbb7f1f9e ("btrfs: allow compression even if the range is not page aligned"). This was the major technical blockage, but it's no longer a blockage anymore. With the major technical blockage already removed, we can enable inline data extents creation no matter the block size nor the page size, allowing the btrfs to have the same capacity for all block sizes. Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana --- fs/btrfs/inode.c | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e99cb5109967..a1ea93bad80e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -566,19 +566,6 @@ static bool can_cow_file_range_inline(struct btrfs_inode *inode, if (offset != 0) return false; - /* - * Due to the page size limit, for subpage we can only trigger the - * writeback for the dirty sectors of page, that means data writeback - * is doing more writeback than what we want. - * - * This is especially unexpected for some call sites like fallocate, - * where we only increase i_size after everything is done. - * This means we can trigger inline extent even if we didn't want to. - * So here we skip inline extent creation completely. - */ - if (fs_info->sectorsize != PAGE_SIZE) - return false; - /* Inline extents are limited to sectorsize. */ if (size > fs_info->sectorsize) return false; From patchwork Sun Feb 23 23:46:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13987258 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B363020013A for ; Sun, 23 Feb 2025 23:47:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354424; cv=none; b=heJPYiI/RePbw8qbU8/lG6aRkDqYTWthYbuDxp8YnYAEIQ4/67xV/Rj04R/iAbrOtbdgCNTTAUmszL70cCS3OvfHUoylJhWWCrphO8iCrsOpbQQCzfV83wXyylZvLaSj90VM4teBfBjkm5Lg6EOXgqCZOTrSFggWoPpPo0HOOnU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740354424; c=relaxed/simple; bh=U74dwjd6hRkxRHQzqUXvKQ62REXR7Ay8C9J6n17JAog=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JckRYOH6DVTV9/xLWUsTA3UpuyekdODJKIgik4EzyjfiOJ1B8nVWQ2BZtRsXw04WPh14SFeoN+I1hvGVjetaxOnQkA23dQnKzDg6u6l8y+T6NMjtDVKFczYn5TuZmtNtzVNbPNh3JPFUk9VlNAlCqnLzcNEsCvqgzZhckYlEfRA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=tXnMMc7h; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=tXnMMc7h; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="tXnMMc7h"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="tXnMMc7h" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 41F0421174 for ; Sun, 23 Feb 2025 23:46:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354410; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=im/0VwaIl9ZzbAkiZuz2ZCfjVukEp+eKztzRaVYUtJI=; b=tXnMMc7he0Wvycv0aNKQ1LDmUFBA0Q/dOut2kK61j9UL466FyHxuUSo2P04/2nz09L/utP ao3oGTURwrsGtZQBkQp8QYN4aEEjTgPXxAiiq9dMJ8lkpp3LzurW1dYwxbxlFwu8sWHQAB yZ7Mu2OMqpUkXzyQBerJ0P8T59AX+hw= Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1740354410; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=im/0VwaIl9ZzbAkiZuz2ZCfjVukEp+eKztzRaVYUtJI=; b=tXnMMc7he0Wvycv0aNKQ1LDmUFBA0Q/dOut2kK61j9UL466FyHxuUSo2P04/2nz09L/utP ao3oGTURwrsGtZQBkQp8QYN4aEEjTgPXxAiiq9dMJ8lkpp3LzurW1dYwxbxlFwu8sWHQAB yZ7Mu2OMqpUkXzyQBerJ0P8T59AX+hw= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 76EC813A42 for ; Sun, 23 Feb 2025 23:46:49 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id sG/hDWmzu2euTgAAD6G6ig (envelope-from ) for ; Sun, 23 Feb 2025 23:46:49 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 7/7] btrfs: remove the subpage related warning message Date: Mon, 24 Feb 2025 10:16:22 +1030 Message-ID: <7d10ddfc206a73909763f8a9addfef1e10e5fccf.1740354271.git.wqu@suse.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[99.99%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:mid]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Score: -2.80 X-Spam-Flag: NO Since the initial enablement of block size < page size support for btrfs in v5.15, we have hit several milestones for block size < page size (subpage) support: - RAID56 subpage support In v5.19 - Refactored scrub support to support subpage better In v6.4 - Block perfect (previously requires page aligned ranges) compressed write In v6.13 - Various error handling fixes involving subpage In v6.14 Finally the only missing feature is the pretty simple and harmless inlined data extent creation, just done in previous patches. Now btrfs has all of its features ready for both regular and subpage cases, there is no reason to output a warning about the experimental subpage support, and we can finally remove it now. Signed-off-by: Qu Wenruo Acked-by: Filipe Manana --- fs/btrfs/disk-io.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a799216aa264..c0b40dedceb5 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3414,11 +3414,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device */ fs_info->max_inline = min_t(u64, fs_info->max_inline, fs_info->sectorsize); - if (sectorsize < PAGE_SIZE) - btrfs_warn(fs_info, - "read-write for sector size %u with page size %lu is experimental", - sectorsize, PAGE_SIZE); - ret = btrfs_init_workqueues(fs_info); if (ret) goto fail_sb_buffer;