From patchwork Sun Dec 8 02:50:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898468 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 81D681B95B; Sun, 8 Dec 2024 02:51:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626290; cv=none; b=RrxjRW2QALIEBzmU/ztPTbyulS0G5cxyjp/W96TM4cX3mgHwSrn9tY6Wy3VEQoSSb3JJp0DkIBgfZlXQC+o/9N1ZR1Pw6ho+CgwYdnF02ZZbm1EBPFb0NvBK0xS/mak4N0gFxlOROeGrsNc2nbeIwtgxNWG4jAfA5SiOmkKQKDw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626290; c=relaxed/simple; bh=AwN3Aaa5CQt1KQnJX+FjCyxAkHDp+VZgPstT6Uw6g6w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ol6n6pEADf3UtzIyMX8Bd5jWqfy2dLVuNOc6Ps511GaDB5ukFWwgn5LQHN7yM9sLtJd83j+fVoiuqIXbYTwnZRsa6e9J2PnFOpV5vO81aZgRb0V5TxOzrnii5mX0u5/5Vb1Djuz6w+/79d2C9yFFY85WOMuMvXQv0y+EgGE6XJ8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=QxH4piZW; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=QxH4piZW; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="QxH4piZW"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="QxH4piZW" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 9BB161F381; Sun, 8 Dec 2024 02:51:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626285; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cS2HyLCi4s8/hAjytAZT+knu4pqOoOKuA3DhFnmQ5tw=; b=QxH4piZWt34hB0npEkC6pciNM+JnXU7osa71etbkNqf885zSe6GXzBN0Z/Xky+NOG/h6q7 VQU4EAbJzxutwNz5AB+zw7D5C5w2xp/+UmGHDlIpPlTogdURFDoVaStrlj0xRqtHaeUWa7 HVAU3F2DcbZ42qzhxj7LIC3u52Ah8mg= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=QxH4piZW DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626285; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cS2HyLCi4s8/hAjytAZT+knu4pqOoOKuA3DhFnmQ5tw=; b=QxH4piZWt34hB0npEkC6pciNM+JnXU7osa71etbkNqf885zSe6GXzBN0Z/Xky+NOG/h6q7 VQU4EAbJzxutwNz5AB+zw7D5C5w2xp/+UmGHDlIpPlTogdURFDoVaStrlj0xRqtHaeUWa7 HVAU3F2DcbZ42qzhxj7LIC3u52Ah8mg= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 9CCD5133D1; Sun, 8 Dec 2024 02:51:24 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id CAvkF6wJVWcXcQAAD6G6ig (envelope-from ); Sun, 08 Dec 2024 02:51:24 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH 1/8] btrfs: fix double accounting race when btrfs_run_delalloc_range() failed Date: Sun, 8 Dec 2024 13:20:58 +1030 Message-ID: <5dbaa2ecb29e4f2d6deb8a502d178706d09d48ee.1733624454.git.wqu@suse.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: 9BB161F381 X-Spam-Score: -3.01 X-Rspamd-Action: no action X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; ARC_NA(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:dkim,suse.com:mid,suse.com:email]; ASN(0.00)[asn:25478, ipnet:::/0, country:RU]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_ALL(0.00)[]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd1.dmz-prg2.suse.org X-Spam-Flag: NO X-Spam-Level: [BUG] There are several double accounting case, where the WARN_ON_ONCE() is triggered inside can_finish_ordered_extent(). And all such cases points back to the btrfs_mark_ordered_io_finished() call inside extent_writepage() when it hits some error. [CAUSE] With extra debug patches to show where the error is from, it turns out to be btrfs_run_delalloc_range() can fail with -ENOSPC. Such failure itself is already a symptom of some bad data/metadata space reservation, but here we need to focus on the error handling part. For example, we have the following dirty page layout (4K sector size and 4K page size): 0 16K 32K |/////|/////|/////|/////|/////|/////|/////|/////| Where the range [0, 32K) is dirty and we need to write all the 8 pages back. When handling the first page 0, we go the following sequence: - btrfs_run_delalloc_range() for range [0, 32k) We enter cow_file_range() for [0, 32K) - btrfs_reserve_extent() only returned a 16K data extent. This can be caused by fragmentation, and it's already an indication we're almost running of space. Now we have the following layout: 0 16K 32K |<----- Reserved ------>|/////|/////|/////|/////| The range [0, 16K) has ordered extent allocated. - btrfs_reserve_extent() returned -ENOSPC We really run out of space. But since we have reserved space for range [0, 16K) we need to clean them up. But that cleanup for ordered extent only happens inside btrfs_run_delalloc_range(). - btrfs_run_delalloc_range() cleanup the reserved ordered extent By calling btrfs_mark_ordered_io_finished() for range [0, 32K). It will locate the ordered extent [0, 16K) and mark it as IOERR. Also since the ordered extent is only 16K, we're finishing the whole ordered extent. Thus we call btrfs_queue_ordered_fn() to queue to finish the ordered extent. But still, the ordered extent [0, 16K) is still in the btrfs_inode::ordered_tree. - extent_writepage() cleanup the ordered extent inside the folio We call btrfs_mark_ordered_io_finished() for range [0, 4K). Since the finished ordered extent [0, 16K) is not yet removed (racy, depends on when btrfs_finish_one_ordered() is called), if btrfs_mark_ordered_io_finished() is called before btrfs_finish_one_ordered(), we will double account and trigger the warning inside can_finish_ordered_extent(). So the root cause is, we're relying on btrfs_mark_ordered_io_finished() to handle ranges which is already cleaned up. Unfortunately the bug dates back to the early days when btrfs_mark_ordered_io_finished() is introduced as a no-brain choice for error paths, but such no-brain solution just hides all the race and make us less cautious when handling errors. [FIX] Instead of relying on the btrfs_mark_ordered_io_finished() call to cleanup the whole folio range, record the last successfully ran delalloc range. And combined with bio_ctrl->submit_bitmap to properly clean up any newly created ordered extents. Since we have cleaned up the ordered extents in range, we should not rely on the btrfs_mark_ordered_io_finished() inside extent_writepage() anymore. By this, we ensure btrfs_mark_ordered_io_finished() is only called once when writepage_delalloc() failed. Cc: stable@vger.kernel.org # 5.15+ Fixes: e65f152e4348 ("btrfs: refactor how we finish ordered extent io for endio functions") Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 37 ++++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9725ff7f274d..417c710c55ca 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1167,6 +1167,12 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, * last delalloc end. */ u64 last_delalloc_end = 0; + /* + * Save the last successfully ran delalloc range end (exclusive). + * This is for error handling to avoid ranges with ordered extent created + * but no IO will be submitted due to error. + */ + u64 last_finished = page_start; u64 delalloc_start = page_start; u64 delalloc_end = page_end; u64 delalloc_to_write = 0; @@ -1235,11 +1241,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, found_len = last_delalloc_end + 1 - found_start; if (ret >= 0) { + /* + * Some delalloc range may be created by previous folios. + * Thus we still need to clean those range up during error + * handling. + */ + last_finished = found_start; /* No errors hit so far, run the current delalloc range. */ ret = btrfs_run_delalloc_range(inode, folio, found_start, found_start + found_len - 1, wbc); + if (ret >= 0) + last_finished = found_start + found_len; } else { /* * We've hit an error during previous delalloc range, @@ -1274,8 +1288,21 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, delalloc_start = found_start + found_len; } - if (ret < 0) + /* + * It's possible we have some ordered extents created before we hit + * an error, cleanup non-async successfully created delalloc ranges. + */ + if (unlikely(ret < 0)) { + unsigned int bitmap_size = min( + (last_finished - page_start) >> fs_info->sectorsize_bits, + fs_info->sectors_per_page); + + for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size) + btrfs_mark_ordered_io_finished(inode, folio, + page_start + (bit << fs_info->sectorsize_bits), + fs_info->sectorsize, false); return ret; + } out: if (last_delalloc_end) delalloc_end = last_delalloc_end; @@ -1509,13 +1536,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl bio_ctrl->wbc->nr_to_write--; -done: - if (ret) { + if (ret) btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, page_start, PAGE_SIZE, !ret); - mapping_set_error(folio->mapping, ret); - } +done: + if (ret < 0) + mapping_set_error(folio->mapping, ret); /* * Only unlock ranges that are submitted. As there can be some async * submitted ranges inside the folio. From patchwork Sun Dec 8 02:50:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898469 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D0A841BC4E; Sun, 8 Dec 2024 02:51:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626291; cv=none; b=Lxtpgrcxp3TG4gBsyxbH9ahnhOOIZwhA1iV+NQVCUY8I+jpYy7OOngREZmhQzJJ5qRtIKs0aIvouljPIEVUY7xOtCoksD4du4OtdxSy60gzgBo4yg340SzP5wpY7UvAm0cyTpuWUZyL4ZjO3x5fTa4gvKeLZ1IFusQGqjgo8X0k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626291; c=relaxed/simple; bh=eFpsJuF5DzMIhXWQ45l67nGBynuSSkYJHyrnWdn/OUU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cJeXVUsZZeAiAyE9ieFHiMtn3bsXrxNQAepD94TLBH7jGlFxfaD+j7kLFnLCdxBl8Z325QxWBpE4eLxCnCLCKIuxy95RvYQQGjanT7vf55HMfb3KmHlKH1uIPvqkotgfwYAcvGC33BCfFwHruLbS5jHMh/4KEKvb8i6g2U33pOc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=GlUoOF27; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=GlUoOF27; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="GlUoOF27"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="GlUoOF27" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 174BD1F45E; Sun, 8 Dec 2024 02:51:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626287; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1j98ulzATZZAAJ1RR7HoMEKXXEyZMqJiramueCvFvPw=; b=GlUoOF27pg1nwtn3pbqkNID2ClThAR57bwmG5+CcYMptnvFXy/rvByRUqPKUi7zJKW0E+Q 6p6FnDlcqGOTlTWsCxXyywrqKkOIvRwXMc4qm2SJCifDcLs99DtW9EX8ouJuQwam+Ru6QF 8jt9sVJRPJ3h+XnXtsg4q8TMmAR1zj8= Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626287; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1j98ulzATZZAAJ1RR7HoMEKXXEyZMqJiramueCvFvPw=; b=GlUoOF27pg1nwtn3pbqkNID2ClThAR57bwmG5+CcYMptnvFXy/rvByRUqPKUi7zJKW0E+Q 6p6FnDlcqGOTlTWsCxXyywrqKkOIvRwXMc4qm2SJCifDcLs99DtW9EX8ouJuQwam+Ru6QF 8jt9sVJRPJ3h+XnXtsg4q8TMmAR1zj8= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 18733133D1; Sun, 8 Dec 2024 02:51:25 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id gEAbM60JVWcXcQAAD6G6ig (envelope-from ); Sun, 08 Dec 2024 02:51:25 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH 2/8] btrfs: fix double accounting race when extent_writepage_io() failed Date: Sun, 8 Dec 2024 13:20:59 +1030 Message-ID: <93eef727e17b97dd4aabd417cedfda03cac7939d.1733624454.git.wqu@suse.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_TWO(0.00)[2]; FUZZY_BLOCKED(0.00)[rspamd.com]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:mid]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; RCVD_TLS_ALL(0.00)[] X-Spam-Score: -2.80 X-Spam-Flag: NO [BUG] If submit_one_sector() failed inside extent_writepage_io() for sector size < page size cases (e.g. 4K sector size and 64K page size), then we can hit double ordered extent accounting error. This should be very rare, as submit_one_sector() only fails when we failed to grab the extent map, and such extent map should exist inside the memory and have been pinned. [CAUSE] For example we have the following folio layout: 0 4K 32K 48K 60K 64K |//| |//////| |///| Where |///| is the dirty range we need to writeback. The 3 different dirty ranges are submitted for regular COW. Now we hit the following sequence: - submit_one_sector() returned 0 for [0, 4K) - submit_one_sector() returned 0 for [32K, 48K) - submit_one_sector() returned error for [60K, 64K) - btrfs_mark_ordered_io_finished() called for the whole folio This will mark the following ranges as finished: * [0, 4K) * [32K, 48K) Both ranges have their IO already submitted, this cleanup will lead to double accounting. * [60K, 64K) That's the correct cleanup. The only good news is, this error is only theoretical, as the target extent map is always pinned, thus we should directly grab it from memory, other than reading it from the disk. [FIX] Instead of calling btrfs_mark_ordered_io_finished() for the whole folio range, which can touch ranges we should not touch, instead move the error handling inside extent_writepage_io(). So that we can cleanup exact sectors that are ought to be submitted but failed. This provide much more accurate cleanup, avoiding the double accounting. Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 417c710c55ca..b6a4f1765b4c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1418,6 +1418,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, struct btrfs_fs_info *fs_info = inode->root->fs_info; unsigned long range_bitmap = 0; bool submitted_io = false; + bool error = false; const u64 folio_start = folio_pos(folio); u64 cur; int bit; @@ -1460,11 +1461,21 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, break; } ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size); - if (ret < 0) - goto out; + if (unlikely(ret < 0)) { + submit_one_bio(bio_ctrl); + /* + * Failed to grab the extent map which should be very rare. + * Since there is no bio submitted to finish the ordered + * extent, we have to manually finish this sector. + */ + btrfs_mark_ordered_io_finished(inode, folio, cur, + fs_info->sectorsize, false); + error = true; + continue; + } submitted_io = true; } -out: + /* * If we didn't submitted any sector (>= i_size), folio dirty get * cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared @@ -1472,8 +1483,11 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, * * Here we set writeback and clear for the range. If the full folio * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag. + * + * If we hit any error, the corresponding sector will still be dirty + * thus no need to clear PAGECACHE_TAG_DIRTY. */ - if (!submitted_io) { + if (!submitted_io && !error) { btrfs_folio_set_writeback(fs_info, folio, start, len); btrfs_folio_clear_writeback(fs_info, folio, start, len); } @@ -1493,7 +1507,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl { struct inode *inode = folio->mapping->host; struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); - const u64 page_start = folio_pos(folio); int ret; size_t pg_offset; loff_t i_size = i_size_read(inode); @@ -1536,10 +1549,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl bio_ctrl->wbc->nr_to_write--; - if (ret) - btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, - page_start, PAGE_SIZE, !ret); - done: if (ret < 0) mapping_set_error(folio->mapping, ret); @@ -2319,11 +2328,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f if (ret == 1) goto next_page; - if (ret) { - btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, - cur, cur_len, !ret); + if (ret) mapping_set_error(mapping, ret); - } btrfs_folio_end_lock(fs_info, folio, cur, cur_len); if (ret < 0) found_error = true; From patchwork Sun Dec 8 02:51:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898470 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1ED68224D6 for ; Sun, 8 Dec 2024 02:51:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626292; cv=none; b=EzIkH7ZiVPJQjNonYgpL9nMb/bjHkiMLJsAhISspTZllLnTkJKEj5Dc36QBUJU6mMAn40ycXveGj2pPBFSeX9C9RztCEZJ7u+FwZKngHINu534uh2wkKmLOhmsHdWh2Smh6U3XidglwVQ161T4NBd0zX3hoC3UrJ+aWvcE7GPa4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626292; c=relaxed/simple; bh=c09gAvMdZQCT4/yGlwX9IyMt0IjLORAW/hOWA9tOvmE=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PocFy4kks8iYQKZtdp/H107+kJL4bVUQo0LYg/B0jSom6dJfDIV5E95biFTCo1DDJp4VtfAryU7EmOnnjaYpzBQwKAFgi8AlIgtJoHEtqV6gED3OLU8ptgmkp+Gn8grHCxq9hO8WOSYSa1AQhv+/AbnILGNdKLLRQXQfvn5WeSU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=bfvFmA+4; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=bfvFmA+4; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="bfvFmA+4"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="bfvFmA+4" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 4C0291F45F for ; Sun, 8 Dec 2024 02:51:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626288; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D+wk4KA2ZXySUcG1nFRFY9xIuzbiQNy9uEFymOts2qs=; b=bfvFmA+4G1C1hQT9KA3S1+V9fcn64i81ksHQ9txwcdokRT9/P5lWnuvyTFKFK6zJO9kiw+ wxuIdPU04M6T2FNOjiaP442evwayf/VQLuvEVEpRkskc/t/QAaPanpZvX57+vnsWmpY0Ux B+sRlGZv86o9t4R5tPZXRtBhrq/U/20= Authentication-Results: smtp-out2.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=bfvFmA+4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626288; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D+wk4KA2ZXySUcG1nFRFY9xIuzbiQNy9uEFymOts2qs=; b=bfvFmA+4G1C1hQT9KA3S1+V9fcn64i81ksHQ9txwcdokRT9/P5lWnuvyTFKFK6zJO9kiw+ wxuIdPU04M6T2FNOjiaP442evwayf/VQLuvEVEpRkskc/t/QAaPanpZvX57+vnsWmpY0Ux B+sRlGZv86o9t4R5tPZXRtBhrq/U/20= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 893E5133D1 for ; Sun, 8 Dec 2024 02:51:27 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id sHgAE68JVWcXcQAAD6G6ig (envelope-from ) for ; Sun, 08 Dec 2024 02:51:27 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/8] btrfs: fix the error handling of submit_uncompressed_range() Date: Sun, 8 Dec 2024 13:21:00 +1030 Message-ID: X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: 4C0291F45F X-Spam-Score: -3.01 X-Rspamd-Action: no action X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_CONTAINS_FROM(1.00)[]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ASN(0.00)[asn:25478, ipnet:::/0, country:RU]; MIME_TRACE(0.00)[0:+]; FROM_HAS_DN(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:dkim,suse.com:mid,suse.com:email]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; RCPT_COUNT_ONE(0.00)[1]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd1.dmz-prg2.suse.org X-Spam-Flag: NO X-Spam-Level: [BUG] If btrfs failed to compress the range, or can not reserve a large enough data extent (e.g. too fragmented free space), btrfs will fall back to submit_uncompressed_range(). But inside submit_uncompressed_range(), run_dealloc_cow() can also fail due to -ENOSPC or whatever other errors. In that case there are 3 bugs in the error handling: 1) Double freeing for the same ordered extent Which can lead to crash due to ordered extent double accounting 2) Start/end writeback without updating the subpage writeback bitmap 3) Unlock the folio without clear the subpage lock bitmap Both bug 2) and 3) will crash the kernel if the btrfs block size is smaller than folio size, as the next time the folio get writeback/lock updates, subpage will find the bitmap already have the range set, triggering an ASSERT(). [CAUSE] Bug 1) happens in the following call chain: submit_uncompressed_range() |- run_dealloc_cow() | |- cow_file_range() | |- btrfs_reserve_extent() | Failed with -ENOSPC or whatever error | |- btrfs_clean_up_ordered_extents() | |- btrfs_mark_ordered_io_finished() | Which cleans all the ordered extents in the async_extent range. | |- btrfs_mark_ordered_io_finished() Which cleans the folio range. The finished ordered extents may not be immediately removed from the ordered io tree, as they are removed inside a work queue. So the second btrfs_mark_ordered_io_finished() may find the finished but not-yet-removed ordered extents, and double free them. Furthermore, the second btrfs_mark_ordered_io_finished() is not subpage compatible, as it uses fixed folio_pos() with PAGE_SIZE, which can cover other ordered extents. Bug 2) and 3) are more straight forward, btrfs just calls folio_unlock(), folio_start_writeback() and folio_end_writeback(), other than the helpers which handle subpage cases. [FIX] For bug 1) since the first btrfs_cleanup_ordered_extents() call is handling the whole range, we should not do the second btrfs_mark_ordered_io_finished() call. And for the first btrfs_cleanup_ordered_extents(), we no longer need to pass the @locked_page parameter, as we are already in the async extent context, thus will never rely on the error handling inside btrfs_run_delalloc_range(). So just let the btrfs_clean_up_ordered_extents() to handle every folio equally. For bug 2) we should not even call folio_start_writeback()/folio_end_writeback() anymore. As the error handling protocol, cow_file_range() should clear dirty flag and start/finish the writeback for the whole range passed in. For bug 3) just change the folio_unlock() to btrfs_folio_end_lock() helper. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c4997200dbb2..aee172abad1c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1129,19 +1129,11 @@ static void submit_uncompressed_range(struct btrfs_inode *inode, &wbc, false); wbc_detach_inode(&wbc); if (ret < 0) { - btrfs_cleanup_ordered_extents(inode, locked_folio, + btrfs_cleanup_ordered_extents(inode, NULL, start, end - start + 1); - if (locked_folio) { - const u64 page_start = folio_pos(locked_folio); - - folio_start_writeback(locked_folio); - folio_end_writeback(locked_folio); - btrfs_mark_ordered_io_finished(inode, locked_folio, - page_start, PAGE_SIZE, - !ret); - mapping_set_error(locked_folio->mapping, ret); - folio_unlock(locked_folio); - } + if (locked_folio) + btrfs_folio_end_lock(inode->root->fs_info, locked_folio, + start, async_extent->ram_size); } } From patchwork Sun Dec 8 02:51:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898471 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B310038DF9; Sun, 8 Dec 2024 02:51:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626293; cv=none; b=KT0ZfzyXl4a8K/TeALvABA5BsQMC5h+cIZ8YP9dXVamhroXkDFdxKwOhA3HhtIEsqafQMVMYfpgCYA+P2fHK4Txg+gGT3QzG9qCbek2YoQObmCuz1AvKFvTnO5RVtHsdFwaV9KrBjclHpo6ikzGnRE7nA9coKKZpw8rWtlh79rQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626293; c=relaxed/simple; bh=bvj9p12tzGwgfftg+kXNMdvccwGxoA1HEZYtyYOdt2c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gGvkpn7xgCigR55EEzHWQZMApZinupqws+QNe6YxZVxWo1TTmChcd61+I4FOLZSNZRbgrIbIjfwxyyOCC3GdWPyCvFIGm2xvrPqtr8Jq7OfMcwCGTGULZ8Uxn5mYM3WYBRPNlJ+ieyNMiqylzjJNAhAYSF1zlN5FyMXAp/pbUBQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=f60tmgIn; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=f60tmgIn; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="f60tmgIn"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="f60tmgIn" Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id BA9F121178; Sun, 8 Dec 2024 02:51:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626289; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XkhDhUzp/a0GfucP9USm5OIstNqmYmkLwap19rwN3hs=; b=f60tmgInJDG/MecVpgGXDNkEpVESWXmjWRaHQPn4jGJmKsgBcwAYCQ1vrwWawYgY9CtP6y oC0oIAom2wgXhtWyptcO5nJWR/Z4BdZm48vv6LPcKtD2smLiQg7PLjgpKt4KfNfiuktJxx sHLE3r00g4eviAZdBThpIw+XALzLpSw= Authentication-Results: smtp-out1.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=f60tmgIn DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626289; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XkhDhUzp/a0GfucP9USm5OIstNqmYmkLwap19rwN3hs=; b=f60tmgInJDG/MecVpgGXDNkEpVESWXmjWRaHQPn4jGJmKsgBcwAYCQ1vrwWawYgY9CtP6y oC0oIAom2wgXhtWyptcO5nJWR/Z4BdZm48vv6LPcKtD2smLiQg7PLjgpKt4KfNfiuktJxx sHLE3r00g4eviAZdBThpIw+XALzLpSw= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id BE12F133D1; Sun, 8 Dec 2024 02:51:28 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 8NX7H7AJVWcXcQAAD6G6ig (envelope-from ); Sun, 08 Dec 2024 02:51:28 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH 4/8] btrfs: do proper folio cleanup when cow_file_range() failed Date: Sun, 8 Dec 2024 13:21:01 +1030 Message-ID: X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Rspamd-Queue-Id: BA9F121178 X-Spam-Level: X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; ARC_NA(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:dkim,suse.com:mid]; ASN(0.00)[asn:25478, ipnet:::/0, country:RU]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_ALL(0.00)[]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Rspamd-Action: no action X-Spam-Score: -3.01 X-Spam-Flag: NO [BUG] When testing with COW fixup marked as BUG_ON() (this is involved with the new pin_user_pages*() change, which should not result new out-of-band dirty pages), I hit a crash triggered by the BUG_ON() from hitting COW fixup path. This BUG_ON() happens just after a failed btrfs_run_delalloc_range(): BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1444! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86 Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : extent_writepage_io+0x2d4/0x308 [btrfs] lr : extent_writepage_io+0x2d4/0x308 [btrfs] Call trace: extent_writepage_io+0x2d4/0x308 [btrfs] extent_writepage+0x218/0x330 [btrfs] extent_write_cache_pages+0x1d4/0x4b0 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x180/0x3b0 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] That failure is mostly from cow_file_range(), where we can hit -ENOSPC. Although the -ENOSPC is already a bug related to our space reservation code, let's just focus on the error handling. For example, we have the following dirty range [0, 64K) of an inode, with 4K sector size and 4K page size: 0 16K 32K 48K 64K |///////////////////////////////////////| |#######################################| Where |///| means page are still dirty, and |###| means the extent io tree has EXTENT_DELALLOC flag. - Enter extent_writepage() for page 0 - Enter btrfs_run_delalloc_range() for range [0, 64K) - Enter cow_file_range() for range [0, 64K) - Function btrfs_reserve_extent() only reserved one 16K extent So we created extent map and ordered extent for range [0, 16K) 0 16K 32K 48K 64K |////////|//////////////////////////////| |<- OE ->|##############################| And range [0, 16K) has its delalloc flag cleared. But since we haven't yet submit any bio, involved 4 pages are still dirty. - Function btrfs_reserve_extent() return with -ENOSPC Now we have to run error cleanup, which will clear all EXTENT_DELALLOC* flags and clear the dirty flags for the remaining ranges: 0 16K 32K 48K 64K |////////| | | | | Note that range [0, 16K) still has their pages dirty. - Some time later, writeback are triggered again for the range [0, 16K) since the page range still have dirty flags. - btrfs_run_delalloc_range() will do nothing because there is no EXTENT_DELALLOC flag. - extent_writepage_io() find page 0 has no ordered flag Which falls into the COW fixup path, triggering the BUG_ON(). Unfortunately this error handling bug dates back to the introduction of btrfs. Thankfully with the abuse of cow fixup, at least it won't crash the kernel. [FIX] Instead of immediately unlock the extent and folios, we keep the extent and folios locked until either erroring out or the whole delalloc range finished. When the whole delalloc range finished without error, we just unlock the whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked cases), with EXTENT_DELALLOC and EXTENT_LOCKED cleared. And those involved folios will be properly submitted, with their dirty flags cleared during submission. For the error path, it will be a little more complex: - The range with ordered extent allocated (range (1)) We only clear the EXTENT_DELALLOC and EXTENT_LOCKED, as the remaining flags are cleaned up by btrfs_mark_ordered_io_finished()->btrfs_finish_one_ordered(). For folios we finish the IO (clear dirty, start writeback and immediately finish the writeback) and unlock the folios. - The range with reserved extent but no ordered extent (range(2)) - The range we never touched (range(3)) For both range (2) and range(3) the behavior is not changed. Now even if cow_file_range() failed halfway with some successfully reserved extents/ordered extents, we will keep all folios clean, so there will be no future writeback triggered on them. Cc: stable@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 65 ++++++++++++++++++++++++------------------------ 1 file changed, 32 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index aee172abad1c..57e9b7deee88 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1364,6 +1364,17 @@ static noinline int cow_file_range(struct btrfs_inode *inode, alloc_hint = btrfs_get_extent_allocation_hint(inode, start, num_bytes); + /* + * We're not doing compressed IO, don't unlock the first page + * (which the caller expects to stay locked), don't clear any + * dirty bits and don't set any writeback bits + * + * Do set the Ordered (Private2) bit so we know this page was + * properly setup for writepage. + */ + page_ops = (keep_locked ? 0 : PAGE_UNLOCK); + page_ops |= PAGE_SET_ORDERED; + /* * Relocation relies on the relocated extents to have exactly the same * size as the original extents. Normally writeback for relocation data @@ -1423,6 +1434,10 @@ static noinline int cow_file_range(struct btrfs_inode *inode, file_extent.offset = 0; file_extent.compression = BTRFS_COMPRESS_NONE; + /* + * Locked range will be released either during error clean up or + * after the whole range is finished. + */ lock_extent(&inode->io_tree, start, start + cur_alloc_size - 1, &cached); @@ -1468,21 +1483,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode, btrfs_dec_block_group_reservations(fs_info, ins.objectid); - /* - * We're not doing compressed IO, don't unlock the first page - * (which the caller expects to stay locked), don't clear any - * dirty bits and don't set any writeback bits - * - * Do set the Ordered flag so we know this page was - * properly setup for writepage. - */ - page_ops = (keep_locked ? 0 : PAGE_UNLOCK); - page_ops |= PAGE_SET_ORDERED; - - extent_clear_unlock_delalloc(inode, start, start + cur_alloc_size - 1, - locked_folio, &cached, - EXTENT_LOCKED | EXTENT_DELALLOC, - page_ops); if (num_bytes < cur_alloc_size) num_bytes = 0; else @@ -1499,6 +1499,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode, if (ret) goto out_unlock; } + extent_clear_unlock_delalloc(inode, orig_start, end, locked_folio, &cached, + EXTENT_LOCKED | EXTENT_DELALLOC, + page_ops); done: if (done_offset) *done_offset = end; @@ -1519,35 +1522,31 @@ static noinline int cow_file_range(struct btrfs_inode *inode, * We process each region below. */ - clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | - EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV; - page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK; - /* * For the range (1). We have already instantiated the ordered extents * for this region. They are cleaned up by * btrfs_cleanup_ordered_extents() in e.g, - * btrfs_run_delalloc_range(). EXTENT_LOCKED | EXTENT_DELALLOC are - * already cleared in the above loop. And, EXTENT_DELALLOC_NEW | - * EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV are handled by the cleanup - * function. + * btrfs_run_delalloc_range(). + * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV + * are also handled by the cleanup function. * - * However, in case of @keep_locked, we still need to unlock the pages - * (except @locked_folio) to ensure all the pages are unlocked. + * So here we only clear EXTENT_LOCKED and EXTENT_DELALLOC flag, + * and finish the writeback of the involved folios, which will be + * never submitted. */ - if (keep_locked && orig_start < start) { + if (orig_start < start) { + clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC; + page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK; + if (!locked_folio) mapping_set_error(inode->vfs_inode.i_mapping, ret); extent_clear_unlock_delalloc(inode, orig_start, start - 1, - locked_folio, NULL, 0, page_ops); + locked_folio, NULL, clear_bits, page_ops); } - /* - * At this point we're unlocked, we want to make sure we're only - * clearing these flags under the extent lock, so lock the rest of the - * range and clear everything up. - */ - lock_extent(&inode->io_tree, start, end, NULL); + clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | + EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV; + page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK; /* * For the range (2). If we reserved an extent for our delalloc range From patchwork Sun Dec 8 02:51:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898472 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BB25B482DD; Sun, 8 Dec 2024 02:51:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626294; cv=none; b=dxAF7c2Eo+Ag46dE3jyC/as4MpCREOxcAFsJNmZ9mIfUQ6zGzzfrSWflXoflnL4xlO1LV1OkG0WVcQSuuO8k5wROHVgSt39N0L2eAHtYVDFeBMm6y7Z6m5fkAgMI0PUUNxOgkqgKzlaIN3P1hPdiHFv9318LUipnCPhYJqIfpfI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626294; c=relaxed/simple; bh=vN4SN9ToN/257ZV0lJIiYnOc5JbUoeplb3avFKLku4A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sqsBE22K2RNAHEOQDf+Qh0vJZ9QH8rPYUJHBF5ZDAhNtM9B9wdHJxDdHIFbS0E1WangqFuXVyFjKLLGn868LNIgPnohmzgQkmF70+5cBFPJgIOU8ApbpfvnwB/UBhN/kO9Ty/fXh8jOxSt0DY1E+jMC3BvzvJmYDbsTR+a3mv8Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=P67dkN5d; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=P67dkN5d; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="P67dkN5d"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="P67dkN5d" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 353EA1F37E; Sun, 8 Dec 2024 02:51:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626291; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oAqJGIN2lcQtOoQWErAl/GqukOyzMqjea0klTX83sbs=; b=P67dkN5dS4o1YgkNHB1oitkA6dK+eodb8CXrB0/9O2uTi3sh3hkcFYJ9nbUxoqMeDnPgP5 /StZHE7rvxjzNVCdG2KFAY08mikeB+mDKjzWZBG2akprGgL/0TaMC7RnWKo4d9Ms9DAcYA ca2idt9viw4Y3N6YT6ux9o3SkO05Gi8= Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626291; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oAqJGIN2lcQtOoQWErAl/GqukOyzMqjea0klTX83sbs=; b=P67dkN5dS4o1YgkNHB1oitkA6dK+eodb8CXrB0/9O2uTi3sh3hkcFYJ9nbUxoqMeDnPgP5 /StZHE7rvxjzNVCdG2KFAY08mikeB+mDKjzWZBG2akprGgL/0TaMC7RnWKo4d9Ms9DAcYA ca2idt9viw4Y3N6YT6ux9o3SkO05Gi8= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 3922A133D1; Sun, 8 Dec 2024 02:51:29 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id aKYuO7EJVWcXcQAAD6G6ig (envelope-from ); Sun, 08 Dec 2024 02:51:29 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH 5/8] btrfs: do proper folio cleanup when run_delalloc_nocow() failed Date: Sun, 8 Dec 2024 13:21:02 +1030 Message-ID: X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Score: -2.80 X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_TWO(0.00)[2]; FUZZY_BLOCKED(0.00)[rspamd.com]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:mid,suse.com:email]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; RCVD_TLS_ALL(0.00)[] X-Spam-Flag: NO X-Spam-Level: [BUG] With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO(): BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774" flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff) page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2992! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : folio_clear_dirty_for_io+0x128/0x258 lr : folio_clear_dirty_for_io+0x128/0x258 Call trace: folio_clear_dirty_for_io+0x128/0x258 btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs] __process_folios_contig+0x154/0x268 [btrfs] extent_clear_unlock_delalloc+0x5c/0x80 [btrfs] run_delalloc_nocow+0x5f8/0x760 [btrfs] btrfs_run_delalloc_range+0xa8/0x220 [btrfs] writepage_delalloc+0x230/0x4c8 [btrfs] extent_writepage+0xb8/0x358 [btrfs] extent_write_cache_pages+0x21c/0x4e8 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x178/0x3a8 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow(). E.g. we have the following dirtied range (4K blocksize 4K page size): 0 16K 32K |//////////////////////////////////////| | Pre-allocated | And the range [0, 16K) has a preallocated extent. - Enter run_delalloc_nocow() for range [0, 16K) Which found range [0, 16K) is preallocated, can do the proper NOCOW write. - Enter fallback_to_fow() for range [16K, 32K) Since the range [16K, 32K) is not backed by preallocated extent, we have to go COW. - cow_file_range() failed for range [16K, 32K) So cow_file_range() will do the clean up by clearing folio dirty, unlock the folios. Now the folios in range [16K, 32K) is unlocked. - Enter extent_clear_unlock_delalloc() from run_delalloc_nocow() Which is called with PAGE_START_WRITEBACK to start page writeback. But folios can only be marked writeback when it's properly locked, thus this triggered the VM_BUG_ON_FOLIO(). Furthermore there is another hidden but common bug that run_delalloc_nocow() is not clearing the folio dirty flags in its error handling path. This is the common bug shared between run_delalloc_nocow() and cow_file_range(). [FIX] - Clear folio dirty for range [@start, @cur_offset) Introduce a helper, cleanup_dirty_folios(), which will find and lock the folio in the range, clear the dirty flag and start/end the writeback, with the extra handling for the @locked_folio. - Introduce a helper to record the last failed COW range end This is to trace which range we should skip, to avoid double unlocking. - Skip the failed COW range for the error handling Cc: stable@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 93 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 86 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 57e9b7deee88..75b7956a7b4c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1961,6 +1961,48 @@ static int can_nocow_file_extent(struct btrfs_path *path, return ret < 0 ? ret : can_nocow; } +static void cleanup_dirty_folios(struct btrfs_inode *inode, + struct folio *locked_folio, + u64 start, u64 end, int error) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct address_space *mapping = inode->vfs_inode.i_mapping; + pgoff_t start_index = start >> PAGE_SHIFT; + pgoff_t end_index = end >> PAGE_SHIFT; + u32 len; + + ASSERT(end + 1 - start < U32_MAX); + ASSERT(IS_ALIGNED(start, fs_info->sectorsize) && + IS_ALIGNED(end + 1, fs_info->sectorsize)); + len = end + 1 - start; + + /* + * Handle the locked folio first. + * btrfs_folio_clamp_*() helpers can handle range out of the folio case. + */ + btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len); + btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len); + btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len); + + for (pgoff_t index = start_index; index <= end_index; index++) { + struct folio *folio; + + /* Already handled at the beginning. */ + if (index == locked_folio->index) + continue; + folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS); + /* Cache already dropped, no need to do any cleanup. */ + if (IS_ERR(folio)) + continue; + btrfs_folio_clamp_clear_dirty(fs_info, folio, start, len); + btrfs_folio_clamp_set_writeback(fs_info, folio, start, len); + btrfs_folio_clamp_clear_writeback(fs_info, folio, start, len); + folio_unlock(folio); + folio_put(folio); + } + mapping_set_error(mapping, error); +} + /* * when nowcow writeback call back. This checks for snapshots or COW copies * of the extents that exist in the file, and COWs the file as required. @@ -1976,6 +2018,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, struct btrfs_root *root = inode->root; struct btrfs_path *path; u64 cow_start = (u64)-1; + /* + * If not 0, represents the inclusive end of the last fallback_to_cow() + * range. Only for error handling. + */ + u64 cow_end = 0; u64 cur_offset = start; int ret; bool check_prev = true; @@ -2136,6 +2183,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, found_key.offset - 1); cow_start = (u64)-1; if (ret) { + cow_end = found_key.offset - 1; btrfs_dec_nocow_writers(nocow_bg); goto error; } @@ -2209,11 +2257,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, cow_start = cur_offset; if (cow_start != (u64)-1) { - cur_offset = end; ret = fallback_to_cow(inode, locked_folio, cow_start, end); cow_start = (u64)-1; - if (ret) + if (ret) { + cow_end = end; goto error; + } } btrfs_free_path(path); @@ -2221,12 +2270,42 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, error: /* - * If an error happened while a COW region is outstanding, cur_offset - * needs to be reset to cow_start to ensure the COW region is unlocked - * as well. + * There are several error cases: + * + * 1) Failed without falling back to COW + * start cur_start end + * |/////////////| | + * + * For range [start, cur_start) the folios are already unlocked (except + * @locked_folio), EXTENT_DELALLOC already removed. + * Only need to clear the dirty flag as they will never be submitted. + * Ordered extent and extent maps are handled by + * btrfs_mark_ordered_io_finished() inside run_delalloc_range(). + * + * 2) Failed with error from fallback_to_cow() + * start cur_start cow_end end + * |/////////////|-----------| | + * + * For range [start, cur_start) it's the same as case 1). + * But for range [cur_start, cow_end), the folios have dirty flag + * cleared and unlocked, EXTENT_DEALLLOC cleared. + * There may or may not be any ordered extents/extent maps allocated. + * + * We should not call extent_clear_unlock_delalloc() on range [cur_start, + * cow_end), as the folios are already unlocked. + * + * So clear the folio dirty flags for [start, cur_offset) first. */ - if (cow_start != (u64)-1) - cur_offset = cow_start; + if (cur_offset > start) + cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret); + + /* + * If an error happened while a COW region is outstanding, cur_offset + * needs to be reset to @cow_end + 1 to skip the COW range, as + * cow_file_range() will do the proper cleanup at error. + */ + if (cow_end) + cur_offset = cow_end + 1; /* * We need to lock the extent here because we're clearing DELALLOC and From patchwork Sun Dec 8 02:51:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898473 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35C4E4EB45 for ; Sun, 8 Dec 2024 02:51:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626296; cv=none; b=eJ9I17pgJOqJNZKRRLhhiympmC5F1jktakfFq9s5eGsRP6+KcrPsCEDzc2Yi3TKC++KJgUyhSMu+BxD0ET5Wd87ygoFg1CeaDmxZKJCwSYJYQNvhOA9V+IEWP/6jSCgvXZ+AXo2eo452elcSJhvV4znv9FwiJIaRwz0070wjAf0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626296; c=relaxed/simple; bh=ZiEBxhZw8sN+7Ip7QT8ql8JrGi/bNM6PZOoo8hVaTyc=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O2ZQuhtESDBmo8sckZk8obOYViCSU1NUycw8GThWWwuIRtIUZfLjz4BCR38xsSoHxDyah0ZHhoJ7CumZhD9TSliKN8vGaTqCWxzS6K/UqB1LZQqE5QRjuf4R2JDFpo4emxrOD86AsswclCmhtwopHVLIkQ6scDm0qDjLk10c/Cs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=jlXI1SKl; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=jlXI1SKl; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="jlXI1SKl"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="jlXI1SKl" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 6A7802117E for ; Sun, 8 Dec 2024 02:51:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626292; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0ZkH4wMOBy5BBqA+lrivu/lJPJgVj0EsmErgCPYJKWg=; b=jlXI1SKl28kDhYoUQvLXo9QgTQghGCRlDQ9UKKvm2pFdPYm7OjdX/5RbkJsLbP6FKGw0xs Rwxi4RizeFc9+Ig+3iSL0kxLDLUHuwBY3yl/jk7aeV3N9wfK8KQGrGvka7p7F5GPqxzCvy LYg4pId5dzki8+WUZuwZDZ20lazHaDU= Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626292; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0ZkH4wMOBy5BBqA+lrivu/lJPJgVj0EsmErgCPYJKWg=; b=jlXI1SKl28kDhYoUQvLXo9QgTQghGCRlDQ9UKKvm2pFdPYm7OjdX/5RbkJsLbP6FKGw0xs Rwxi4RizeFc9+Ig+3iSL0kxLDLUHuwBY3yl/jk7aeV3N9wfK8KQGrGvka7p7F5GPqxzCvy LYg4pId5dzki8+WUZuwZDZ20lazHaDU= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id A7C31133D1 for ; Sun, 8 Dec 2024 02:51:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id OBiQGrMJVWcXcQAAD6G6ig (envelope-from ) for ; Sun, 08 Dec 2024 02:51:31 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/8] btrfs: subpage: fix the bitmap dump for the locked flags Date: Sun, 8 Dec 2024 13:21:03 +1030 Message-ID: <418c824fe3bb4135f412224359187e392d1e7edd.1733624454.git.wqu@suse.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:mid]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Score: -2.80 X-Spam-Flag: NO We're dumping the locked bitmap into the @checked_bitmap variable, causing incorrect values during debug. Thankfuklly even during my development I haven't hit a case where I need to dump the locked bitmap. But for the sake of consistency, fix it by dumpping the locked bitmap into @locked_bitmap variable for output. Fixes: 75258f20fb70 ("btrfs: subpage: dump extra subpage bitmaps for debug") Signed-off-by: Qu Wenruo --- fs/btrfs/subpage.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 8c68059ac1b0..03d7bfc042e2 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -716,6 +716,7 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info, unsigned long writeback_bitmap; unsigned long ordered_bitmap; unsigned long checked_bitmap; + unsigned long locked_bitmap; unsigned long flags; ASSERT(folio_test_private(folio) && folio_get_private(folio)); @@ -728,15 +729,16 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info, GET_SUBPAGE_BITMAP(subpage, fs_info, writeback, &writeback_bitmap); GET_SUBPAGE_BITMAP(subpage, fs_info, ordered, &ordered_bitmap); GET_SUBPAGE_BITMAP(subpage, fs_info, checked, &checked_bitmap); - GET_SUBPAGE_BITMAP(subpage, fs_info, locked, &checked_bitmap); + GET_SUBPAGE_BITMAP(subpage, fs_info, locked, &locked_bitmap); spin_unlock_irqrestore(&subpage->lock, flags); dump_page(folio_page(folio, 0), "btrfs subpage dump"); btrfs_warn(fs_info, -"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl checked=%*pbl", +"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl locked=%*pbl writeback=%*pbl ordered=%*pbl checked=%*pbl", start, len, folio_pos(folio), sectors_per_page, &uptodate_bitmap, sectors_per_page, &dirty_bitmap, + sectors_per_page, &locked_bitmap, sectors_per_page, &writeback_bitmap, sectors_per_page, &ordered_bitmap, sectors_per_page, &checked_bitmap); From patchwork Sun Dec 8 02:51:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898474 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 712D25B1FB for ; Sun, 8 Dec 2024 02:51:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626297; cv=none; b=hOP3hdgZ+lX08uaZBhR1FQu2/RuFDKg5oTWUuG/uyf4LC8/4oDvTZwyGVmGL2AJYHgEMZo86pYanPy51+TkcNb6d5F7gYbV1HxEh6TKswSS97cCiDUt5DyhFAEvxsDEop8BxTBN1pvpTsTbWsbuN6aYQ4e9cBtAApWzO22fPqhI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626297; c=relaxed/simple; bh=hP3vj1euKEibyALCglERp51BEdi5Q2zDXUM25DsqdmA=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=c3lZIfAIiyY9W3/H0BqjNbPtSUlEtTZteqBBogR5r7N7Iv42xYZQrnOX38WvWsCghZXvfDvlf5SeheI1rCXzczFVu86DenRYPYOpynNO8/Qa5jWxvqa8fjmbz47tnjuqN1VPNFlUn4WhyWtGhg5xaCKpc46THpkqnYX6dTNmLz4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=qqWDYMgK; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=qqWDYMgK; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="qqWDYMgK"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="qqWDYMgK" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id AC7692117F for ; Sun, 8 Dec 2024 02:51:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626293; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8XILA8AcWe1lqq7wAthDGyFjPzPZ10C7tdJvxowfsa4=; b=qqWDYMgK5t/MU2pkHyUDn16uTfgBeoX/Cpg1RPvgmhTdFInwq29lN/gmk+q+QdY1GdLSQX 3TS9yB4zzoG//btnS696VUnxVGtRI8kYjFxdan2q28aj2q9ox9uAWUKOv6BbfqlrfWhndq WvCt4S2SDhhEQoX9U6Kxla4kbni8Xec= Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626293; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8XILA8AcWe1lqq7wAthDGyFjPzPZ10C7tdJvxowfsa4=; b=qqWDYMgK5t/MU2pkHyUDn16uTfgBeoX/Cpg1RPvgmhTdFInwq29lN/gmk+q+QdY1GdLSQX 3TS9yB4zzoG//btnS696VUnxVGtRI8kYjFxdan2q28aj2q9ox9uAWUKOv6BbfqlrfWhndq WvCt4S2SDhhEQoX9U6Kxla4kbni8Xec= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id DC944133D1 for ; Sun, 8 Dec 2024 02:51:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id aEpiJ7QJVWcXcQAAD6G6ig (envelope-from ) for ; Sun, 08 Dec 2024 02:51:32 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 7/8] btrfs: subpage: dump the involved bitmap when ASSERT() failed Date: Sun, 8 Dec 2024 13:21:04 +1030 Message-ID: X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Score: -2.80 X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:mid,suse.com:email]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Flag: NO X-Spam-Level: For btrfs_folio_assert_not_dirty() and btrfs_folio_set_lock(), we call bitmap_test_range_all_zero() to ensure the involved range has not bit set. However with my recent enhanced delalloc range error handling, I'm hitting the ASSERT() inside btrfs_folio_set_lock(), and is wondering if it's some error handling not properly cleanup the locked bitmap but directly unlock the page. So add some extra dumpping for the ASSERTs to dump the involved bitmap to help debug. Signed-off-by: Qu Wenruo --- fs/btrfs/subpage.c | 41 ++++++++++++++++++++++++++++++----------- 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 03d7bfc042e2..d692bc34a3af 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -635,6 +635,28 @@ IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered, IMPLEMENT_BTRFS_PAGE_OPS(checked, folio_set_checked, folio_clear_checked, folio_test_checked); +#define GET_SUBPAGE_BITMAP(subpage, fs_info, name, dst) \ +{ \ + const int sectors_per_page = fs_info->sectors_per_page; \ + \ + ASSERT(sectors_per_page < BITS_PER_LONG); \ + *dst = bitmap_read(subpage->bitmaps, \ + sectors_per_page * btrfs_bitmap_nr_##name, \ + sectors_per_page); \ +} + +#define subpage_dump_bitmap(fs_info, folio, name, start, len) \ +{ \ + struct btrfs_subpage *subpage = folio_get_private(folio); \ + unsigned long bitmap; \ + \ + GET_SUBPAGE_BITMAP(subpage, fs_info, name, &bitmap); \ + btrfs_warn(fs_info, \ + "dumpping bitmap start=%llu len=%u folio=%llu" #name "_bitmap=%*pbl", \ + start, len, folio_pos(folio), \ + fs_info->sectors_per_page, &bitmap); \ +} + /* * Make sure not only the page dirty bit is cleared, but also subpage dirty bit * is cleared. @@ -660,6 +682,10 @@ void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, subpage = folio_get_private(folio); ASSERT(subpage); spin_lock_irqsave(&subpage->lock, flags); + if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) { + subpage_dump_bitmap(fs_info, folio, dirty, start, len); + ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); + } ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); spin_unlock_irqrestore(&subpage->lock, flags); } @@ -689,23 +715,16 @@ void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info, nbits = len >> fs_info->sectorsize_bits; spin_lock_irqsave(&subpage->lock, flags); /* Target range should not yet be locked. */ - ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); + if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) { + subpage_dump_bitmap(fs_info, folio, locked, start, len); + ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); + } bitmap_set(subpage->bitmaps, start_bit, nbits); ret = atomic_add_return(nbits, &subpage->nr_locked); ASSERT(ret <= fs_info->sectors_per_page); spin_unlock_irqrestore(&subpage->lock, flags); } -#define GET_SUBPAGE_BITMAP(subpage, fs_info, name, dst) \ -{ \ - const int sectors_per_page = fs_info->sectors_per_page; \ - \ - ASSERT(sectors_per_page < BITS_PER_LONG); \ - *dst = bitmap_read(subpage->bitmaps, \ - sectors_per_page * btrfs_bitmap_nr_##name, \ - sectors_per_page); \ -} - void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { From patchwork Sun Dec 8 02:51:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13898475 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA92E27473 for ; Sun, 8 Dec 2024 02:51:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626298; cv=none; b=JqJeFDw79cNecfs1CWJlgg6XrHcnafTliOrYgDwoVl68+T68k2pa0WcKDqpnGVmLv9mYl5w0dgyCV4nYwtne8FIgwy7DPr4IMcgvqESJf+IK9CGQUU+nbTiCd5bWIUiDd2ApTCUwxvz1jWaLZGqXu4XzqZapOW0AM17xNp5juzU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733626298; c=relaxed/simple; bh=dnxcTuhNO6u21ZSi+b9jlUhIhAWMYZBAZmAKiqpNgpM=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iHz+P6SFmjqG6XYimnwLg+dPN5yySzZZ7CpGo/aZ+B/wXKChsNiGbZfipbqROyp7M76NLMNobtQ6CSAWXtoWhm3dVne3SS8jJ1upmC5MJr5Oigb3S8/coZ6XAFoJFjQf/De/fOdtpr/GBmBFjj9ZIRCEhKZufAJBJYuw6UWEYWo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=sPRugNWI; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=sPRugNWI; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="sPRugNWI"; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="sPRugNWI" Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id D639321178 for ; Sun, 8 Dec 2024 02:51:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626294; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yUn2zHpQAZ+ipDzrknoFHOvKMZIYm1Vk5ndxzmsCHYc=; b=sPRugNWIIzCI3m9Gumljrfj6odjiTwKPPE3mboZPLsHggCC4pXyw1dVFFwOgPqv7eGXL/g zIoCNxZC38tNuPBzFcpVxr9xvY7AUaTcMUz7Tti6bfLqOCrzag08QLLz4hvR4oygBAmKj5 N7ljU4OwatXY3B+c4hEAKTRv4dcBB3w= Authentication-Results: smtp-out1.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1733626294; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yUn2zHpQAZ+ipDzrknoFHOvKMZIYm1Vk5ndxzmsCHYc=; b=sPRugNWIIzCI3m9Gumljrfj6odjiTwKPPE3mboZPLsHggCC4pXyw1dVFFwOgPqv7eGXL/g zIoCNxZC38tNuPBzFcpVxr9xvY7AUaTcMUz7Tti6bfLqOCrzag08QLLz4hvR4oygBAmKj5 N7ljU4OwatXY3B+c4hEAKTRv4dcBB3w= Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 1F699133D1 for ; Sun, 8 Dec 2024 02:51:33 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id CMLTNLUJVWcXcQAAD6G6ig (envelope-from ) for ; Sun, 08 Dec 2024 02:51:33 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 8/8] btrfs: add extra error messages for delalloc range related errors Date: Sun, 8 Dec 2024 13:21:05 +1030 Message-ID: X-Mailer: git-send-email 2.47.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Level: X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email,suse.com:mid]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[linux-btrfs@vger.kernel.org]; RCVD_TLS_ALL(0.00)[] X-Spam-Score: -2.80 X-Spam-Flag: NO All the error handling bugs I hit so far are all -ENOSPC from either: - cow_file_range() - run_delalloc_nocow() - submit_uncompressed_range() Previously when those functions failed, there is no error message at all, making the debugging much harder. So here we introduce extra error messages for: - cow_file_range() - run_delalloc_nocow() - submit_uncompressed_range() - writepage_delalloc() when btrfs_run_delalloc_range() failed - extent_writepage() when extent_writepage_io() failed One example of the new debug error messages is the following one: run fstests generic/750 at 2024-12-08 12:41:41 BTRFS: device fsid 461b25f5-e240-4543-8deb-e7c2bd01a6d3 devid 1 transid 8 /dev/mapper/test-scratch1 (253:4) scanned by mount (2436600) BTRFS info (device dm-4): first mount of filesystem 461b25f5-e240-4543-8deb-e7c2bd01a6d3 BTRFS info (device dm-4): using crc32c (crc32c-arm64) checksum algorithm BTRFS info (device dm-4): forcing free space tree for sector size 4096 with page size 65536 BTRFS info (device dm-4): using free-space-tree BTRFS warning (device dm-4): read-write for sector size 4096 with page size 65536 is experimental BTRFS info (device dm-4): checking UUID tree BTRFS error (device dm-4): cow_file_range failed, root=363 inode=412 start=503808 len=98304: -28 BTRFS error (device dm-4): run_delalloc_nocow failed, root=363 inode=412 start=503808 len=98304: -28 BTRFS error (device dm-4): failed to run delalloc range, root=363 ino=412 folio=458752 submit_bitmap=11-15 start=503808 len=98304: -28 Which shows an error from cow_file_range() which is called inside a nocow write attempt, along with the extra bitmap from writepage_delalloc(). Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 16 ++++++++++++++++ fs/btrfs/inode.c | 14 +++++++++++++- fs/btrfs/subpage.c | 3 ++- 3 files changed, 31 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b6a4f1765b4c..f4fb1fb3454a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1254,6 +1254,15 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, wbc); if (ret >= 0) last_finished = found_start + found_len; + if (unlikely(ret < 0)) + btrfs_err_rl(fs_info, +"failed to run delalloc range, root=%lld ino=%llu folio=%llu submit_bitmap=%*pbl start=%llu len=%u: %d", + inode->root->root_key.objectid, + btrfs_ino(inode), + folio_pos(folio), + fs_info->sectors_per_page, + &bio_ctrl->submit_bitmap, + found_start, found_len, ret); } else { /* * We've hit an error during previous delalloc range, @@ -1546,6 +1555,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl PAGE_SIZE, bio_ctrl, i_size); if (ret == 1) return 0; + if (ret < 0) + btrfs_err_rl(fs_info, +"failed to submit blocks, root=%lld inode=%llu folio=%llu submit_bitmap=%*pbl: %d", + BTRFS_I(inode)->root->root_key.objectid, + btrfs_ino(BTRFS_I(inode)), + folio_pos(folio), fs_info->sectors_per_page, + &bio_ctrl->submit_bitmap, ret); bio_ctrl->wbc->nr_to_write--; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 75b7956a7b4c..620c527ba841 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1134,6 +1134,10 @@ static void submit_uncompressed_range(struct btrfs_inode *inode, if (locked_folio) btrfs_folio_end_lock(inode->root->fs_info, locked_folio, start, async_extent->ram_size); + btrfs_err_rl(inode->root->fs_info, + "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d", + __func__, btrfs_root_id(inode->root), + btrfs_ino(inode), start, async_extent->ram_size, ret); } } @@ -1246,7 +1250,7 @@ static void submit_one_async_extent(struct async_chunk *async_chunk, free_async_extent_pages(async_extent); if (async_chunk->blkcg_css) kthread_associate_blkcg(NULL); - btrfs_debug(fs_info, + btrfs_debug_rl(fs_info, "async extent submission failed root=%lld inode=%llu start=%llu len=%llu ret=%d", btrfs_root_id(root), btrfs_ino(inode), start, async_extent->ram_size, ret); @@ -1580,6 +1584,10 @@ static noinline int cow_file_range(struct btrfs_inode *inode, btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size, end - start - cur_alloc_size + 1, NULL); } + btrfs_err_rl(fs_info, + "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d", + __func__, btrfs_root_id(inode->root), + btrfs_ino(inode), orig_start, end + 1 - orig_start, ret); return ret; } @@ -2325,6 +2333,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, btrfs_qgroup_free_data(inode, NULL, cur_offset, end - cur_offset + 1, NULL); } btrfs_free_path(path); + btrfs_err_rl(fs_info, + "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d", + __func__, btrfs_root_id(inode->root), + btrfs_ino(inode), start, end + 1 - start, ret); return ret; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index d692bc34a3af..7f47bc61389c 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -652,7 +652,7 @@ IMPLEMENT_BTRFS_PAGE_OPS(checked, folio_set_checked, folio_clear_checked, \ GET_SUBPAGE_BITMAP(subpage, fs_info, name, &bitmap); \ btrfs_warn(fs_info, \ - "dumpping bitmap start=%llu len=%u folio=%llu" #name "_bitmap=%*pbl", \ + "dumpping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl", \ start, len, folio_pos(folio), \ fs_info->sectors_per_page, &bitmap); \ } @@ -717,6 +717,7 @@ void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info, /* Target range should not yet be locked. */ if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) { subpage_dump_bitmap(fs_info, folio, locked, start, len); + btrfs_warn(fs_info, "nr_locked=%u\n", atomic_read(&subpage->nr_locked)); ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); } bitmap_set(subpage->bitmaps, start_bit, nbits);