From patchwork Thu May 23 07:05:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13671324
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v6 1/5] btrfs: make __extent_writepage_io() to write specified range only
Date: Thu, 23 May 2024 16:35:42 +0930
Message-ID: <4a50f491c71005015b13be437dd17bbbffc59fe1.1716445070.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.1
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Function __extent_writepage_io() is designed to find all dirty ranges of
a page, and add those dirty ranges into the bio_ctrl for submission.
It requires all the dirtied ranges to be covered by an ordered extent.

It gets called in two locations, but one call site is not subpage aware:

- __extent_writepage()
  It gets called when writepage_delalloc() returned 0, which means
  writepage_delalloc() has handled delalloc for all subpage sectors
  inside the page.

  So this call site is OK.

- extent_write_locked_range()
  This call site is utilized by zoned support, and in this case, we may
  only run delalloc range for a subset of the page, like this: (64K page
  size)

  0     16K     32K     48K     64K
  |/////|       |///////|       |

  In the above case, if extent_write_locked_range() is only triggered for
  range [0, 16K), __extent_writepage_io() would still try to submit
  the dirty range of [32K, 48K), then it would not find any ordered
  extent for it and trigger various ASSERT()s.

Fix this problem by:

- Introducing @start and @len parameters to specify the range

  For the first call site, we just pass the whole page, and the behavior
  is not touched, since run_delalloc_range() for the page should have
  created all ordered extents for the page.

  For the second call site, we would avoid touching anything beyond the
  range, thus avoiding the dirty range which is not yet covered by any
  delalloc range.

- Making btrfs_folio_assert_not_dirty() subpage aware

  The only caller is inside __extent_writepage_io(), and since that
  caller now accepts a subpage range, we should also check the subpage
  range rather than the whole page.
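
For illustration only, the difference between a whole-page check and a
range-limited check can be modelled in userspace C as below. This is a
minimal sketch, not the kernel code: the sketch_* names, the 64K page /
4K sector geometry, and the plain 32-bit bitmap are all assumptions made
for the example, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry for illustration: 64K page, 4K sectors. */
#define SKETCH_PAGE_SIZE	(64u * 1024u)
#define SKETCH_SECTOR_BITS	12u

/* Build a mask with one bit per sector covered by [start, start + len). */
static uint32_t sketch_range_mask(uint64_t page_start, uint64_t start,
				  uint32_t len)
{
	uint32_t first;
	uint32_t nbits;

	/* Same shape as the range ASSERT() added to __extent_writepage_io(). */
	assert(start >= page_start &&
	       start + len <= page_start + SKETCH_PAGE_SIZE);

	first = (uint32_t)((start - page_start) >> SKETCH_SECTOR_BITS);
	nbits = len >> SKETCH_SECTOR_BITS;
	return ((1u << nbits) - 1u) << first;
}

/* Mark the sectors of the range dirty in the (hypothetical) dirty bitmap. */
static uint32_t sketch_set_dirty(uint32_t map, uint64_t page_start,
				 uint64_t start, uint32_t len)
{
	return map | sketch_range_mask(page_start, start, len);
}

/* Clear the dirty bits of the range, as writeback of that range would. */
static uint32_t sketch_clear_dirty(uint32_t map, uint64_t page_start,
				   uint64_t start, uint32_t len)
{
	return map & ~sketch_range_mask(page_start, start, len);
}

/* Range-limited check: only the sectors inside the range must be clean. */
static bool sketch_range_not_dirty(uint32_t map, uint64_t page_start,
				   uint64_t start, uint32_t len)
{
	return (map & sketch_range_mask(page_start, start, len)) == 0;
}
```

With [0, 16K) and [32K, 48K) dirty and only [0, 16K) written back, the
range-limited check on [0, 16K) passes while a check over the whole page
still sees the [32K, 48K) bits, which is the situation the zoned call
site runs into.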

Reviewed-by: Johannes Thumshirn
Reviewed-by: Naohiro Aota
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 18 +++++++++++-------
 fs/btrfs/subpage.c   | 22 ++++++++++++++++------
 fs/btrfs/subpage.h   |  3 ++-
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bf50301ee528..938061e0ce01 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1339,20 +1339,23 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
  * < 0 if there were errors (page still locked)
  */
 static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
-				 struct page *page,
+				 struct page *page, u64 start, u32 len,
 				 struct btrfs_bio_ctrl *bio_ctrl,
 				 loff_t i_size,
 				 int *nr_ret)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u64 cur = page_offset(page);
-	u64 end = cur + PAGE_SIZE - 1;
+	u64 cur = start;
+	u64 end = start + len - 1;
 	u64 extent_offset;
 	u64 block_start;
 	struct extent_map *em;
 	int ret = 0;
 	int nr = 0;
 
+	ASSERT(start >= page_offset(page) &&
+	       start + len <= page_offset(page) + PAGE_SIZE);
+
 	ret = btrfs_writepage_cow_fixup(page);
 	if (ret) {
 		/* Fixup worker will requeue */
@@ -1441,7 +1444,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		nr++;
 	}
 
-	btrfs_folio_assert_not_dirty(fs_info, page_folio(page));
+	btrfs_folio_assert_not_dirty(fs_info, page_folio(page), start, len);
 	*nr_ret = nr;
 	return 0;
 
@@ -1499,7 +1502,8 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 	if (ret)
 		goto done;
 
-	ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
+	ret = __extent_writepage_io(BTRFS_I(inode), page, page_offset(page),
+				    PAGE_SIZE, bio_ctrl, i_size, &nr);
 	if (ret == 1)
 		return 0;
 
@@ -2251,8 +2255,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 			clear_page_dirty_for_io(page);
 		}
 
-		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
-					    i_size, &nr);
+		ret = __extent_writepage_io(BTRFS_I(inode), page, cur, cur_len,
+					    &bio_ctrl, i_size, &nr);
 		if (ret == 1)
 			goto next_page;
 
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 9127704236ab..2697e528eab2 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -703,19 +703,29 @@ IMPLEMENT_BTRFS_PAGE_OPS(checked, folio_set_checked, folio_clear_checked,
  * Make sure not only the page dirty bit is cleared, but also subpage dirty bit
  * is cleared.
  */
-void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio)
+void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
+				  struct folio *folio, u64 start, u32 len)
 {
-	struct btrfs_subpage *subpage = folio_get_private(folio);
+	struct btrfs_subpage *subpage;
+	unsigned int start_bit;
+	unsigned int nbits;
+	unsigned long flags;
 
 	if (!IS_ENABLED(CONFIG_BTRFS_ASSERT))
 		return;
 
-	ASSERT(!folio_test_dirty(folio));
-	if (!btrfs_is_subpage(fs_info, folio->mapping))
+	if (!btrfs_is_subpage(fs_info, folio->mapping)) {
+		ASSERT(!folio_test_dirty(folio));
 		return;
+	}
 
-	ASSERT(folio_test_private(folio) && folio_get_private(folio));
-	ASSERT(subpage_test_bitmap_all_zero(fs_info, subpage, dirty));
+	start_bit = subpage_calc_start_bit(fs_info, folio, dirty, start, len);
+	nbits = len >> fs_info->sectorsize_bits;
+	subpage = folio_get_private(folio);
+	ASSERT(subpage);
+	spin_lock_irqsave(&subpage->lock, flags);
+	ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
 /*
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b6dc013b0fdc..4b363d9453af 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -156,7 +156,8 @@ DECLARE_BTRFS_SUBPAGE_OPS(checked);
 bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
 					struct folio *folio, u64 start, u32 len);
 
-void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio);
+void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
+				  struct folio *folio, u64 start, u32 len);
 void btrfs_folio_unlock_writer(struct btrfs_fs_info *fs_info,
 			       struct folio *folio, u64 start, u32 len);
 void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,

From patchwork Thu May 23 07:05:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13671327
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v6 2/5] btrfs: subpage: introduce helpers to handle subpage delalloc locking
Date: Thu, 23 May 2024 16:35:43 +0930
Message-ID: <5dac0b184913eb6f7992c88233ede0f836cad77e.1716445070.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.1
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Three new helpers are introduced for the incoming subpage delalloc locking
change.

- btrfs_folio_set_writer_lock()
  This is to mark the specified range with the subpage specific writer
  lock. After calling this, the subpage range can be properly unlocked by
  btrfs_folio_end_writer_lock().

- btrfs_subpage_find_writer_locked()
  This is to find the writer locked subpage range in a page.
  With the help of btrfs_folio_set_writer_lock(), it allows us to
  record and find previously locked subpage ranges without extra memory
  allocation.

- btrfs_folio_end_all_writers()
  This is for the locked_page of __extent_writepage(), as there may be
  multiple subpage delalloc ranges locked.

Reviewed-by: Johannes Thumshirn
Reviewed-by: Naohiro Aota
Signed-off-by: Qu Wenruo
---
 fs/btrfs/subpage.c | 122 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h |   7 +++
 2 files changed, 129 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 2697e528eab2..8bf83dd3313d 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -775,6 +775,128 @@ void btrfs_folio_unlock_writer(struct btrfs_fs_info *fs_info,
 	btrfs_folio_end_writer_lock(fs_info, folio, start, len);
 }
 
+/*
+ * This is for folio already locked by plain lock_page()/folio_lock(), which
+ * doesn't have any subpage awareness.
+ *
+ * This would populate the involved subpage ranges so that subpage helpers can
+ * properly unlock them.
+ */
+void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
+				 struct folio *folio, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage;
+	unsigned long flags;
+	unsigned int start_bit;
+	unsigned int nbits;
+	int ret;
+
+	ASSERT(folio_test_locked(folio));
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping))
+		return;
+
+	subpage = folio_get_private(folio);
+	start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
+	nbits = len >> fs_info->sectorsize_bits;
+	spin_lock_irqsave(&subpage->lock, flags);
+	/* Target range should not yet be locked. */
+	ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+	bitmap_set(subpage->bitmaps, start_bit, nbits);
+	ret = atomic_add_return(nbits, &subpage->writers);
+	ASSERT(ret <= fs_info->subpage_info->bitmap_nr_bits);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+/*
+ * Find any subpage writer locked range inside @folio, starting at file offset
+ * @search_start.
+ * The caller should ensure the folio is locked.
+ *
+ * Return true and update @found_start_ret and @found_len_ret to the first
+ * writer locked range.
+ * Return false if there is no writer locked range.
+ */
+bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
+				      struct folio *folio, u64 search_start,
+				      u64 *found_start_ret, u32 *found_len_ret)
+{
+	struct btrfs_subpage_info *subpage_info = fs_info->subpage_info;
+	struct btrfs_subpage *subpage = folio_get_private(folio);
+	const unsigned int len = PAGE_SIZE - offset_in_page(search_start);
+	const unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
+						locked, search_start, len);
+	const unsigned int locked_bitmap_start = subpage_info->locked_offset;
+	const unsigned int locked_bitmap_end = locked_bitmap_start +
+					       subpage_info->bitmap_nr_bits;
+	unsigned long flags;
+	int first_zero;
+	int first_set;
+	bool found = false;
+
+	ASSERT(folio_test_locked(folio));
+	spin_lock_irqsave(&subpage->lock, flags);
+	first_set = find_next_bit(subpage->bitmaps, locked_bitmap_end,
+				  start_bit);
+	if (first_set >= locked_bitmap_end)
+		goto out;
+
+	found = true;
+
+	*found_start_ret = folio_pos(folio) +
+		((first_set - locked_bitmap_start) << fs_info->sectorsize_bits);
+	/*
+	 * Since @first_set is ensured to be smaller than locked_bitmap_end
+	 * here, @found_start_ret should be inside the folio.
+	 */
+	ASSERT(*found_start_ret < folio_pos(folio) + PAGE_SIZE);
+
+	first_zero = find_next_zero_bit(subpage->bitmaps,
+					locked_bitmap_end, first_set);
+	*found_len_ret = (first_zero - first_set) << fs_info->sectorsize_bits;
+out:
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	return found;
+}
+
+/*
+ * Unlike btrfs_folio_end_writer_lock() which unlock a specified subpage range,
+ * this would end all writer locked ranges of a page.
+ *
+ * This is for the locked page of __extent_writepage(), as the locked page
+ * can contain several locked subpage ranges.
+ */
+void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
+				 struct folio *folio)
+{
+	u64 folio_start = folio_pos(folio);
+	u64 cur = folio_start;
+
+	ASSERT(folio_test_locked(folio));
+	if (!btrfs_is_subpage(fs_info, folio->mapping)) {
+		folio_unlock(folio);
+		return;
+	}
+
+	while (cur < folio_start + PAGE_SIZE) {
+		u64 found_start;
+		u32 found_len;
+		bool found;
+		bool last;
+
+		found = btrfs_subpage_find_writer_locked(fs_info, folio, cur,
+							 &found_start, &found_len);
+		if (!found)
+			break;
+		last = btrfs_subpage_end_and_test_writer(fs_info, folio,
+							 found_start, found_len);
+		if (last) {
+			folio_unlock(folio);
+			break;
+		}
+		cur = found_start + found_len;
+	}
+}
+
 #define GET_SUBPAGE_BITMAP(subpage, subpage_info, name, dst)		\
 	bitmap_cut(dst, subpage->bitmaps, 0,				\
 		   subpage_info->name##_offset, subpage_info->bitmap_nr_bits)
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 4b363d9453af..9f19850d59f2 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -112,6 +112,13 @@ int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info,
 				  struct folio *folio, u64 start, u32 len);
 void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info,
 				 struct folio *folio, u64 start, u32 len);
+void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
+				 struct folio *folio, u64 start, u32 len);
+bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
+				      struct folio *folio, u64 search_start,
+				      u64 *found_start_ret, u32 *found_len_ret);
+void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
+				 struct folio *folio);
 
 /*
  * Template for subpage related operations.

From patchwork Thu May 23 07:05:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13671328
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v6 3/5] btrfs: lock subpage ranges in one go for writepage_delalloc()
Date: Thu, 23 May 2024 16:35:44 +0930
X-Mailer: git-send-email 2.45.1
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

If we have a subpage range like this for a 16K page with 4K sectorsize:

    0     4K     8K     12K     16K
    |/////|      |//////|       |

    |/////| = dirty range

Currently writepage_delalloc() would go through the following steps:

- lock range [0, 4K)
- run delalloc range for [0, 4K)
- lock range [8K, 12K)
- run delalloc range for [8K, 12K)

So far it's fine for regular subpage writeback, as
btrfs_run_delalloc_range() can only go into one of run_delalloc_nocow(),
cow_file_range() and run_delalloc_compressed().

But there is a special pitfall for zoned subpage, where we will go
through run_delalloc_cow(), which would create the ordered extent for the
range and immediately submit the range.
This would unlock the whole page range, causing all kinds of different
ASSERT()s related to locked page.

This patch would address the page unlocking problem of run_delalloc_cow(),
by changing the workflow to the following one:

- lock range [0, 4K)
- lock range [8K, 12K)
- run delalloc range for [0, 4K)
- run delalloc range for [8K, 12K)

So that run_delalloc_cow() can only unlock the full page once the last
lock user has released it.

To do that, this patch would:

- Utilize the subpage locked bitmap
  So for every delalloc range we found, call
  btrfs_folio_set_writer_lock() to populate the subpage locked bitmap,
  and later btrfs_folio_end_all_writers() if the page is fully unlocked.

  So we know there is a delalloc range that needs to be run later.

- Save the @delalloc_end as @last_delalloc_end inside writepage_delalloc()
  Since the subpage locked bitmap is only for ranges inside the page,
  while a delalloc range can end beyond our page boundary, we have to
  save the @last_delalloc_end just in case it's beyond our page boundary.

Although there is one extra point to notice:

- We need to handle errors in previous iteration
  Since we can have multiple locked delalloc ranges, we have to call
  btrfs_run_delalloc_range() multiple times. If we hit an error halfway,
  we still need to unlock the remaining ranges.
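
As a rough illustration of the "lock everything first, then run" flow,
the following userspace C sketch models a single 16K page with four 4K
sectors. It is not the kernel implementation: struct sketch_page and the
sketch_* functions are hypothetical names, offsets are page-relative, and
spinlocks, folios and the real delalloc run are all left out. It only
shows that, once every range is recorded in the locked bitmap up front,
releasing the first range cannot unlock the page while a later range is
still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 16K page split into four 4K sectors. */
#define SKETCH_SECTOR_BITS	12u
#define SKETCH_NR_SECTORS	4u

struct sketch_page {
	uint32_t locked_bitmap;	/* bit N set: sector N holds a writer lock */
	uint32_t writers;	/* number of writer-locked sectors */
	bool page_locked;	/* models the page lock itself */
};

/* One bit per sector covered by the page-relative range. */
static uint32_t sketch_mask(uint32_t start, uint32_t len)
{
	uint32_t first = start >> SKETCH_SECTOR_BITS;
	uint32_t nbits = len >> SKETCH_SECTOR_BITS;

	assert(first + nbits <= SKETCH_NR_SECTORS);
	return ((1u << nbits) - 1u) << first;
}

/* Phase 1: record a found delalloc range in the locked bitmap. */
static void sketch_set_writer_lock(struct sketch_page *p, uint32_t start,
				   uint32_t len)
{
	uint32_t mask = sketch_mask(start, len);

	assert(p->page_locked);
	assert((p->locked_bitmap & mask) == 0);	/* not locked yet */
	p->locked_bitmap |= mask;
	p->writers += len >> SKETCH_SECTOR_BITS;
}

/* Find the first locked range at or after @search_start. */
static bool sketch_find_writer_locked(const struct sketch_page *p,
				      uint32_t search_start,
				      uint32_t *found_start,
				      uint32_t *found_len)
{
	uint32_t bit = search_start >> SKETCH_SECTOR_BITS;
	uint32_t end;

	while (bit < SKETCH_NR_SECTORS && !(p->locked_bitmap & (1u << bit)))
		bit++;
	if (bit >= SKETCH_NR_SECTORS)
		return false;
	end = bit;
	while (end < SKETCH_NR_SECTORS && (p->locked_bitmap & (1u << end)))
		end++;
	*found_start = bit << SKETCH_SECTOR_BITS;
	*found_len = (end - bit) << SKETCH_SECTOR_BITS;
	return true;
}

/*
 * Phase 2: release one range. Returns true when this was the last
 * writer, which is the only point where the page gets unlocked.
 */
static bool sketch_end_writer(struct sketch_page *p, uint32_t start,
			      uint32_t len)
{
	p->locked_bitmap &= ~sketch_mask(start, len);
	p->writers -= len >> SKETCH_SECTOR_BITS;
	if (p->writers == 0)
		p->page_locked = false;
	return p->writers == 0;
}
```

In this model, locking [0, 4K) and [8K, 12K) first and then releasing
[0, 4K) leaves the page locked; only releasing [8K, 12K) drops the page
lock.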

Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 104 ++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/subpage.c   |   6 +++
 2 files changed, 103 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 938061e0ce01..338067ce724a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1226,13 +1226,23 @@ static inline void contiguous_readpages(struct page *pages[], int nr_pages,
 static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		struct page *page, struct writeback_control *wbc)
 {
+	struct btrfs_fs_info *fs_info = inode_to_fs_info(&inode->vfs_inode);
+	struct folio *folio = page_folio(page);
+	const bool is_subpage = btrfs_is_subpage(fs_info, page->mapping);
 	const u64 page_start = page_offset(page);
 	const u64 page_end = page_start + PAGE_SIZE - 1;
+	/*
+	 * Saves the last found delalloc end. As the delalloc end can go beyond
+	 * page boundary, thus we can not rely on subpage bitmap to locate
+	 * the last delalloc end.
+	 */
+	u64 last_delalloc_end = 0;
 	u64 delalloc_start = page_start;
 	u64 delalloc_end = page_end;
 	u64 delalloc_to_write = 0;
 	int ret = 0;
 
+	/* Lock all (subpage) delalloc ranges inside the page first. */
 	while (delalloc_start < page_end) {
 		delalloc_end = page_end;
 		if (!find_lock_delalloc_range(&inode->vfs_inode, page,
@@ -1240,15 +1250,94 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			delalloc_start = delalloc_end + 1;
 			continue;
 		}
-
-		ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
-					       delalloc_end, wbc);
-		if (ret < 0)
-			return ret;
-
+		btrfs_folio_set_writer_lock(fs_info, folio, delalloc_start,
+					    min(delalloc_end, page_end) + 1 -
+					    delalloc_start);
+		last_delalloc_end = delalloc_end;
 		delalloc_start = delalloc_end + 1;
 	}
+	delalloc_start = page_start;
 
+	if (!last_delalloc_end)
+		goto out;
+
+	/* Run the delalloc ranges for above locked ranges. */
+	while (delalloc_start < page_end) {
+		u64 found_start;
+		u32 found_len;
+		bool found;
+
+		if (!is_subpage) {
+			/*
+			 * For non-subpage case, the found delalloc range must
+			 * cover this page and there must be only one locked
+			 * delalloc range.
+			 */
+			found_start = page_start;
+			found_len = last_delalloc_end + 1 - found_start;
+			found = true;
+		} else {
+			found = btrfs_subpage_find_writer_locked(fs_info, folio,
+					delalloc_start, &found_start, &found_len);
+		}
+		if (!found)
+			break;
+		/*
+		 * The subpage range covers the last sector, the delalloc range may
+		 * end beyonds the page boundary, use the saved delalloc_end
+		 * instead.
+		 */
+		if (found_start + found_len >= page_end)
+			found_len = last_delalloc_end + 1 - found_start;
+
+		if (likely(ret >= 0)) {
+			/* No errors hit so far, run the current delalloc range. */
+			ret = btrfs_run_delalloc_range(inode, page, found_start,
+						       found_start + found_len - 1, wbc);
+		} else {
+			/*
+			 * We hit error during previous delalloc range, has to cleanup
+			 * the remaining locked ranges.
+			 */
+			unlock_extent(&inode->io_tree, found_start,
+				      found_start + found_len - 1, NULL);
+			__unlock_for_delalloc(&inode->vfs_inode, page, found_start,
+					      found_start + found_len - 1);
+		}
+
+		/*
+		 * We can hit btrfs_run_delalloc_range() with >0 return value.
+		 *
+		 * This happens when either the IO is already done and page
+		 * unlocked (inline) or the IO submission and page unlock would
+		 * be handled async (compression).
+		 *
+		 * Inline is only possible for regular sectorsize for now.
+		 *
+		 * Compression is possible for both subpage and regular cases,
+		 * but even for subpage compression only happens for page aligned
+		 * range, thus the found delalloc range must go beyond current
+		 * page.
+		 */
+		if (ret > 0)
+			ASSERT(!is_subpage || found_start + found_len >= page_end);
+
+		/*
+		 * Above btrfs_run_delalloc_range() may have unlocked the page,
+		 * Thus for the last range, we can not touch the page anymore.
+		 */
+		if (found_start + found_len >= last_delalloc_end + 1)
+			break;
+
+		delalloc_start = found_start + found_len;
+	}
+	if (ret < 0)
+		return ret;
+out:
+	if (last_delalloc_end)
+		delalloc_end = last_delalloc_end;
+	else
+		delalloc_end = page_end;
 	/*
 	 * delalloc_end is already one less than the total length, so
 	 * we don't subtract one from PAGE_SIZE
@@ -1520,7 +1609,8 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 					       PAGE_SIZE, !ret);
 		mapping_set_error(page->mapping, ret);
 	}
-	unlock_page(page);
+
+	btrfs_folio_end_all_writers(inode_to_fs_info(inode), folio);
 	ASSERT(ret <= 0);
 	return ret;
 }
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 8bf83dd3313d..fe99a8ea94c0 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -868,6 +868,7 @@ bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
 void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
 				 struct folio *folio)
 {
+	struct btrfs_subpage *subpage = folio_get_private(folio);
 	u64 folio_start = folio_pos(folio);
 	u64 cur = folio_start;
 
@@ -877,6 +878,11 @@ void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
 		return;
 	}
 
+	/* The page has no new delalloc range locked on it. Just plain unlock. */
+	if (atomic_read(&subpage->writers) == 0) {
+		folio_unlock(folio);
+		return;
+	}
 	while (cur < folio_start + PAGE_SIZE) {
 		u64 found_start;
 		u32 found_len;

From patchwork Thu May 23 07:05:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13671325
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EBB0213C9B0 for ; Thu, 23 May 2024 07:06:19 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716447982; cv=none; b=WiEXqwTaUvvfZB+ErVwgge7TWcwNJzHRrXct+h6mdsG6hKBrigtacJltlrMpBmp1nb9LOl6IoyXqcOwaiOuihf7AJtyrCmzEYRAvGPDP+ETvGpueEZjAjCg7lm+iDFrLDci64tbpVZOtxLezYJJ/UF0DBWLctctlfrzFVvIwAq8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716447982; c=relaxed/simple; bh=FN3VPO1eBeGL8bTyby5X1FLW4GGZjLhPvxGmkzI4uBE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Cs3OEuJby7CV4s4VULcmpdSFkgh5O7Lq+oEUb3pSZMcGLwkviYGHkYnAp24kUl2dmaNKW6PF/L2Sfhgy1uxIDNYpoY1WKlcsKjQ2nGlLUK9+UwW7hZtSszoZyAb9u+UQuzBjTjpUv0G4wf26/n+6GnFeV6wyjkVuf70CIFLcqEU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=ZaaCHs+J; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b=Ut90Ol+5; arc=none smtp.client-ip=195.135.223.130
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn
Subject: [PATCH v6 4/5] btrfs: do not clear page dirty inside extent_write_locked_range()
Date: Thu, 23 May 2024 16:35:45 +0930
Message-ID: <9e9e4c6a3f7b82f33e39d05e77e67eea88789701.1716445070.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.1
X-Mailing-List: linux-btrfs@vger.kernel.org

[BUG]
For subpage + zoned case, the following workload can lead to rsv data
leak at unmount time:

  # mkfs.btrfs -f -s 4k $dev
  # mount $dev $mnt
  # fsstress -w -n 8 -d $mnt -s 1709539240
  0/0: fiemap - no filename
  0/1: copyrange read - no filename
  0/2: write - no filename
  0/3: rename - no source filename
  0/4: creat f0 x:0 0 0
  0/4: creat add id=0,parent=-1
  0/5: writev f0[259 1 0 0 0 0] [778052,113,965] 0
  0/6: ioctl(FIEMAP) f0[259 1 0 0 224 887097] [1294220,2291618343991484791,0x10000] -1
  0/7: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 224 887097] return 25, fallback to stat()
  0/7: dwrite f0[259 1 0 0 224 887097] [696320,102400] 0
  # umount $mnt

The dmesg would include the following rsv leak detection warning (all
call trace skipped):

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8653 btrfs_destroy_inode+0x1e0/0x200 [btrfs]
  ---[ end trace 0000000000000000 ]---
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8654 btrfs_destroy_inode+0x1a8/0x200 [btrfs]
  ---[ end trace 0000000000000000 ]---
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8660 btrfs_destroy_inode+0x1a0/0x200 [btrfs]
  ---[ end trace 0000000000000000 ]---
  BTRFS info (device sda): last unmount of filesystem 1b4abba9-de34-4f07-9e7f-157cf12a18d6
  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs]
  ---[ end trace 0000000000000000 ]---
  BTRFS info (device sda): space_info DATA has 268218368 free, is not full
  BTRFS info (device sda): space_info total=268435456, used=204800, pinned=0, reserved=0, may_use=12288, readonly=0 zone_unusable=0
  BTRFS info (device sda): global_block_rsv: size 0 reserved 0
  BTRFS info (device sda): trans_block_rsv: size 0 reserved 0
  BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0
  BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs]
  ---[ end trace 0000000000000000 ]---
  BTRFS info (device sda): space_info METADATA has 267796480 free, is not full
  BTRFS info (device sda): space_info total=268435456, used=131072, pinned=0, reserved=0, may_use=262144, readonly=0 zone_unusable=245760
  BTRFS info (device sda): global_block_rsv: size 0 reserved 0
  BTRFS info (device sda): trans_block_rsv: size 0 reserved 0
  BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0
  BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0

The $dev above is a tcmu-runner emulated zoned HDD, which has a max zone
append size of 64K, and the system has a 64K page size.

[CAUSE]
I have added several trace_printk() calls to show the events (header
skipped):

  > btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688
  > btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288
  > btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536
  > btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864

The above lines show that our buffered write has dirtied 3 pages of inode
259 of root 5:

   704K              768K              832K              896K
   I            |////I/////////////////I///////////|     I
                756K                               868K

|///| is the dirtied range using subpage bitmaps, and 'I' is the page
boundary.

Meanwhile all three pages (704K, 768K, 832K) have their PageDirty flag
set.

  > btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400

Then the direct IO write starts. Since the range [680K, 780K) covers the
beginning part of the above dirty range, btrfs needs to writeback the
two pages at 704K and 768K.

  > cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536
  > extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536

The above 2 lines show that we're writing back the dirty range
[756K, 756K + 64K).
We only writeback 64K because the zoned device has a max zone append
size of 64K.

  > extent_write_locked_range: r/i=5/259 clear dirty for page=786432

!!! The above line shows the root cause. !!!

We're calling clear_page_dirty_for_io() inside
extent_write_locked_range(), for the page at 768K.
This is because extent_write_locked_range() can go beyond the current
locked page. Here we hit the page at 768K and clear its page dirty flag.

This leads to a desync between the subpage dirty and page dirty flags.
We have the page dirty flag cleared, but the subpage range [820K, 832K)
is still dirty.

After the writeback of range [756K, 820K), the dirty flags look like
this, as page 768K no longer has its dirty flag set:

   704K              768K              832K              896K
   I                 I             |   I/////////////|   I
                                   820K              868K

This means we will no longer writeback range [820K, 832K), thus the
reserved data/metadata space would never be properly released.

  > extent_write_cache_pages: r/i=5/259 skip non-dirty folio=786432

Now even if we try to start writeback for page 768K, since the page is
not dirty, we completely skip it at extent_write_cache_pages() time.

  > btrfs_direct_write: r/i=5/259 dio done filepos=696320 len=0

Now the direct IO has finished.

  > cow_file_range: r/i=5/259 add ordered extent filepos=851968 len=36864
  > extent_write_locked_range: r/i=5/259 locked page=851968 start=851968 len=36864

Now we writeback the remaining dirty range, which is [832K, 868K).
This causes the range [820K, 832K) to never be submitted, thus leaking
the reserved space.

This bug only affects the subpage and zoned case.
For the non-subpage and zoned case, we have exactly one sector for each
page, thus no such partial dirty cases.

For the subpage and non-zoned case, we never go into run_delalloc_cow(),
and normally all the dirty subpage ranges would be properly submitted
inside __extent_writepage_io().

[FIX]
Just do not clear the page dirty flag at all inside
extent_write_locked_range(), as __extent_writepage_io() would do a more
accurate, subpage compatible clear for page and subpage dirty flags
anyway.

Now the correct trace would look like this:

  > btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688
  > btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288
  > btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536
  > btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864

The page dirty part is still the same 3 pages.

  > btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400
  > cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536
  > extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536

And the writeback for the first 64K is still correct.

  > cow_file_range: r/i=5/259 add ordered extent filepos=839680 len=49152
  > extent_write_locked_range: r/i=5/259 locked page=786432 start=839680 len=49152

Now with the fix, we can properly writeback the range [820K, 832K), and
properly release the reserved data/metadata space.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 338067ce724a..2174c0e0fb15 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2340,10 +2340,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,

 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		ASSERT(PageLocked(page));
-		if (pages_dirty && page != locked_page) {
+		if (pages_dirty && page != locked_page)
 			ASSERT(PageDirty(page));
-			clear_page_dirty_for_io(page);
-		}

 		ret = __extent_writepage_io(BTRFS_I(inode), page, cur, cur_len,
 					    &bio_ctrl, i_size, &nr);

From patchwork Thu May 23 07:05:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13671326
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Josef Bacik, Johannes Thumshirn, Naohiro Aota
Subject: [PATCH v6 5/5] btrfs: make extent_write_locked_range() to handle subpage writeback correctly
Date: Thu, 23 May 2024 16:35:46 +0930
Message-ID: <3013967ef7a76d7962ac8824a18f8e0eabee9945.1716445070.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.1
X-Mailing-List: linux-btrfs@vger.kernel.org

When extent_write_locked_range() generated an inline extent, it would
set and finish the writeback for the whole page.

Although currently it's safe since subpage disables inline creation,
for the sake of consistency, let it go with subpage helpers to set and
clear the writeback flags.

Reviewed-by: Josef Bacik
Reviewed-by: Johannes Thumshirn
Reviewed-by: Naohiro Aota
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2174c0e0fb15..1aac7b8fa7e2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2336,6 +2336,7 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
 		u32 cur_len = cur_end + 1 - cur;
 		struct page *page;
+		struct folio *folio;
 		int nr = 0;

 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
@@ -2350,8 +2351,9 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,

 		/* Make sure the mapping tag for page dirty gets cleared. */
 		if (nr == 0) {
-			set_page_writeback(page);
-			end_page_writeback(page);
+			folio = page_folio(page);
+			btrfs_folio_set_writeback(fs_info, folio, cur, cur_len);
+			btrfs_folio_clear_writeback(fs_info, folio, cur, cur_len);
 		}
 		if (ret) {
 			btrfs_mark_ordered_io_finished(BTRFS_I(inode), page,