diff mbox series

fsdax: force clear dirty mark if CoW

Message ID 1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com (mailing list archive)
State Accepted
Commit f76b3a32879de215ced3f8c754c4077b0c2f79e3
Headers show
Series fsdax: force clear dirty mark if CoW | expand

Commit Message

Shiyang Ruan March 24, 2023, 10:28 a.m. UTC
XFS allows CoW on non-shared extents to combat fragmentation[1].  The
old non-shared extent could be mwrited before, its dax entry is marked
dirty.  To be able to delete this entry, clear its dirty mark before
invalidate_inode_pages2_range().

[1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/dax.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

Comments

Andrew Morton March 24, 2023, 7:42 p.m. UTC | #1
On Fri, 24 Mar 2023 10:28:00 +0000 Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:

> XFS allows CoW on non-shared extents to combat fragmentation[1].  The
> old non-shared extent could be mwrited before, its dax entry is marked
> dirty.  To be able to delete this entry, clear its dirty mark before
> invalidate_inode_pages2_range().

What are the user-visible runtime effects of this flaw?

Are we able to identify a Fixes: target for this?  Perhaps
f80e1668888f3 ("fsdax: invalidate pages when CoW")?
Shiyang Ruan March 27, 2023, 3:19 a.m. UTC | #2
在 2023/3/25 3:42, Andrew Morton 写道:
> On Fri, 24 Mar 2023 10:28:00 +0000 Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
> 
>> XFS allows CoW on non-shared extents to combat fragmentation[1].  The
>> old non-shared extent could be mwrited before, its dax entry is marked
>> dirty.  To be able to delete this entry, clear its dirty mark before
>> invalidate_inode_pages2_range().
> 
> What are the user-visible runtime effects of this flaw?

This bug won't leak or mess up the data of filesystem.  In dmesg it will 
show like this:

[   28.512349] ------------[ cut here ]------------
[   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 
dax_insert_entry+0x342/0x390
[   28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs 
lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib 
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
[   28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 
6.3.0-rc1-00001-g85e1481e19c1-dirty #117
[   28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
Arch Linux 1.16.1-1-1 04/01/2014
[   28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
[   28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 
8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff 
ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
[   28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086
[   28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 
000000000018824b
[   28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 
00000000ffffffff
[   28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: 
ffffc9000845f9c8
[   28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 
0000000000000155
[   28.519301] R13: 0000000000000000 R14: 000000000018824b R15: 
ffff88810cfa76b8
[   28.519703] FS:  00007f14a0c94740(0000) GS:ffff88817bd00000(0000) 
knlGS:0000000000000000
[   28.520148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 
0000000000770ee0
[   28.520863] PKRU: 55555554
[   28.521043] Call Trace:
[   28.521219]  <TASK>
[   28.521368]  dax_fault_iter+0x196/0x390
[   28.521595]  dax_iomap_pte_fault+0x19b/0x3d0
[   28.521852]  __xfs_filemap_fault+0x234/0x2b0
[   28.522116]  __do_fault+0x30/0x130
[   28.522334]  do_fault+0x193/0x340
[   28.522586]  __handle_mm_fault+0x2d3/0x690
[   28.522975]  handle_mm_fault+0xe6/0x2c0
[   28.523259]  do_user_addr_fault+0x1bc/0x6f0
[   28.523521]  exc_page_fault+0x60/0x140
[   28.523763]  asm_exc_page_fault+0x22/0x30
[   28.524001] RIP: 0033:0x7f14a0b589ca
[   28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 
47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 
fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[   28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202
[   28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 
0000000000006046
[   28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 
00007f14a0c8d000
[   28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 
000000000014a000
[   28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 
028f5c28f5c28f5c
[   28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 
00007f14a0c946c0
[   28.527449]  </TASK>
[   28.527600] ---[ end trace 0000000000000000 ]---

 >
 > Are we able to identify a Fixes: target for this?  Perhaps
 > f80e1668888f3 ("fsdax: invalidate pages when CoW")?
 >

Yes, it is to fix this commit.


--
Thanks,
Ruan.
Andrew Morton March 27, 2023, 8:24 p.m. UTC | #3
On Mon, 27 Mar 2023 11:19:01 +0800 Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:

> 
> 
> 在 2023/3/25 3:42, Andrew Morton 写道:
> > On Fri, 24 Mar 2023 10:28:00 +0000 Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
> > 
> >> XFS allows CoW on non-shared extents to combat fragmentation[1].  The
> >> old non-shared extent could be mwrited before, its dax entry is marked
> >> dirty.  To be able to delete this entry, clear its dirty mark before
> >> invalidate_inode_pages2_range().
> > 
> > What are the user-visible runtime effects of this flaw?
> 
> This bug won't leak or mess up the data of filesystem.  In dmesg it will 
> show like this:
> 
> [   28.512349] ------------[ cut here ]------------
> [   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 
> dax_insert_entry+0x342/0x390
> 
> ...
>
>  >
>  > Are we able to identify a Fixes: target for this?  Perhaps
>  > f80e1668888f3 ("fsdax: invalidate pages when CoW")?
>  >
> 
> Yes, it is to fix this commit.

OK, thanks.  I added the extra changelog info, added the Fixes and a
cc:stable.

Some review from other fsdax developers would be helpful, please.
diff mbox series

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 5d2e9b10030e..2ababb89918d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -781,6 +781,33 @@  static int __dax_invalidate_entry(struct address_space *mapping,
 	return ret;
 }
 
+static int __dax_clear_dirty_range(struct address_space *mapping,
+		pgoff_t start, pgoff_t end)
+{
+	XA_STATE(xas, &mapping->i_pages, start);
+	unsigned int scanned = 0;
+	void *entry;
+
+	xas_lock_irq(&xas);
+	xas_for_each(&xas, entry, end) {
+		entry = get_unlocked_entry(&xas, 0);
+		xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
+		xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
+		put_unlocked_entry(&xas, entry, WAKE_NEXT);
+
+		if (++scanned % XA_CHECK_SCHED)
+			continue;
+
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		cond_resched();
+		xas_lock_irq(&xas);
+	}
+	xas_unlock_irq(&xas);
+
+	return 0;
+}
+
 /*
  * Delete DAX entry at @index from @mapping.  Wait for it
  * to be unlocked before deleting it.
@@ -1440,6 +1467,16 @@  static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 	 * written by write(2) is visible in mmap.
 	 */
 	if (iomap->flags & IOMAP_F_NEW || cow) {
+		/*
+		 * Filesystem allows CoW on non-shared extents. The src extents
+		 * may have been mmapped with dirty mark before. To be able to
+		 * invalidate its dax entries, we need to clear the dirty mark
+		 * in advance.
+		 */
+		if (cow)
+			__dax_clear_dirty_range(iomi->inode->i_mapping,
+						pos >> PAGE_SHIFT,
+						(end - 1) >> PAGE_SHIFT);
 		invalidate_inode_pages2_range(iomi->inode->i_mapping,
 					      pos >> PAGE_SHIFT,
 					      (end - 1) >> PAGE_SHIFT);