dax: make sure inodes are flushed before destroy cache

Message ID	20220212071111.148575-1-ztong0001@gmail.com (mailing list archive)
State	Accepted
Commit	2166a9974902d277cc03f15027d72c4d6ab2a256

Headers

From: Tong Zhang <ztong0001@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	nvdimm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Cc: Tong Zhang <ztong0001@gmail.com>
Subject: [PATCH] dax: make sure inodes are flushed before destroy cache
Date: Fri, 11 Feb 2022 23:11:11 -0800
Message-Id: <20220212071111.148575-1-ztong0001@gmail.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

dax: make sure inodes are flushed before destroy cache

Commit Message

Tong Zhang Feb. 12, 2022, 7:11 a.m. UTC

A bug can be triggered by the following command:

$ modprobe nd_pmem && modprobe -r nd_pmem

[   10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on __kmem_cache_shutdown()
[   10.060938] Slab 0x0000000085b729ac objects=9 used=1 fp=0x000000004f5ae469 flags=0x200000000010200(slab|head|node)
[   10.062433] Call Trace:
[   10.062673]  dump_stack_lvl+0x34/0x44
[   10.062865]  slab_err+0x90/0xd0
[   10.063619]  __kmem_cache_shutdown+0x13b/0x2f0
[   10.063848]  kmem_cache_destroy+0x4a/0x110
[   10.064058]  __x64_sys_delete_module+0x265/0x300

This is caused by dax_fs_exit() not flushing inodes before destroying the
cache. To fix this issue, call rcu_barrier() before destroying the cache.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
---
 drivers/dax/super.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Christoph Hellwig Feb. 14, 2022, 2:07 p.m. UTC | #1

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Ira Weiny Feb. 14, 2022, 5:59 p.m. UTC | #2

On Fri, Feb 11, 2022 at 11:11:11PM -0800, Tong Zhang wrote:
> A bug can be triggered by following command
> 
> $ modprobe nd_pmem && modprobe -r nd_pmem
> 
> [   10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on __kmem_cache_shutdown()
> [   10.060938] Slab 0x0000000085b729ac objects=9 used=1 fp=0x000000004f5ae469 flags=0x200000000010200(slab|head|node)
> [   10.062433] Call Trace:
> [   10.062673]  dump_stack_lvl+0x34/0x44
> [   10.062865]  slab_err+0x90/0xd0
> [   10.063619]  __kmem_cache_shutdown+0x13b/0x2f0
> [   10.063848]  kmem_cache_destroy+0x4a/0x110
> [   10.064058]  __x64_sys_delete_module+0x265/0x300
> 
> This is caused by dax_fs_exit() not flushing inodes before destroy cache.
> To fix this issue, call rcu_barrier() before destroy cache.

I don't doubt that this fixes the bug.  However, I can't help but think this is
hiding a bug, or perhaps a missing step, in the kmem_cache layer?  As far as I
can see dax does not call call_rcu() and only uses srcu not rcu?  I was tempted
to suggest srcu_barrier() but dax does not call call_srcu() either.

So I'm not clear about what is really going on and why this fixes it.  I know
that dax is not using srcu in a standard way so perhaps this helps in a way I
don't quite grok?  If so perhaps a comment here would be in order?

Ira

> 
> Signed-off-by: Tong Zhang <ztong0001@gmail.com>
> ---
>  drivers/dax/super.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e3029389d809..6bd565fe2e63 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -476,6 +476,7 @@ static int dax_fs_init(void)
>  static void dax_fs_exit(void)
>  {
>  	kern_unmount(dax_mnt);
> +	rcu_barrier();
>  	kmem_cache_destroy(dax_cache);
>  }
>  
> -- 
> 2.25.1
> 
>

Dan Williams Feb. 14, 2022, 8:09 p.m. UTC | #3

On Mon, Feb 14, 2022 at 9:59 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Fri, Feb 11, 2022 at 11:11:11PM -0800, Tong Zhang wrote:
> > A bug can be triggered by following command
> >
> > $ modprobe nd_pmem && modprobe -r nd_pmem
> >
> > [   10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on __kmem_cache_shutdown()
> > [   10.060938] Slab 0x0000000085b729ac objects=9 used=1 fp=0x000000004f5ae469 flags=0x200000000010200(slab|head|node)
> > [   10.062433] Call Trace:
> > [   10.062673]  dump_stack_lvl+0x34/0x44
> > [   10.062865]  slab_err+0x90/0xd0
> > [   10.063619]  __kmem_cache_shutdown+0x13b/0x2f0
> > [   10.063848]  kmem_cache_destroy+0x4a/0x110
> > [   10.064058]  __x64_sys_delete_module+0x265/0x300
> >
> > This is caused by dax_fs_exit() not flushing inodes before destroy cache.
> > To fix this issue, call rcu_barrier() before destroy cache.
>
> I don't doubt that this fixes the bug.  However, I can't help but think this is
> hiding a bug, or perhaps a missing step, in the kmem_cache layer?  As far as I
> can see dax does not call call_rcu() and only uses srcu not rcu?  I was tempted
> to suggest srcu_barrier() but dax does not call call_srcu() either.

This rcu_barrier() is associated with the call_rcu() in destroy_inode().

While kern_unmount() does a full synchronize_rcu() after clearing
->mnt_ns, any pending destroy_inode() callbacks still need to be flushed
before the kmem_cache is destroyed.

> So I'm not clear about what is really going on and why this fixes it.  I know
> that dax is not using srcu in a standard way so perhaps this helps in a way I
> don't quite grok?  If so perhaps a comment here would be in order?

Looks like a common pattern I missed that all filesystem exit paths implement.

Ira Weiny Feb. 14, 2022, 11:12 p.m. UTC | #4

On Mon, Feb 14, 2022 at 12:09:54PM -0800, Dan Williams wrote:
> On Mon, Feb 14, 2022 at 9:59 AM Ira Weiny <ira.weiny@intel.com> wrote:
> >
> > On Fri, Feb 11, 2022 at 11:11:11PM -0800, Tong Zhang wrote:
> > > A bug can be triggered by following command
> > >
> > > $ modprobe nd_pmem && modprobe -r nd_pmem
> > >
> > > [   10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on __kmem_cache_shutdown()
> > > [   10.060938] Slab 0x0000000085b729ac objects=9 used=1 fp=0x000000004f5ae469 flags=0x200000000010200(slab|head|node)
> > > [   10.062433] Call Trace:
> > > [   10.062673]  dump_stack_lvl+0x34/0x44
> > > [   10.062865]  slab_err+0x90/0xd0
> > > [   10.063619]  __kmem_cache_shutdown+0x13b/0x2f0
> > > [   10.063848]  kmem_cache_destroy+0x4a/0x110
> > > [   10.064058]  __x64_sys_delete_module+0x265/0x300
> > >
> > > This is caused by dax_fs_exit() not flushing inodes before destroy cache.
> > > To fix this issue, call rcu_barrier() before destroy cache.
> >
> > I don't doubt that this fixes the bug.  However, I can't help but think this is
> > hiding a bug, or perhaps a missing step, in the kmem_cache layer?  As far as I
> > can see dax does not call call_rcu() and only uses srcu not rcu?  I was tempted
> > to suggest srcu_barrier() but dax does not call call_srcu() either.
> 
> This rcu_barrier() is associated with the call_rcu() in destroy_inode().

Ok yea.

> 
> While kern_unmount() does a full synchronize_rcu() after clearing
> ->mnt_ns, any pending destroy_inode() callbacks still need to be flushed
> before the kmem_cache is destroyed.
> 
> > So I'm not clear about what is really going on and why this fixes it.  I know
> > that dax is not using srcu in a standard way so perhaps this helps in a way I
> > don't quite grok?  If so perhaps a comment here would be in order?
> 
> Looks like a common pattern I missed that all filesystem exit paths implement.

I think a comment would be in order, especially since it looks like every
other FS has one:

fs/ext4/super.c:

...
        /*
         * Make sure all delayed rcu free inodes are flushed before we
         * destroy cache.
         */
        rcu_barrier();  
...

Anyway ok.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

Thanks for looking Dan,
Ira

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e3029389d809..6bd565fe2e63 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -476,6 +476,7 @@  static int dax_fs_init(void)
 static void dax_fs_exit(void)
 {
 	kern_unmount(dax_mnt);
+	rcu_barrier();
 	kmem_cache_destroy(dax_cache);
 }
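
The ordering problem discussed in the thread can be illustrated with a small
user-space toy model. This is only a sketch, not kernel code: every toy_*
name below is hypothetical, the "cache" merely counts live objects, and the
"barrier" simply runs queued callbacks, whereas the real rcu_barrier() waits
for all callbacks already queued by call_rcu() to finish. It shows that an
inode whose free was deferred still counts as an object in the cache until
the deferred callback has run.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a kmem_cache: it only counts live objects. */
struct toy_cache { int live; };

/* Toy stand-in for the callback head that call_rcu() would queue. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *);
};

struct toy_inode {
	struct toy_rcu_head rcu;	/* first member, so a cast recovers the inode */
	struct toy_cache *cache;
};

static struct toy_rcu_head *toy_pending;	/* queued, not yet run */

static struct toy_inode *toy_alloc_inode(struct toy_cache *c)
{
	struct toy_inode *inode = calloc(1, sizeof(*inode));

	assert(inode);
	inode->cache = c;
	c->live++;
	return inode;
}

/* Deferred free: the object leaves the cache only when this runs. */
static void toy_free_inode_cb(struct toy_rcu_head *head)
{
	struct toy_inode *inode = (struct toy_inode *)head;

	inode->cache->live--;
	free(inode);
}

/* Models destroy_inode(): queue the free instead of doing it now. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	inode->rcu.func = toy_free_inode_cb;
	inode->rcu.next = toy_pending;
	toy_pending = &inode->rcu;
}

/* Models rcu_barrier(): return only after every queued callback ran. */
static void toy_rcu_barrier(void)
{
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
	}
}

/*
 * Models dax_fs_exit(). Returns the number of objects still in the
 * cache at the moment it would be destroyed; nonzero corresponds to
 * the "Objects remaining in dax_cache" report.
 */
static int toy_fs_exit(struct toy_cache *c, int use_barrier)
{
	int remaining;

	toy_destroy_inode(toy_alloc_inode(c));
	if (use_barrier)
		toy_rcu_barrier();
	remaining = c->live;	/* what the cache destroy step would see */
	toy_rcu_barrier();	/* demo-only cleanup so nothing leaks */
	return remaining;
}
```

Under these assumptions, calling toy_fs_exit() with use_barrier set to 0
reports one object remaining, and calling it with use_barrier set to 1
reports none, which mirrors the effect of the one-line patch above.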
