Message ID | 20210604011844.1756145-4-ruansy.fnst@fujitsu.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Series | fsdax: introduce fs query to support reflink |
[ drop old linux-nvdimm@lists.01.org, add nvdimm@lists.linux.dev ]

On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
>
> A memory failure that occurs in fsdax mode will finally be handled in
> the filesystem. We introduce this interface to find out which files or
> metadata are affected by the corrupted range, and to try to recover the
> corrupted data if possible.
>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  include/linux/fs.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c3c88fdb9b2a..92af36c4225f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2176,6 +2176,8 @@ struct super_operations {
>                                   struct shrink_control *);
>         long (*free_cached_objects)(struct super_block *,
>                                     struct shrink_control *);
> +       int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> +                              loff_t offset, size_t len, void *data);

Why does the superblock need a new operation? Wouldn't whatever
function is specified here just be specified to the dax_dev as the
->notify_failure() holder callback?

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
>
> [ drop old linux-nvdimm@lists.01.org, add nvdimm@lists.linux.dev ]
>
> On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
> >
> > A memory failure that occurs in fsdax mode will finally be handled in
> > the filesystem. We introduce this interface to find out which files or
> > metadata are affected by the corrupted range, and to try to recover the
> > corrupted data if possible.
> >
> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> > ---
> >  include/linux/fs.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index c3c88fdb9b2a..92af36c4225f 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2176,6 +2176,8 @@ struct super_operations {
> >                                   struct shrink_control *);
> >         long (*free_cached_objects)(struct super_block *,
> >                                     struct shrink_control *);
> > +       int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> > +                              loff_t offset, size_t len, void *data);
>
> Why does the superblock need a new operation? Wouldn't whatever
> function is specified here just be specified to the dax_dev as the
> ->notify_failure() holder callback?

Because we need to find out which file is affected by the given poisoned
page so that the memory-failure code can do the collect_procs() and
kill_procs() jobs. That requires the filesystem to use its rmap feature
to look up the file from a given offset. So we need this operation to be
implemented by the specific filesystem and called by the dax_device's
holder.

This is the call trace I described in the cover letter:

memory_failure()
 * fsdax case
 pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
  dax_device->holder_ops->corrupted_range() =>
                                      - fs_dax_corrupted_range()
                                      - md_dax_corrupted_range()
   sb->s_ops->corrupted_range()    => xfs_fs_corrupted_range()  <== **HERE**
    xfs_rmap_query_range()
     xfs_currupt_helper()
      * corrupted on metadata
          try to recover data, call xfs_force_shutdown()
      * corrupted on file data
          try to recover data, call mf_dax_kill_procs()
 * normal case
 mf_generic_kill_procs()

As you can see, this newly added operation is important for the whole
process.

--
Thanks,
Ruan.
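The rmap step in the trace above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the XFS code: `struct rmap_rec`, `enum mf_action`, `rmap_query_range()`, `corrupt_helper()` and `fs_corrupted_range()` are hypothetical stand-ins invented for illustration, and owner 0 is assumed to mean "filesystem metadata". It shows the idea being argued for: given a corrupted byte range, walk the reverse-mapping records that overlap it and decide whether the filesystem must shut down (metadata hit) or only the processes mapping a file need to be killed (file data hit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reverse-mapping record: who owns a device extent. */
struct rmap_rec {
	uint64_t start;	/* first byte of the extent on the device */
	uint64_t len;	/* extent length in bytes */
	uint64_t owner;	/* inode number; 0 is assumed to mean metadata */
};

/* Ordered by severity so the strongest action can be kept. */
enum mf_action { MF_NONE = 0, MF_KILL_PROCS = 1, MF_SHUTDOWN = 2 };

typedef int (*rmap_fn)(const struct rmap_rec *rec, void *data);

/* Call fn for every record overlapping [offset, offset + len). */
static int rmap_query_range(const struct rmap_rec *recs, size_t nr,
			    uint64_t offset, uint64_t len,
			    rmap_fn fn, void *data)
{
	assert(len > 0);
	for (size_t i = 0; i < nr; i++) {
		if (recs[i].start < offset + len &&
		    offset < recs[i].start + recs[i].len) {
			int ret = fn(&recs[i], data);

			if (ret)
				return ret;	/* callback asked to stop */
		}
	}
	return 0;
}

/* Per-record callback: metadata forces shutdown, file data kills users. */
static int corrupt_helper(const struct rmap_rec *rec, void *data)
{
	enum mf_action *action = data;
	enum mf_action want = rec->owner ? MF_KILL_PROCS : MF_SHUTDOWN;

	if (want > *action)
		*action = want;
	return 0;
}

/* Filesystem-side entry point: map a bad range to the required action. */
static enum mf_action fs_corrupted_range(const struct rmap_rec *recs,
					 size_t nr, uint64_t offset,
					 uint64_t len)
{
	enum mf_action action = MF_NONE;

	rmap_query_range(recs, nr, offset, len, corrupt_helper, &action);
	return action;
}
```

In this model a range that touches both a metadata extent and a file extent yields `MF_SHUTDOWN`, since the stronger action wins; a range that no record covers yields `MF_NONE`.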
On Wed, Jun 16, 2021 at 11:51 PM ruansy.fnst@fujitsu.com
<ruansy.fnst@fujitsu.com> wrote:
>
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@intel.com>
> > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
> >
> > [ drop old linux-nvdimm@lists.01.org, add nvdimm@lists.linux.dev ]
> >
> > On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
> > >
> > > A memory failure that occurs in fsdax mode will finally be handled in
> > > the filesystem. We introduce this interface to find out which files or
> > > metadata are affected by the corrupted range, and to try to recover the
> > > corrupted data if possible.
> > >
> > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> > > ---
> > >  include/linux/fs.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index c3c88fdb9b2a..92af36c4225f 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -2176,6 +2176,8 @@ struct super_operations {
> > >                                   struct shrink_control *);
> > >         long (*free_cached_objects)(struct super_block *,
> > >                                     struct shrink_control *);
> > > +       int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> > > +                              loff_t offset, size_t len, void *data);
> >
> > Why does the superblock need a new operation? Wouldn't whatever
> > function is specified here just be specified to the dax_dev as the
> > ->notify_failure() holder callback?
>
> Because we need to find out which file is affected by the given poisoned
> page so that the memory-failure code can do the collect_procs() and
> kill_procs() jobs. That requires the filesystem to use its rmap feature
> to look up the file from a given offset. So we need this operation to be
> implemented by the specific filesystem and called by the dax_device's
> holder.
>
> This is the call trace I described in the cover letter:
>
> memory_failure()
>  * fsdax case
>  pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
>   dax_device->holder_ops->corrupted_range() =>
>                                       - fs_dax_corrupted_range()
>                                       - md_dax_corrupted_range()
>    sb->s_ops->corrupted_range()    => xfs_fs_corrupted_range()  <== **HERE**
>     xfs_rmap_query_range()
>      xfs_currupt_helper()
>       * corrupted on metadata
>           try to recover data, call xfs_force_shutdown()
>       * corrupted on file data
>           try to recover data, call mf_dax_kill_procs()
>  * normal case
>  mf_generic_kill_procs()
>
> As you can see, this newly added operation is important for the whole
> process.

I don't think you need either fs_dax_corrupted_range() or
sb->s_ops->corrupted_range(). In fact, fs_dax_corrupted_range() looks
broken because the filesystem may not even be mounted on the device
associated with the error. The holder_data and holder_ops should be
sufficient for communicating the stack of notifications:

pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
    md_dax_notify_failure()
md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()

I.e. the entire chain just walks dax_dev holder ops.
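The holder-ops chain proposed above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: the struct layouts, the `(offset, len)` callback signature, `struct md_target`, `struct fs_info`, the helper `dax_holder_notify_failure()` and the wrapper `demo_notify()` are all hypothetical and exist only for illustration; the names `md_dax_notify_failure()` and `xfs_notify_failure()` are borrowed from the message as stand-ins. It illustrates the point of the proposal: each dax_dev knows only its own holder, and each holder translates the offset and forwards the notification to the next dax_dev up the stack, so no superblock operation is involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct dax_device;

/* Hypothetical holder callback table, one per dax_dev holder. */
struct dax_holder_operations {
	int (*notify_failure)(struct dax_device *dax_dev,
			      uint64_t offset, uint64_t len);
};

struct dax_device {
	void *holder_data;	/* private data of whoever holds the device */
	const struct dax_holder_operations *holder_ops;
};

/* Notify the holder of dax_dev, if there is one. */
static int dax_holder_notify_failure(struct dax_device *dax_dev,
				     uint64_t offset, uint64_t len)
{
	assert(dax_dev != NULL);
	if (!dax_dev->holder_ops || !dax_dev->holder_ops->notify_failure)
		return -EOPNOTSUPP;	/* nobody holds this device */
	return dax_dev->holder_ops->notify_failure(dax_dev, offset, len);
}

/* Mapped-device holder: maps a window of the pmem device upward. */
struct md_target {
	struct dax_device *md_dax_dev;	/* the device this target exposes */
	uint64_t start;			/* window start on the lower device */
	uint64_t len;			/* window length in bytes */
};

static int md_dax_notify_failure(struct dax_device *pmem_dax_dev,
				 uint64_t offset, uint64_t len)
{
	struct md_target *t = pmem_dax_dev->holder_data;

	if (offset < t->start || offset - t->start >= t->len)
		return -ENXIO;	/* error lies outside the mapped window */
	/* Translate to the upper device's address space and forward. */
	return dax_holder_notify_failure(t->md_dax_dev,
					 offset - t->start, len);
}

/* Filesystem holder: the end of the chain; just records the event. */
struct fs_info {
	int notified;
	uint64_t offset;
	uint64_t len;
};

static int xfs_notify_failure(struct dax_device *dax_dev,
			      uint64_t offset, uint64_t len)
{
	struct fs_info *fs = dax_dev->holder_data;

	fs->notified++;
	fs->offset = offset;
	fs->len = len;
	return 0;
}

/* Build pmem -> md -> fs and inject a failure at the bottom. */
static int demo_notify(uint64_t pmem_offset, uint64_t len, struct fs_info *fs)
{
	static const struct dax_holder_operations fs_ops = {
		.notify_failure = xfs_notify_failure,
	};
	static const struct dax_holder_operations md_ops = {
		.notify_failure = md_dax_notify_failure,
	};
	struct dax_device md_dax_dev = {
		.holder_data = fs, .holder_ops = &fs_ops,
	};
	struct md_target target = {
		.md_dax_dev = &md_dax_dev, .start = 1 << 20, .len = 1 << 20,
	};
	struct dax_device pmem_dax_dev = {
		.holder_data = &target, .holder_ops = &md_ops,
	};

	return dax_holder_notify_failure(&pmem_dax_dev, pmem_offset, len);
}
```

With the 1 MiB window assumed in `demo_notify()`, a failure at pmem offset 1 MiB + 4096 reaches the filesystem holder as offset 4096, while a failure at pmem offset 0 is rejected by the mapped-device layer and never reaches the filesystem. A device with no holder returns `-EOPNOTSUPP`, which models the unmounted case.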
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
>
> On Wed, Jun 16, 2021 at 11:51 PM ruansy.fnst@fujitsu.com
> <ruansy.fnst@fujitsu.com> wrote:
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
> > >
> > > [ drop old linux-nvdimm@lists.01.org, add nvdimm@lists.linux.dev ]
> > >
> > > On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
> > > >
> > > > A memory failure that occurs in fsdax mode will finally be handled in
> > > > the filesystem. We introduce this interface to find out which files or
> > > > metadata are affected by the corrupted range, and to try to recover the
> > > > corrupted data if possible.
> > > >
> > > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> > > > ---
> > > >  include/linux/fs.h | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > index c3c88fdb9b2a..92af36c4225f 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -2176,6 +2176,8 @@ struct super_operations {
> > > >                                   struct shrink_control *);
> > > >         long (*free_cached_objects)(struct super_block *,
> > > >                                     struct shrink_control *);
> > > > +       int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> > > > +                              loff_t offset, size_t len, void *data);
> > >
> > > Why does the superblock need a new operation? Wouldn't whatever
> > > function is specified here just be specified to the dax_dev as the
> > > ->notify_failure() holder callback?
> >
> > Because we need to find out which file is affected by the given poisoned
> > page so that the memory-failure code can do the collect_procs() and
> > kill_procs() jobs. That requires the filesystem to use its rmap feature
> > to look up the file from a given offset. So we need this operation to be
> > implemented by the specific filesystem and called by the dax_device's
> > holder.
> >
> > This is the call trace I described in the cover letter:
> >
> > memory_failure()
> >  * fsdax case
> >  pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
> >   dax_device->holder_ops->corrupted_range() =>
> >                                       - fs_dax_corrupted_range()
> >                                       - md_dax_corrupted_range()
> >    sb->s_ops->corrupted_range()    => xfs_fs_corrupted_range()  <== **HERE**
> >     xfs_rmap_query_range()
> >      xfs_currupt_helper()
> >       * corrupted on metadata
> >           try to recover data, call xfs_force_shutdown()
> >       * corrupted on file data
> >           try to recover data, call mf_dax_kill_procs()
> >  * normal case
> >  mf_generic_kill_procs()
> >
> > As you can see, this newly added operation is important for the whole
> > process.
>
> I don't think you need either fs_dax_corrupted_range() or
> sb->s_ops->corrupted_range(). In fact, fs_dax_corrupted_range() looks
> broken because the filesystem may not even be mounted on the device
> associated with the error.

If the filesystem is not mounted, then no process will be using the
broken page and no one needs to be killed in memory-failure. So I think
we can just return and handle the error at the driver level if needed.

> The holder_data and holder_ops should be sufficient for communicating
> the stack of notifications:
>
> pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
> pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
>     md_dax_notify_failure()
> md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()
>
> I.e. the entire chain just walks dax_dev holder ops.

Oh, I see. We just need to implement holder_ops in the filesystem or
mapped_device directly. I made the routine too complicated.

--
Thanks,
Ruan.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..92af36c4225f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2176,6 +2176,8 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
+			       loff_t offset, size_t len, void *data);
 };
 
 /*
A memory failure that occurs in fsdax mode will finally be handled in
the filesystem. We introduce this interface to find out which files or
metadata are affected by the corrupted range, and to try to recover the
corrupted data if possible.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 include/linux/fs.h | 2 ++
 1 file changed, 2 insertions(+)