Message ID | 20220227120747.711169-8-ruansy.fnst@fujitsu.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fsdax: introduce fs query to support reflink | expand |
Hi Shiyang, Thank you for the patch! Yet something to improve: [auto build test ERROR on xfs-linux/for-next] [also build test ERROR on linux/master] [cannot apply to hnaz-mm/master linus/master v5.17-rc5 next-20220225] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Shiyang-Ruan/fsdax-introduce-fs-query-to-support-reflink/20220227-200849 base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next config: i386-randconfig-a003 (https://download.01.org/0day-ci/archive/20220227/202202272203.U7VlQY3B-lkp@intel.com/config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): # https://github.com/0day-ci/linux/commit/9f4bfbd2bae60e9f172e0b7332b2af32aa5baa87 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Shiyang-Ruan/fsdax-introduce-fs-query-to-support-reflink/20220227-200849 git checkout 9f4bfbd2bae60e9f172e0b7332b2af32aa5baa87 # save the config file to linux build tree mkdir build_dir make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): ld: fs/xfs/xfs_buf.o: in function `xfs_free_buftarg': >> fs/xfs/xfs_buf.c:1897: undefined reference to `dax_unregister_holder' ld: fs/xfs/xfs_notify_failure.o: in function `xfs_dax_notify_failure': >> fs/xfs/xfs_notify_failure.c:187: undefined reference to `dax_holder' vim +1897 fs/xfs/xfs_buf.c 1885 1886 void 1887 xfs_free_buftarg( 1888 struct xfs_buftarg *btp) 1889 { 1890 unregister_shrinker(&btp->bt_shrinker); 1891 ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0); 1892 percpu_counter_destroy(&btp->bt_io_count); 1893 list_lru_destroy(&btp->bt_lru); 1894 1895 blkdev_issue_flush(btp->bt_bdev); 1896 if (btp->bt_daxdev) > 1897 dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); 1898 fs_put_dax(btp->bt_daxdev); 1899 1900 kmem_free(btp); 1901 } 1902 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Shiyang, Thank you for the patch! Yet something to improve: [auto build test ERROR on xfs-linux/for-next] [also build test ERROR on linux/master] [cannot apply to hnaz-mm/master linus/master v5.17-rc5 next-20220225] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Shiyang-Ruan/fsdax-introduce-fs-query-to-support-reflink/20220227-200849 base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next config: hexagon-buildonly-randconfig-r005-20220227 (https://download.01.org/0day-ci/archive/20220227/202202272333.bhLvmuHF-lkp@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/9f4bfbd2bae60e9f172e0b7332b2af32aa5baa87 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Shiyang-Ruan/fsdax-introduce-fs-query-to-support-reflink/20220227-200849 git checkout 9f4bfbd2bae60e9f172e0b7332b2af32aa5baa87 # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> ld.lld: error: undefined symbol: dax_unregister_holder >>> referenced by xfs_buf.c >>> xfs/xfs_buf.o:(xfs_free_buftarg) in archive fs/built-in.a >>> referenced by xfs_buf.c >>> xfs/xfs_buf.o:(xfs_free_buftarg) in archive fs/built-in.a -- >> ld.lld: error: undefined symbol: dax_holder >>> referenced by xfs_notify_failure.c >>> xfs/xfs_notify_failure.o:(xfs_dax_notify_failure) in archive fs/built-in.a >>> referenced by xfs_notify_failure.c >>> xfs/xfs_notify_failure.o:(xfs_dax_notify_failure) in archive fs/built-in.a --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Shiyang, Thank you for the patch! Yet something to improve: [auto build test ERROR on xfs-linux/for-next] [also build test ERROR on linux/master] [cannot apply to hnaz-mm/master linus/master v5.17-rc5 next-20220225] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Shiyang-Ruan/fsdax-introduce-fs-query-to-support-reflink/20220227-200849 base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next config: riscv-randconfig-p001-20220227 (https://download.01.org/0day-ci/archive/20220227/202202272331.SP0o3f9L-lkp@intel.com/config) compiler: riscv64-linux-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/9f4bfbd2bae60e9f172e0b7332b2af32aa5baa87 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Shiyang-Ruan/fsdax-introduce-fs-query-to-support-reflink/20220227-200849 git checkout 9f4bfbd2bae60e9f172e0b7332b2af32aa5baa87 # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=riscv SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): riscv64-linux-ld: fs/xfs/xfs_buf.o: in function `.L828': >> xfs_buf.c:(.text+0x3f7c): undefined reference to `dax_unregister_holder' riscv64-linux-ld: fs/xfs/xfs_notify_failure.o: in function `xfs_dax_notify_failure': >> xfs_notify_failure.c:(.text+0x2b0): undefined reference to `dax_holder' --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> @@ -1892,6 +1893,8 @@ xfs_free_buftarg( > list_lru_destroy(&btp->bt_lru); > > blkdev_issue_flush(btp->bt_bdev); > + if (btp->bt_daxdev) > + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); > fs_put_dax(btp->bt_daxdev); > > kmem_free(btp); > @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( > struct block_device *bdev) > { > xfs_buftarg_t *btp; > + int error; > > btp = kmem_zalloc(sizeof(*btp), KM_NOFS); > > @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( > btp->bt_dev = bdev->bd_dev; > btp->bt_bdev = bdev; > btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); > + if (btp->bt_daxdev) { > + error = dax_register_holder(btp->bt_daxdev, mp, > + &xfs_dax_holder_operations); > + if (error) { > + xfs_err(mp, "DAX device already in use?!"); > + goto error_free; > + } > + } It seems to me that just passing the holder and holder ops to fs_dax_get_by_bdev and the holder to dax_unregister_holder would significantly simply the interface here. Dan, what do you think? > +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) No real need for the IS_ENABLED. Also any reason to even build this file if the options are not set? It seems like xfs_dax_holder_operations should just be defined to NULL and the whole file not supported if we can't support the functionality. Dan: not for this series, but is there any reason not to require MEMORY_FAILURE for DAX to start with? > + > + ddev_start = mp->m_ddev_targp->bt_dax_part_off; > + ddev_end = ddev_start + > + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1; This should use bdev_nr_bytes. But didn't we say we don't want to support notifications on partitioned devices and thus don't actually need all this? > + > + /* Ignore the range out of filesystem area */ > + if ((offset + len) < ddev_start) No need for the inner braces. > + if ((offset + len) > ddev_end) No need for the braces either. > diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h > new file mode 100644 > index 000000000000..76187b9620f9 > --- /dev/null > +++ b/fs/xfs/xfs_notify_failure.h > @@ -0,0 +1,10 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2022 Fujitsu. All Rights Reserved. > + */ > +#ifndef __XFS_NOTIFY_FAILURE_H__ > +#define __XFS_NOTIFY_FAILURE_H__ > + > +extern const struct dax_holder_operations xfs_dax_holder_operations; > + > +#endif /* __XFS_NOTIFY_FAILURE_H__ */ Dowe really need a new header for this vs just sequeezing it into xfs_super.h or something like that? > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index e8f37bdc8354..b8de6ed2c888 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -353,6 +353,12 @@ xfs_setup_dax_always( > return -EINVAL; > } > > + if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) { > + xfs_alert(mp, > + "need rmapbt when both DAX and reflink enabled."); > + return -EINVAL; > + } Right now we can't even enable reflink with DAX yet, so adding this here seems premature - it should go into the patch allowing DAX+reflink.
在 2022/3/30 14:00, Christoph Hellwig 写道: >> @@ -1892,6 +1893,8 @@ xfs_free_buftarg( >> list_lru_destroy(&btp->bt_lru); >> >> blkdev_issue_flush(btp->bt_bdev); >> + if (btp->bt_daxdev) >> + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); >> fs_put_dax(btp->bt_daxdev); >> >> kmem_free(btp); >> @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( >> struct block_device *bdev) >> { >> xfs_buftarg_t *btp; >> + int error; >> >> btp = kmem_zalloc(sizeof(*btp), KM_NOFS); >> >> @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( >> btp->bt_dev = bdev->bd_dev; >> btp->bt_bdev = bdev; >> btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); >> + if (btp->bt_daxdev) { >> + error = dax_register_holder(btp->bt_daxdev, mp, >> + &xfs_dax_holder_operations); >> + if (error) { >> + xfs_err(mp, "DAX device already in use?!"); >> + goto error_free; >> + } >> + } > > It seems to me that just passing the holder and holder ops to > fs_dax_get_by_bdev and the holder to dax_unregister_holder would > significantly simply the interface here. > > Dan, what do you think? > >> +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) > > No real need for the IS_ENABLED. Also any reason to even build this > file if the options are not set? It seems like > xfs_dax_holder_operations should just be defined to NULL and the > whole file not supported if we can't support the functionality. Got it. These two CONFIG seem not related for now. So, I think I should wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add `xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile. > > Dan: not for this series, but is there any reason not to require > MEMORY_FAILURE for DAX to start with? > >> + >> + ddev_start = mp->m_ddev_targp->bt_dax_part_off; >> + ddev_end = ddev_start + >> + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1; > > This should use bdev_nr_bytes. OK. > > But didn't we say we don't want to support notifications on partitioned > devices and thus don't actually need all this? > >> + >> + /* Ignore the range out of filesystem area */ >> + if ((offset + len) < ddev_start) > > No need for the inner braces. > >> + if ((offset + len) > ddev_end) > > No need for the braces either. Really no need? It is to make sure the range to be handled won't out of the filesystem area. And make sure the @offset and @len are valid and correct after subtract the bbdev_start. > >> diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h >> new file mode 100644 >> index 000000000000..76187b9620f9 >> --- /dev/null >> +++ b/fs/xfs/xfs_notify_failure.h >> @@ -0,0 +1,10 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * Copyright (c) 2022 Fujitsu. All Rights Reserved. >> + */ >> +#ifndef __XFS_NOTIFY_FAILURE_H__ >> +#define __XFS_NOTIFY_FAILURE_H__ >> + >> +extern const struct dax_holder_operations xfs_dax_holder_operations; >> + >> +#endif /* __XFS_NOTIFY_FAILURE_H__ */ > > Dowe really need a new header for this vs just sequeezing it into > xfs_super.h or something like that? Yes, I'll move it into xfs_super.h. The xfs_notify_failure.c was splitted from xfs_super.c in the old patch. There is no need to create a header file for only single line of code. > >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c >> index e8f37bdc8354..b8de6ed2c888 100644 >> --- a/fs/xfs/xfs_super.c >> +++ b/fs/xfs/xfs_super.c >> @@ -353,6 +353,12 @@ xfs_setup_dax_always( >> return -EINVAL; >> } >> >> + if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) { >> + xfs_alert(mp, >> + "need rmapbt when both DAX and reflink enabled."); >> + return -EINVAL; >> + } > > Right now we can't even enable reflink with DAX yet, so adding this > here seems premature - it should go into the patch allowing DAX+reflink. > Yes. I'll remove it for now. -- Thanks, Ruan.
On Wed, Mar 30, 2022 at 11:16:10PM +0800, Shiyang Ruan wrote: > > > +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) > > > > No real need for the IS_ENABLED. Also any reason to even build this > > file if the options are not set? It seems like > > xfs_dax_holder_operations should just be defined to NULL and the > > whole file not supported if we can't support the functionality. > > Got it. These two CONFIG seem not related for now. So, I think I should > wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add > `xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile. I'd do ifeq ($(CONFIG_MEMORY_FAILURE),y) xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o endif in the makefile and keep it out of the actual source file entirely. > > > + > > > + /* Ignore the range out of filesystem area */ > > > + if ((offset + len) < ddev_start) > > > > No need for the inner braces. > > > > > + if ((offset + len) > ddev_end) > > > > No need for the braces either. > > Really no need? It is to make sure the range to be handled won't out of the > filesystem area. And make sure the @offset and @len are valid and correct > after subtract the bbdev_start. Yes, but there is no need for the braces per the precedence rules in C. So these could be: if (offset + len < ddev_start) and if (offset + len > ddev_end)
在 2022/3/30 14:00, Christoph Hellwig 写道: >> @@ -1892,6 +1893,8 @@ xfs_free_buftarg( >> list_lru_destroy(&btp->bt_lru); >> >> blkdev_issue_flush(btp->bt_bdev); >> + if (btp->bt_daxdev) >> + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); >> fs_put_dax(btp->bt_daxdev); >> >> kmem_free(btp); >> @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( >> struct block_device *bdev) >> { >> xfs_buftarg_t *btp; >> + int error; >> >> btp = kmem_zalloc(sizeof(*btp), KM_NOFS); >> >> @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( >> btp->bt_dev = bdev->bd_dev; >> btp->bt_bdev = bdev; >> btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); >> + if (btp->bt_daxdev) { >> + error = dax_register_holder(btp->bt_daxdev, mp, >> + &xfs_dax_holder_operations); >> + if (error) { >> + xfs_err(mp, "DAX device already in use?!"); >> + goto error_free; >> + } >> + } > > It seems to me that just passing the holder and holder ops to > fs_dax_get_by_bdev and the holder to dax_unregister_holder would > significantly simply the interface here. > > Dan, what do you think? Hi Dan, Could you give some advise on this API? Is it needed to move dax_register_holder's job into fs_dax_get_by_bdev()? -- Thanks, Ruan > >> +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) > > No real need for the IS_ENABLED. Also any reason to even build this > file if the options are not set? It seems like > xfs_dax_holder_operations should just be defined to NULL and the > whole file not supported if we can't support the functionality. > > Dan: not for this series, but is there any reason not to require > MEMORY_FAILURE for DAX to start with? > >> + >> + ddev_start = mp->m_ddev_targp->bt_dax_part_off; >> + ddev_end = ddev_start + >> + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1; > > This should use bdev_nr_bytes. > > But didn't we say we don't want to support notifications on partitioned > devices and thus don't actually need all this? > >> + >> + /* Ignore the range out of filesystem area */ >> + if ((offset + len) < ddev_start) > > No need for the inner braces. > >> + if ((offset + len) > ddev_end) > > No need for the braces either. > >> diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h >> new file mode 100644 >> index 000000000000..76187b9620f9 >> --- /dev/null >> +++ b/fs/xfs/xfs_notify_failure.h >> @@ -0,0 +1,10 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * Copyright (c) 2022 Fujitsu. All Rights Reserved. >> + */ >> +#ifndef __XFS_NOTIFY_FAILURE_H__ >> +#define __XFS_NOTIFY_FAILURE_H__ >> + >> +extern const struct dax_holder_operations xfs_dax_holder_operations; >> + >> +#endif /* __XFS_NOTIFY_FAILURE_H__ */ > > Dowe really need a new header for this vs just sequeezing it into > xfs_super.h or something like that? > >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c >> index e8f37bdc8354..b8de6ed2c888 100644 >> --- a/fs/xfs/xfs_super.c >> +++ b/fs/xfs/xfs_super.c >> @@ -353,6 +353,12 @@ xfs_setup_dax_always( >> return -EINVAL; >> } >> >> + if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) { >> + xfs_alert(mp, >> + "need rmapbt when both DAX and reflink enabled."); >> + return -EINVAL; >> + } > > Right now we can't even enable reflink with DAX yet, so adding this > here seems premature - it should go into the patch allowing DAX+reflink. >
On Tue, Mar 29, 2022 at 11:01 PM Christoph Hellwig <hch@infradead.org> wrote: > > > @@ -1892,6 +1893,8 @@ xfs_free_buftarg( > > list_lru_destroy(&btp->bt_lru); > > > > blkdev_issue_flush(btp->bt_bdev); > > + if (btp->bt_daxdev) > > + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); > > fs_put_dax(btp->bt_daxdev); > > > > kmem_free(btp); > > @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( > > struct block_device *bdev) > > { > > xfs_buftarg_t *btp; > > + int error; > > > > btp = kmem_zalloc(sizeof(*btp), KM_NOFS); > > > > @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( > > btp->bt_dev = bdev->bd_dev; > > btp->bt_bdev = bdev; > > btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); > > + if (btp->bt_daxdev) { > > + error = dax_register_holder(btp->bt_daxdev, mp, > > + &xfs_dax_holder_operations); > > + if (error) { > > + xfs_err(mp, "DAX device already in use?!"); > > + goto error_free; > > + } > > + } > > It seems to me that just passing the holder and holder ops to > fs_dax_get_by_bdev and the holder to dax_unregister_holder would > significantly simply the interface here. > > Dan, what do you think? Yes, makes sense, just like the optional holder arguments to blkdev_get_by_*(). > > > +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) > > No real need for the IS_ENABLED. Also any reason to even build this > file if the options are not set? It seems like > xfs_dax_holder_operations should just be defined to NULL and the > whole file not supported if we can't support the functionality. > > Dan: not for this series, but is there any reason not to require > MEMORY_FAILURE for DAX to start with? Given that DAX ties some storage semantics to memory and storage supports EIO I can see an argument to require memory_failure() for DAX, and especially for DAX on CXL where hotplug is supported it will be necessary. Linux currently has no facility to consult PCI drivers about removal actions, so the only recourse for a force removed CXL device is mass memory_failure(). > > > + > > + ddev_start = mp->m_ddev_targp->bt_dax_part_off; > > + ddev_end = ddev_start + > > + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1; > > This should use bdev_nr_bytes. > > But didn't we say we don't want to support notifications on partitioned > devices and thus don't actually need all this? Right.
On Thu, Apr 7, 2022 at 11:05 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote: > > > > 在 2022/3/30 14:00, Christoph Hellwig 写道: > >> @@ -1892,6 +1893,8 @@ xfs_free_buftarg( > >> list_lru_destroy(&btp->bt_lru); > >> > >> blkdev_issue_flush(btp->bt_bdev); > >> + if (btp->bt_daxdev) > >> + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); > >> fs_put_dax(btp->bt_daxdev); > >> > >> kmem_free(btp); > >> @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( > >> struct block_device *bdev) > >> { > >> xfs_buftarg_t *btp; > >> + int error; > >> > >> btp = kmem_zalloc(sizeof(*btp), KM_NOFS); > >> > >> @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( > >> btp->bt_dev = bdev->bd_dev; > >> btp->bt_bdev = bdev; > >> btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); > >> + if (btp->bt_daxdev) { > >> + error = dax_register_holder(btp->bt_daxdev, mp, > >> + &xfs_dax_holder_operations); > >> + if (error) { > >> + xfs_err(mp, "DAX device already in use?!"); > >> + goto error_free; > >> + } > >> + } > > > > It seems to me that just passing the holder and holder ops to > > fs_dax_get_by_bdev and the holder to dax_unregister_holder would > > significantly simply the interface here. > > > > Dan, what do you think? > > Hi Dan, > > Could you give some advise on this API? Is it needed to move > dax_register_holder's job into fs_dax_get_by_bdev()? Yes, works for me to just add them as optional arguments.
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 04611a1068b4..389970b3e13b 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -84,6 +84,7 @@ xfs-y += xfs_aops.o \ xfs_message.o \ xfs_mount.o \ xfs_mru_cache.o \ + xfs_notify_failure.o \ xfs_pwork.o \ xfs_reflink.o \ xfs_stats.o \ diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index b45e0d50a405..5f4984a5e108 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -19,6 +19,7 @@ #include "xfs_errortag.h" #include "xfs_error.h" #include "xfs_ag.h" +#include "xfs_notify_failure.h" static struct kmem_cache *xfs_buf_cache; @@ -1892,6 +1893,8 @@ xfs_free_buftarg( list_lru_destroy(&btp->bt_lru); blkdev_issue_flush(btp->bt_bdev); + if (btp->bt_daxdev) + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); fs_put_dax(btp->bt_daxdev); kmem_free(btp); @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( struct block_device *bdev) { xfs_buftarg_t *btp; + int error; btp = kmem_zalloc(sizeof(*btp), KM_NOFS); @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( btp->bt_dev = bdev->bd_dev; btp->bt_bdev = bdev; btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); + if (btp->bt_daxdev) { + error = dax_register_holder(btp->bt_daxdev, mp, + &xfs_dax_holder_operations); + if (error) { + xfs_err(mp, "DAX device already in use?!"); + goto error_free; + } + } /* * Buffer IO error rate limiting. Limit it to no more than 10 messages diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 33e26690a8c4..d4d36c5bef11 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -542,6 +542,9 @@ xfs_do_force_shutdown( } else if (flags & SHUTDOWN_CORRUPT_INCORE) { tag = XFS_PTAG_SHUTDOWN_CORRUPT; why = "Corruption of in-memory data"; + } else if (flags & SHUTDOWN_CORRUPT_ONDISK) { + tag = XFS_PTAG_SHUTDOWN_CORRUPT; + why = "Corruption of on-disk metadata"; } else { tag = XFS_PTAG_SHUTDOWN_IOERROR; why = "Metadata I/O Error"; diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 00720a02e761..47ff4ac53c4c 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -435,6 +435,7 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, int flags, char *fname, #define SHUTDOWN_LOG_IO_ERROR 0x0002 /* write attempt to the log failed */ #define SHUTDOWN_FORCE_UMOUNT 0x0004 /* shutdown from a forced unmount */ #define SHUTDOWN_CORRUPT_INCORE 0x0008 /* corrupt in-memory data structures */ +#define SHUTDOWN_CORRUPT_ONDISK 0x0010 /* corrupt metadata on device */ #define XFS_SHUTDOWN_STRINGS \ { SHUTDOWN_META_IO_ERROR, "metadata_io" }, \ diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c new file mode 100644 index 000000000000..b057e9f7eb31 --- /dev/null +++ b/fs/xfs/xfs_notify_failure.c @@ -0,0 +1,235 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2022 Fujitsu. All Rights Reserved. + */ + +#include "xfs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_alloc.h" +#include "xfs_bit.h" +#include "xfs_btree.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtalloc.h" +#include "xfs_trans.h" + +#include <linux/mm.h> +#include <linux/dax.h> + +struct failure_info { + xfs_agblock_t startblock; + xfs_extlen_t blockcount; + int mf_flags; +}; + +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) +static pgoff_t +xfs_failure_pgoff( + struct xfs_mount *mp, + const struct xfs_rmap_irec *rec, + const struct failure_info *notify) +{ + uint64_t pos = rec->rm_offset; + + if (notify->startblock > rec->rm_startblock) + pos += XFS_FSB_TO_B(mp, + notify->startblock - rec->rm_startblock); + return pos >> PAGE_SHIFT; +} + +static unsigned long +xfs_failure_pgcnt( + struct xfs_mount *mp, + const struct xfs_rmap_irec *rec, + const struct failure_info *notify) +{ + xfs_agblock_t end_rec; + xfs_agblock_t end_notify; + xfs_agblock_t start_cross; + xfs_agblock_t end_cross; + + start_cross = max(rec->rm_startblock, notify->startblock); + + end_rec = rec->rm_startblock + rec->rm_blockcount; + end_notify = notify->startblock + notify->blockcount; + end_cross = min(end_rec, end_notify); + + return XFS_FSB_TO_B(mp, end_cross - start_cross) >> PAGE_SHIFT; +} + +static int +xfs_dax_failure_fn( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *data) +{ + struct xfs_mount *mp = cur->bc_mp; + struct xfs_inode *ip; + struct failure_info *notify = data; + int error = 0; + + if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) || + (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK); + return -EFSCORRUPTED; + } + + /* Get files that incore, filter out others that are not in use. */ + error = xfs_iget(mp, cur->bc_tp, rec->rm_owner, XFS_IGET_INCORE, + 0, &ip); + /* Continue the rmap query if the inode isn't incore */ + if (error == -ENODATA) + return 0; + if (error) + return error; + + error = mf_dax_kill_procs(VFS_I(ip)->i_mapping, + xfs_failure_pgoff(mp, rec, notify), + xfs_failure_pgcnt(mp, rec, notify), + notify->mf_flags); + xfs_irele(ip); + return error; +} +#else +static int +xfs_dax_failure_fn( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *data) +{ + struct xfs_mount *mp = cur->bc_mp; + + /* No other option besides shutting down the fs. */ + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK); + return -EFSCORRUPTED; +} +#endif /* CONFIG_MEMORY_FAILURE && CONFIG_FS_DAX */ + +static int +xfs_dax_notify_ddev_failure( + struct xfs_mount *mp, + xfs_daddr_t daddr, + xfs_daddr_t bblen, + int mf_flags) +{ + struct xfs_trans *tp = NULL; + struct xfs_btree_cur *cur = NULL; + struct xfs_buf *agf_bp = NULL; + int error = 0; + xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, daddr); + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno); + xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen); + xfs_agnumber_t end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno); + + error = xfs_trans_alloc_empty(mp, &tp); + if (error) + return error; + + for (; agno <= end_agno; agno++) { + struct xfs_rmap_irec ri_low = { }; + struct xfs_rmap_irec ri_high; + struct failure_info notify; + struct xfs_agf *agf; + xfs_agblock_t agend; + + error = xfs_alloc_read_agf(mp, tp, agno, 0, &agf_bp); + if (error) + break; + + cur = xfs_rmapbt_init_cursor(mp, tp, agf_bp, agf_bp->b_pag); + + /* + * Set the rmap range from ri_low to ri_high, which represents + * a [start, end] where we looking for the files or metadata. + * The part of range out of a AG will be ignored. So, it's fine + * to set ri_low to "startblock" in all loops. When it reaches + * the last AG, set the ri_high to "endblock" to make sure we + * actually end at the end. + */ + memset(&ri_high, 0xFF, sizeof(ri_high)); + ri_low.rm_startblock = XFS_FSB_TO_AGBNO(mp, fsbno); + if (agno == end_agno) + ri_high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsbno); + + agf = agf_bp->b_addr; + agend = min(be32_to_cpu(agf->agf_length), + ri_high.rm_startblock); + notify.startblock = ri_low.rm_startblock; + notify.blockcount = agend - ri_low.rm_startblock; + + error = xfs_rmap_query_range(cur, &ri_low, &ri_high, + xfs_dax_failure_fn, ¬ify); + xfs_btree_del_cursor(cur, error); + xfs_trans_brelse(tp, agf_bp); + if (error) + break; + + fsbno = XFS_AGB_TO_FSB(mp, agno + 1, 0); + } + + xfs_trans_cancel(tp); + return error; +} + +static int +xfs_dax_notify_failure( + struct dax_device *dax_dev, + u64 offset, + u64 len, + int mf_flags) +{ + struct xfs_mount *mp = dax_holder(dax_dev); + u64 ddev_start; + u64 ddev_end; + + if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) { + xfs_warn(mp, + "notify_failure() not supported on realtime device!"); + return -EOPNOTSUPP; + } + + if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev && + mp->m_logdev_targp != mp->m_ddev_targp) { + xfs_err(mp, "ondisk log corrupt, shutting down fs!"); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK); + return -EFSCORRUPTED; + } + + if (!xfs_has_rmapbt(mp)) { + xfs_warn(mp, "notify_failure() needs rmapbt enabled!"); + return -EOPNOTSUPP; + } + + ddev_start = mp->m_ddev_targp->bt_dax_part_off; + ddev_end = ddev_start + + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1; + + /* Ignore the range out of filesystem area */ + if ((offset + len) < ddev_start) + return -ENXIO; + if (offset > ddev_end) + return -ENXIO; + + /* Calculate the real range when it touches the boundary */ + if (offset > ddev_start) + offset -= ddev_start; + else { + len -= ddev_start - offset; + offset = 0; + } + if ((offset + len) > ddev_end) + len -= ddev_end - offset; + + return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len), + mf_flags); +} + +const struct dax_holder_operations xfs_dax_holder_operations = { + .notify_failure = xfs_dax_notify_failure, +}; diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h new file mode 100644 index 000000000000..76187b9620f9 --- /dev/null +++ b/fs/xfs/xfs_notify_failure.h @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2022 Fujitsu. All Rights Reserved. + */ +#ifndef __XFS_NOTIFY_FAILURE_H__ +#define __XFS_NOTIFY_FAILURE_H__ + +extern const struct dax_holder_operations xfs_dax_holder_operations; + +#endif /* __XFS_NOTIFY_FAILURE_H__ */ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e8f37bdc8354..b8de6ed2c888 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -353,6 +353,12 @@ xfs_setup_dax_always( return -EINVAL; } + if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) { + xfs_alert(mp, + "need rmapbt when both DAX and reflink enabled."); + return -EINVAL; + } + xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"); return 0;
Introduce xfs_notify_failure.c to handle failure related works, such as implement ->notify_failure(), register/unregister dax holder in xfs, and so on. If the rmap feature of XFS enabled, we can query it to find files and metadata which are associated with the corrupt data. For now all we do is kill processes with that file mapped into their address spaces, but future patches could actually do something about corrupt metadata. After that, the memory failure needs to notify the processes who are using those files. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> --- fs/xfs/Makefile | 1 + fs/xfs/xfs_buf.c | 12 ++ fs/xfs/xfs_fsops.c | 3 + fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_notify_failure.c | 235 ++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_notify_failure.h | 10 ++ fs/xfs/xfs_super.c | 6 + 7 files changed, 268 insertions(+) create mode 100644 fs/xfs/xfs_notify_failure.c create mode 100644 fs/xfs/xfs_notify_failure.h