diff mbox series

fuse: fix stale data issue of virtiofs with dax

Message ID 20231105063608.68114-1-obuil.liubo@gmail.com (mailing list archive)
State New, archived
Headers show
Series fuse: fix stale data issue of virtiofs with dax | expand

Commit Message

Liu Bo Nov. 5, 2023, 6:36 a.m. UTC
From: Liu Bo <bo.liu@linux.alibaba.com>

In open(O_TRUNC), it doesn't cleanup the corresponding dax mapping of
an inode, which could expose stale data to users, which can be easily
reproduced by accessing overlayfs through virtiofs with dax.

The reproducer is,
0. mkdir /home/test/lower /home/test/upper /home/test/work
1. echo "aaaaaa" > /home/test/lower/foobar
2. sudo mount -t overlay overlay -olowerdir=/home/test/lower,upperdir=/home/test/upper,workdir=/home/test/work /mnt/ovl
3. start a guest VM configured with virtiofs passthrough(_dax_ enabled and cache policy is always)
   and use /mnt/ovl as the source directory.
4. ssh into the guest VM
5. mount virtiofs onto /mnt/test
5. cat /mnt/test/foobar
aaaaaa
6. echo "xxxx" > /mnt/test/foobar
7. cat /mnt/test/foobar

w/o this patch, step 7 shows "aaaa" rather than "xxxx".

Step 5 reads the content of file 'foobar' by setting up a dax mapping, and the
mapping eventually maps /home/test/lower/foobar because there is no copy up
at this moment.

In step 6, 'foobar' is opened with O_TRUNC and the viriofs daemon on host side
triggers a copy-up, "xxxx" is eventually written to /home/test/upper/foobar.
Since 'foobar' is truncated to size 0 when writting, writes go with non-dax io
path and update the file size in guest VM accordingly.

However, dax mapping still refers to the /home/test/lower/foobar, so what step
7 reads is /home/test/lower/foobar but with the new size, which shows "aaaa"
rather "xxxx".

Reported-by: Cameron <cameron@northflank.com>
Tested-by: Cameron <cameron@northflank.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 9 +++++++++
 fs/fuse/dir.c    | 5 +++++
 fs/fuse/fuse_i.h | 1 +
 3 files changed, 15 insertions(+)

Comments

kernel test robot Nov. 5, 2023, 12:33 p.m. UTC | #1
Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mszeredi-fuse/for-next]
[also build test WARNING on linus/master v6.6 next-20231103]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/obuil-liubo-gmail-com/fuse-fix-stale-data-issue-of-virtiofs-with-dax/20231105-143819
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
patch link:    https://lore.kernel.org/r/20231105063608.68114-1-obuil.liubo%40gmail.com
patch subject: [PATCH] fuse: fix stale data issue of virtiofs with dax
config: s390-defconfig (https://download.01.org/0day-ci/archive/20231105/202311052017.XxUVgk2W-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231105/202311052017.XxUVgk2W-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311052017.XxUVgk2W-lkp@intel.com/

All warnings (new ones prefixed by >>):

   fs/fuse/dax.c: In function 'fuse_dax_inode_cleanup_range':
>> fs/fuse/dax.c:392:28: warning: unused variable 'fi' [-Wunused-variable]
     392 |         struct fuse_inode *fi = get_fuse_inode(inode);
         |                            ^~


vim +/fi +392 fs/fuse/dax.c

   387	
   388	/* Callers need to make sure fi->i_mmap_sem is held. */
   389	void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start)
   390	{
   391		struct fuse_conn *fc = get_fuse_conn(inode);
 > 392		struct fuse_inode *fi = get_fuse_inode(inode);
   393	
   394		inode_reclaim_dmap_range(fc->dax, inode, start, -1);
   395	}
   396
Miklos Szeredi Nov. 13, 2023, 2:45 p.m. UTC | #2
[adding Vivek Goyal and virtio-fs list]

On Sun, 5 Nov 2023 at 07:36, <obuil.liubo@gmail.com> wrote:
>
> From: Liu Bo <bo.liu@linux.alibaba.com>
>
> In open(O_TRUNC), it doesn't cleanup the corresponding dax mapping of
> an inode, which could expose stale data to users, which can be easily
> reproduced by accessing overlayfs through virtiofs with dax.
>
> The reproducer is,
> 0. mkdir /home/test/lower /home/test/upper /home/test/work
> 1. echo "aaaaaa" > /home/test/lower/foobar
> 2. sudo mount -t overlay overlay -olowerdir=/home/test/lower,upperdir=/home/test/upper,workdir=/home/test/work /mnt/ovl
> 3. start a guest VM configured with virtiofs passthrough(_dax_ enabled and cache policy is always)
>    and use /mnt/ovl as the source directory.
> 4. ssh into the guest VM
> 5. mount virtiofs onto /mnt/test
> 5. cat /mnt/test/foobar
> aaaaaa
> 6. echo "xxxx" > /mnt/test/foobar
> 7. cat /mnt/test/foobar
>
> w/o this patch, step 7 shows "aaaa" rather than "xxxx".

This is generally unsupported.  See Documentation/filesystems/overlayfs.rst:

"Changes to underlying filesystems
---------------------------------

Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.

..."

Thanks,
Miklos


>
> Step 5 reads the content of file 'foobar' by setting up a dax mapping, and the
> mapping eventually maps /home/test/lower/foobar because there is no copy up
> at this moment.
>
> In step 6, 'foobar' is opened with O_TRUNC and the viriofs daemon on host side
> triggers a copy-up, "xxxx" is eventually written to /home/test/upper/foobar.
> Since 'foobar' is truncated to size 0 when writting, writes go with non-dax io
> path and update the file size in guest VM accordingly.
>
> However, dax mapping still refers to the /home/test/lower/foobar, so what step
> 7 reads is /home/test/lower/foobar but with the new size, which shows "aaaa"
> rather "xxxx".
>
> Reported-by: Cameron <cameron@northflank.com>
> Tested-by: Cameron <cameron@northflank.com>
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/dir.c    | 5 +++++
>  fs/fuse/fuse_i.h | 1 +
>  3 files changed, 15 insertions(+)
>
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 8e74f278a3f6..d7f9ec7f4597 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -385,6 +385,15 @@ void fuse_dax_inode_cleanup(struct inode *inode)
>         WARN_ON(fi->dax->nr);
>  }
>
> +/* Callers need to make sure fi->i_mmap_sem is held. */
> +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start)
> +{
> +       struct fuse_conn *fc = get_fuse_conn(inode);
> +       struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +       inode_reclaim_dmap_range(fc->dax, inode, start, -1);
> +}
> +
>  static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length)
>  {
>         iomap->addr = IOMAP_NULL_ADDR;
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index f67bef9d83c4..be7892e052b5 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1788,6 +1788,9 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
>                          */
>                         i_size_write(inode, 0);
>                         truncate_pagecache(inode, 0);
> +                       if (fault_blocked && FUSE_IS_DAX(inode))
> +                               fuse_dax_inode_cleanup(inode);
> +
>                         goto out;
>                 }
>                 file = NULL;
> @@ -1883,6 +1886,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
>             S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) {
>                 truncate_pagecache(inode, outarg.attr.size);
>                 invalidate_inode_pages2(mapping);
> +               if (fault_blocked && FUSE_IS_DAX(inode))
> +                       fuse_dax_inode_cleanup_range(inode, outarg.attr.size);
>         }
>
>         clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 9b7fc7d3c7f1..02fa7aa2bd56 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1305,6 +1305,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start);
>  void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
> --
> 2.32.1 (Apple Git-133)
>
diff mbox series

Patch

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 8e74f278a3f6..d7f9ec7f4597 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -385,6 +385,15 @@  void fuse_dax_inode_cleanup(struct inode *inode)
 	WARN_ON(fi->dax->nr);
 }
 
+/* Callers need to make sure fi->i_mmap_sem is held. */
+void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	inode_reclaim_dmap_range(fc->dax, inode, start, -1);
+}
+
 static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length)
 {
 	iomap->addr = IOMAP_NULL_ADDR;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f67bef9d83c4..be7892e052b5 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1788,6 +1788,9 @@  int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 			 */
 			i_size_write(inode, 0);
 			truncate_pagecache(inode, 0);
+			if (fault_blocked && FUSE_IS_DAX(inode))
+				fuse_dax_inode_cleanup(inode);
+
 			goto out;
 		}
 		file = NULL;
@@ -1883,6 +1886,8 @@  int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	    S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) {
 		truncate_pagecache(inode, outarg.attr.size);
 		invalidate_inode_pages2(mapping);
+		if (fault_blocked && FUSE_IS_DAX(inode))
+			fuse_dax_inode_cleanup_range(inode, outarg.attr.size);
 	}
 
 	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9b7fc7d3c7f1..02fa7aa2bd56 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1305,6 +1305,7 @@  void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start);
 void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);