Message ID | 20210630134449.16851-1-lhenriques@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v10] vfs: fix copy_file_range regression in cross-fs copies | expand |
On Wed, Jun 30, 2021 at 4:44 PM Luis Henriques <lhenriques@suse.de> wrote: > > A regression has been reported by Nicolas Boichat, found while using the > copy_file_range syscall to copy a tracefs file. Before commit > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the > kernel would return -EXDEV to userspace when trying to copy a file across > different filesystems. After this commit, the syscall doesn't fail anymore > and instead returns zero (zero bytes copied), as this file's content is > generated on-the-fly and thus reports a size of zero. > > This patch restores some cross-filesystem copy restrictions that existed > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across > devices"). Filesystems are still allowed to fall-back to the VFS > generic_copy_file_range() implementation, but that has now to be done > explicitly. > > nfsd is also modified to fall-back into generic_copy_file_range() in case > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV. > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ > Reported-by: Nicolas Boichat <drinkcat@chromium.org> > Reported-by: kernel test robot <oliver.sang@intel.com> > Signed-off-by: Luis Henriques <lhenriques@suse.de> > --- > Changes since v9 > - the early return from the syscall when len is zero now checks if the > filesystem is implemented, returning -EOPNOTSUPP if it is not and 0 > otherwise. Issue reported by test robot. What issue was reported? > (obviously, dropped Amir's Reviewed-by and Olga's Tested-by tags) > Changes since v8 > - Simply added Amir's Reviewed-by and Olga's Tested-by > Changes since v7 > - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the > error returned is always related to the 'copy' operation > Changes since v6 > - restored i_sb checks for the clone operation > Changes since v5 > - check if ->copy_file_range is NULL before calling it > Changes since v4 > - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP > or -EXDEV. > Changes since v3 > - dropped the COPY_FILE_SPLICE flag > - kept the f_op's checks early in generic_copy_file_checks, implementing > Amir's suggestions > - modified nfsd to use generic_copy_file_range() > Changes since v2 > - do all the required checks earlier, in generic_copy_file_checks(), > adding new checks for ->remap_file_range > - new COPY_FILE_SPLICE flag > - don't remove filesystem's fallback to generic_copy_file_range() > - updated commit changelog (and subject) > Changes since v1 (after Amir review) > - restored do_copy_file_range() helper > - return -EOPNOTSUPP if fs doesn't implement CFR > - updated commit description > > fs/nfsd/vfs.c | 8 +++++++- > fs/read_write.c | 51 ++++++++++++++++++++++++------------------------- > 2 files changed, 32 insertions(+), 27 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 15adf1f6ab21..f54a88b3b4a2 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -569,6 +569,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, > ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst, > u64 dst_pos, u64 count) > { > + ssize_t ret; > > /* > * Limit copy to 4MB to prevent indefinitely blocking an nfsd > @@ -579,7 +580,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst, > * limit like this and pipeline multiple COPY requests. > */ > count = min_t(u64, count, 1 << 22); > - return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0); > + ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0); > + > + if (ret == -EOPNOTSUPP || ret == -EXDEV) > + ret = generic_copy_file_range(src, src_pos, dst, dst_pos, > + count, 0); > + return ret; > } > > __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp, > diff --git a/fs/read_write.c b/fs/read_write.c > index 9db7adf160d2..7ad07063c551 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1395,28 +1395,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, > } > EXPORT_SYMBOL(generic_copy_file_range); > > -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in, > - struct file *file_out, loff_t pos_out, > - size_t len, unsigned int flags) > -{ > - /* > - * Although we now allow filesystems to handle cross sb copy, passing > - * a file of the wrong filesystem type to filesystem driver can result > - * in an attempt to dereference the wrong type of ->private_data, so > - * avoid doing that until we really have a good reason. NFS defines > - * several different file_system_type structures, but they all end up > - * using the same ->copy_file_range() function pointer. > - */ > - if (file_out->f_op->copy_file_range && > - file_out->f_op->copy_file_range == file_in->f_op->copy_file_range) > - return file_out->f_op->copy_file_range(file_in, pos_in, > - file_out, pos_out, > - len, flags); > - > - return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len, > - flags); > -} > - > /* > * Performs necessary checks before doing a file copy > * > @@ -1434,6 +1412,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in, > loff_t size_in; > int ret; > > + /* > + * Although we now allow filesystems to handle cross sb copy, passing > + * a file of the wrong filesystem type to filesystem driver can result > + * in an attempt to dereference the wrong type of ->private_data, so > + * avoid doing that until we really have a good reason. NFS defines > + * several different file_system_type structures, but they all end up > + * using the same ->copy_file_range() function pointer. > + */ > + if (file_out->f_op->copy_file_range) { > + if (file_in->f_op->copy_file_range != > + file_out->f_op->copy_file_range) > + return -EXDEV; > + } else if (file_in->f_op->remap_file_range) { > + if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb) > + return -EXDEV; > + } else { > + return -EOPNOTSUPP; > + } > + > ret = generic_file_rw_checks(file_in, file_out); > if (ret) > return ret; > @@ -1498,10 +1495,11 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > return ret; > > if (len == 0) > - return 0; > + return file_out->f_op->copy_file_range ? 0 : -EOPNOTSUPP; What is this supposed to do? Please add a comment. It seems to me that following the checks in generic_copy_file_checks() this can only return -EOPNOTSUPP if (!file_out->f_op->copy_file_range && file_in->f_op->remap_file_range) Was that really the intention? Why? Thanks, Amir.
On Wed, Jun 30, 2021 at 05:56:34PM +0300, Amir Goldstein wrote: > On Wed, Jun 30, 2021 at 4:44 PM Luis Henriques <lhenriques@suse.de> wrote: > > > > A regression has been reported by Nicolas Boichat, found while using the > > copy_file_range syscall to copy a tracefs file. Before commit > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the > > kernel would return -EXDEV to userspace when trying to copy a file across > > different filesystems. After this commit, the syscall doesn't fail anymore > > and instead returns zero (zero bytes copied), as this file's content is > > generated on-the-fly and thus reports a size of zero. > > > > This patch restores some cross-filesystem copy restrictions that existed > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across > > devices"). Filesystems are still allowed to fall-back to the VFS > > generic_copy_file_range() implementation, but that has now to be done > > explicitly. > > > > nfsd is also modified to fall-back into generic_copy_file_range() in case > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV. > > > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ > > Reported-by: Nicolas Boichat <drinkcat@chromium.org> > > Reported-by: kernel test robot <oliver.sang@intel.com> > > Signed-off-by: Luis Henriques <lhenriques@suse.de> > > --- > > Changes since v9 > > - the early return from the syscall when len is zero now checks if the > > filesystem is implemented, returning -EOPNOTSUPP if it is not and 0 > > otherwise. Issue reported by test robot. > > What issue was reported? Here's the link to my previous email: https://lore.kernel.org/linux-fsdevel/877dk1zibo.fsf@suse.de/ ... which reminds me that I need to also send a patch to fix the fstest. (Although the test as-is actually allowed to find this bug...) Cheers, -- Luís
On Wed, Jun 30, 2021 at 6:06 PM Luis Henriques <lhenriques@suse.de> wrote: > > On Wed, Jun 30, 2021 at 05:56:34PM +0300, Amir Goldstein wrote: > > On Wed, Jun 30, 2021 at 4:44 PM Luis Henriques <lhenriques@suse.de> wrote: > > > > > > A regression has been reported by Nicolas Boichat, found while using the > > > copy_file_range syscall to copy a tracefs file. Before commit > > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the > > > kernel would return -EXDEV to userspace when trying to copy a file across > > > different filesystems. After this commit, the syscall doesn't fail anymore > > > and instead returns zero (zero bytes copied), as this file's content is > > > generated on-the-fly and thus reports a size of zero. > > > > > > This patch restores some cross-filesystem copy restrictions that existed > > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across > > > devices"). Filesystems are still allowed to fall-back to the VFS > > > generic_copy_file_range() implementation, but that has now to be done > > > explicitly. > > > > > > nfsd is also modified to fall-back into generic_copy_file_range() in case > > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV. > > > > > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") > > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ > > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ > > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ > > > Reported-by: Nicolas Boichat <drinkcat@chromium.org> > > > Reported-by: kernel test robot <oliver.sang@intel.com> > > > Signed-off-by: Luis Henriques <lhenriques@suse.de> > > > --- > > > Changes since v9 > > > - the early return from the syscall when len is zero now checks if the > > > filesystem is implemented, returning -EOPNOTSUPP if it is not and 0 > > > otherwise. Issue reported by test robot. > > > > What issue was reported? > > Here's the link to my previous email: > > https://lore.kernel.org/linux-fsdevel/877dk1zibo.fsf@suse.de/ > Sorry, I missed it. I guess the subject was not aluring enough ;-) So your patch does not fix the root cause. The solution is to remove the (len == 0) short-circuit as you first suggested. The problem is this: A program tries to check for CFR support by calling CFR with zero length. The XFS filesystem driver (in the test robot report) supports CFR via the remap_file_range() method in general, but not on the particular filesystem instance that was formatted without reflink support. The intention of the program was to test for CFR support on the particular filesystem instance, so the short-circuit response is wrong. Note that vfs_clone_file_range() does NOT short circuit (len == 0). That is (allegedly) because it needs to call into the filesystem method to know if the filesystem instance supports clone_file_range. The reason that your patch is wrong is because the same situation can happen with a filesystem driver that has a copy_file_range() method, but a particular instance does not support copy_file_range(). For example, overlayfs has an ovl_copy_file_range() method, so it would short circuit zero CFR, but if in a particular overlayfs, the upper fs does not support CFR, then the overlayfs instance does not support CFR either. > ... which reminds me that I need to also send a patch to fix the fstest. > (Although the test as-is actually allowed to find this bug...) > Not sure why you'd want to fix the test. The test check with a zero length file seems valid to me. Thanks, Amir.
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 15adf1f6ab21..f54a88b3b4a2 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -569,6 +569,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst, u64 dst_pos, u64 count) { + ssize_t ret; /* * Limit copy to 4MB to prevent indefinitely blocking an nfsd @@ -579,7 +580,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst, * limit like this and pipeline multiple COPY requests. */ count = min_t(u64, count, 1 << 22); - return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0); + ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0); + + if (ret == -EOPNOTSUPP || ret == -EXDEV) + ret = generic_copy_file_range(src, src_pos, dst, dst_pos, + count, 0); + return ret; } __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp, diff --git a/fs/read_write.c b/fs/read_write.c index 9db7adf160d2..7ad07063c551 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1395,28 +1395,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, } EXPORT_SYMBOL(generic_copy_file_range); -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - size_t len, unsigned int flags) -{ - /* - * Although we now allow filesystems to handle cross sb copy, passing - * a file of the wrong filesystem type to filesystem driver can result - * in an attempt to dereference the wrong type of ->private_data, so - * avoid doing that until we really have a good reason. NFS defines - * several different file_system_type structures, but they all end up - * using the same ->copy_file_range() function pointer. - */ - if (file_out->f_op->copy_file_range && - file_out->f_op->copy_file_range == file_in->f_op->copy_file_range) - return file_out->f_op->copy_file_range(file_in, pos_in, - file_out, pos_out, - len, flags); - - return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len, - flags); -} - /* * Performs necessary checks before doing a file copy * @@ -1434,6 +1412,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in, loff_t size_in; int ret; + /* + * Although we now allow filesystems to handle cross sb copy, passing + * a file of the wrong filesystem type to filesystem driver can result + * in an attempt to dereference the wrong type of ->private_data, so + * avoid doing that until we really have a good reason. NFS defines + * several different file_system_type structures, but they all end up + * using the same ->copy_file_range() function pointer. + */ + if (file_out->f_op->copy_file_range) { + if (file_in->f_op->copy_file_range != + file_out->f_op->copy_file_range) + return -EXDEV; + } else if (file_in->f_op->remap_file_range) { + if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb) + return -EXDEV; + } else { + return -EOPNOTSUPP; + } + ret = generic_file_rw_checks(file_in, file_out); if (ret) return ret; @@ -1498,10 +1495,11 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, return ret; if (len == 0) - return 0; + return file_out->f_op->copy_file_range ? 0 : -EOPNOTSUPP; file_start_write(file_out); + ret = -EOPNOTSUPP; /* * Try cloning first, this is supported by more file systems, and * more efficient if both clone and copy are supported (e.g. NFS). @@ -1520,9 +1518,10 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, } } - ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len, - flags); - WARN_ON_ONCE(ret == -EOPNOTSUPP); + if (file_out->f_op->copy_file_range) + ret = file_out->f_op->copy_file_range(file_in, pos_in, + file_out, pos_out, + len, flags); done: if (ret > 0) { fsnotify_access(file_in);
A regression has been reported by Nicolas Boichat, found while using the copy_file_range syscall to copy a tracefs file. Before commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the kernel would return -EXDEV to userspace when trying to copy a file across different filesystems. After this commit, the syscall doesn't fail anymore and instead returns zero (zero bytes copied), as this file's content is generated on-the-fly and thus reports a size of zero. This patch restores some cross-filesystem copy restrictions that existed prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices"). Filesystems are still allowed to fall-back to the VFS generic_copy_file_range() implementation, but that has now to be done explicitly. nfsd is also modified to fall-back into generic_copy_file_range() in case vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV. Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ Reported-by: Nicolas Boichat <drinkcat@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Luis Henriques <lhenriques@suse.de> --- Changes since v9 - the early return from the syscall when len is zero now checks if the filesystem is implemented, returning -EOPNOTSUPP if it is not and 0 otherwise. Issue reported by test robot. (obviously, dropped Amir's Reviewed-by and Olga's Tested-by tags) Changes since v8 - Simply added Amir's Reviewed-by and Olga's Tested-by Changes since v7 - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the error returned is always related to the 'copy' operation Changes since v6 - restored i_sb checks for the clone operation Changes since v5 - check if ->copy_file_range is NULL before calling it Changes since v4 - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP or -EXDEV. Changes since v3 - dropped the COPY_FILE_SPLICE flag - kept the f_op's checks early in generic_copy_file_checks, implementing Amir's suggestions - modified nfsd to use generic_copy_file_range() Changes since v2 - do all the required checks earlier, in generic_copy_file_checks(), adding new checks for ->remap_file_range - new COPY_FILE_SPLICE flag - don't remove filesystem's fallback to generic_copy_file_range() - updated commit changelog (and subject) Changes since v1 (after Amir review) - restored do_copy_file_range() helper - return -EOPNOTSUPP if fs doesn't implement CFR - updated commit description fs/nfsd/vfs.c | 8 +++++++- fs/read_write.c | 51 ++++++++++++++++++++++++------------------------- 2 files changed, 32 insertions(+), 27 deletions(-)