Message ID | 20181026201057.36899-4-olga.kornievskaia@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | client-side support for "inter" SSC copy | expand |
On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > +++ b/fs/read_write.c > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > } > } > > + if (pos_in >= i_size_read(inode_in)) > + return -EINVAL; > + > if (file_out->f_op->copy_file_range) { > ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, > pos_out, len, flags); Is this the right place to check this? Surely we should be checking for this before calling clone_file_range()?
On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > From: Olga Kornievskaia <kolga@netapp.com> > > Input source offset can't be beyond the end of the file. > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > --- > fs/read_write.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/fs/read_write.c b/fs/read_write.c > index fb4ffca..b3b304e 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > } > } > > + if (pos_in >= i_size_read(inode_in)) > + return -EINVAL; > + vfs_copy_file_range seems ot be missing a wide range of checks. rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the checks in generic_write_checks() apply, right? And the same security issues like stripping setuid bits, etc? And we need to touch atime on the source file, too? We've just merged 5 or so patches in 4.19-rc8 and we're ready to merge another ~30 patch series to fix all the stuff missing from the clone/dedupe file range operations that make them safe and robust. It seems like copy_file_range is all the checks it needs, too? Cheers, Dave.
On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > From: Olga Kornievskaia <kolga@netapp.com> > > > > Input source offset can't be beyond the end of the file. > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > --- > > fs/read_write.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > index fb4ffca..b3b304e 100644 > > --- a/fs/read_write.c > > +++ b/fs/read_write.c > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > } > > } > > > > + if (pos_in >= i_size_read(inode_in)) > > + return -EINVAL; > > + > > vfs_copy_file_range seems ot be missing a wide range of checks. > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > checks in generic_write_checks() apply, right? And the same security > issues like stripping setuid bits, etc? And we need to touch > atime on the source file, too? Yes sound like needed checks. > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > merge another ~30 patch series to fix all the stuff missing from the > clone/dedupe file range operations that make them safe and robust. > It seems like copy_file_range is all the checks it needs, too? Are you proposing to not do this check now in favor of the proper work that will do all of those checks you listed above? I can not volunteer to provide this comprehensive check. However if this is the path community decides is the best then I can move this check into NFS for now and remove it once VFS provides such check later. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Fri, Oct 26, 2018 at 5:26 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > +++ b/fs/read_write.c > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > } > > } > > > > + if (pos_in >= i_size_read(inode_in)) > > + return -EINVAL; > > + > > if (file_out->f_op->copy_file_range) { > > ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, > > pos_out, len, flags); > > Is this the right place to check this? Surely we should be checking for > this before calling clone_file_range()? Ops, indeed this is the wrong place. If I were to keep this check here then I need to move it where it was originally located. However, right now I'm included to not do the check in VFS at all and move it to NFS.
On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > Input source offset can't be beyond the end of the file. > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > --- > > > fs/read_write.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > index fb4ffca..b3b304e 100644 > > > --- a/fs/read_write.c > > > +++ b/fs/read_write.c > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > } > > > } > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > + return -EINVAL; > > > + > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > checks in generic_write_checks() apply, right? And the same security > > issues like stripping setuid bits, etc? And we need to touch > > atime on the source file, too? > > Yes sound like needed checks. > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > merge another ~30 patch series to fix all the stuff missing from the > > clone/dedupe file range operations that make them safe and robust. > > It seems like copy_file_range is all the checks it needs, too? > > Are you proposing to not do this check now in favor of the proper work > that will do all of those checks you listed above? No, I'm saying that if you're adding one check, there's a whole heap of checks that still need to be added, *especially* if this is going to fall back to page cache copy between superblocks that may have different limits and constraints. There's security issues in this API. They need to be fixed before we allow it to do more and potentially expose more problems due to it's wider capability. > I can not volunteer > to provide this comprehensive check. Why not? Cheers, Dave.
On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > --- > > > > fs/read_write.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > index fb4ffca..b3b304e 100644 > > > > --- a/fs/read_write.c > > > > +++ b/fs/read_write.c > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > } > > > > } > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > + return -EINVAL; > > > > + > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > checks in generic_write_checks() apply, right? And the same security > > > issues like stripping setuid bits, etc? And we need to touch > > > atime on the source file, too? > > > > Yes sound like needed checks. > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > merge another ~30 patch series to fix all the stuff missing from the > > > clone/dedupe file range operations that make them safe and robust. > > > It seems like copy_file_range is all the checks it needs, too? > > > > Are you proposing to not do this check now in favor of the proper work > > that will do all of those checks you listed above? > > No, I'm saying that if you're adding one check, there's a whole heap > of checks that still need to be added, *especially* if this is going > to fall back to page cache copy between superblocks that may have > different limits and constraints. > > There's security issues in this API. They need to be fixed before we > allow it to do more and potentially expose more problems due to it's > wider capability. Sounds like you are arguing that enabling generic copy_file_range() via do_splice() isn't a good thing without those checks. I'm totally OK with removing this functionality. As I mentioned earlier I added it as I thought it was beneficial and I assumed that do_splice() was a generic enough API to handle what was needed. What this patch series was intended for was removing/relocating the cross device check and it sounds like I should be keeping it just to that. > > I can not volunteer > > to provide this comprehensive check. > > Why not? I don't have the expertise in the VFS code. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > --- > > > > fs/read_write.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > index fb4ffca..b3b304e 100644 > > > > --- a/fs/read_write.c > > > > +++ b/fs/read_write.c > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > } > > > > } > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > + return -EINVAL; > > > > + > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > checks in generic_write_checks() apply, right? And the same security > > > issues like stripping setuid bits, etc? And we need to touch > > > atime on the source file, too? > > > > Yes sound like needed checks. > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > merge another ~30 patch series to fix all the stuff missing from the > > > clone/dedupe file range operations that make them safe and robust. > > > It seems like copy_file_range is all the checks it needs, too? > > > > Are you proposing to not do this check now in favor of the proper work > > that will do all of those checks you listed above? > > No, I'm saying that if you're adding one check, there's a whole heap > of checks that still need to be added, *especially* if this is going > to fall back to page cache copy between superblocks that may have > different limits and constraints. > > There's security issues in this API. They need to be fixed before we > allow it to do more and potentially expose more problems due to it's > wider capability. Before I totally give up on this feature, can you help me understand your concerns with allowing the generic copy_file_range via do_splice(). I have mentioned I'm not a VFS expert thus I come from just looking at the available documentation and the code. I don't see any restrictions on the files being passed in the do_splice_direct(). There are no restrictions that they must be from the same filesystem or file system type. But perhaps this not the concern you had but more about checking validity of arguments? I have looked at Dave Wong's, if I'm not mistaken these 2 are the relevant patches: [PATCH 02/28] vfs: check file ranges before cloning files -- a couple but not all checks apply to copy_file_range() . specifically, the offsets don't wrap and offset isn't past eof (as my patch suggests). Other checks have to do with aligned memory which I don't believe is needed or other dedup requirement that don't apply. [PATCH 04/28] vfs: strengthen checking of file range inputs to generic_remap_checks -- these checks apply to the code once we fall back to the do_splice(). it looks to me that perhaps exporting generic_access_check_limits()/generic_write_check_limits so that they can be used by the copy_file_range when it get pulled in. Also, can you elaborate one what "security issues" are present in this API? Is it "stripping setuid bits" and so something like calling file_remove_priv() that should be done when the fallback to the do_splice_direct() happens? As for the atime, wouldn't the ->copy_file_range() be updating the file attributes? I guess for the fallback case, the attributes need to be updated. If those are checks/issues needed to be addressed and would then get the generic copy_file_range() in, I could give a go at a patch (or 2). > > > I can not volunteer > > to provide this comprehensive check. > > Why not? > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Tue, Oct 30, 2018 at 5:10 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > > --- > > > > > fs/read_write.c | 3 +++ > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > > index fb4ffca..b3b304e 100644 > > > > > --- a/fs/read_write.c > > > > > +++ b/fs/read_write.c > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > > } > > > > > } > > > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > > + return -EINVAL; > > > > > + > > > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > > checks in generic_write_checks() apply, right? And the same security > > > > issues like stripping setuid bits, etc? And we need to touch > > > > atime on the source file, too? > > > > > > Yes sound like needed checks. > > > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > > merge another ~30 patch series to fix all the stuff missing from the > > > > clone/dedupe file range operations that make them safe and robust. > > > > It seems like copy_file_range is all the checks it needs, too? > > > > > > Are you proposing to not do this check now in favor of the proper work > > > that will do all of those checks you listed above? > > > > No, I'm saying that if you're adding one check, there's a whole heap > > of checks that still need to be added, *especially* if this is going > > to fall back to page cache copy between superblocks that may have > > different limits and constraints. > > > > There's security issues in this API. They need to be fixed before we > > allow it to do more and potentially expose more problems due to it's > > wider capability. > > Before I totally give up on this feature, can you help me understand > your concerns with allowing the generic copy_file_range via > do_splice(). > > I have mentioned I'm not a VFS expert thus I come from just looking at > the available documentation and the code. > > I don't see any restrictions on the files being passed in the > do_splice_direct(). There are no restrictions that they must be from > the same filesystem or file system type. But perhaps this not the > concern you had but more about checking validity of arguments? > > I have looked at Dave Wong's, if I'm not mistaken these 2 are the apologizes Darrick Wong's patches. > relevant patches: > [PATCH 02/28] vfs: check file ranges before cloning files > -- a couple but not all checks apply to copy_file_range() . > specifically, the offsets don't wrap and offset isn't past eof (as my > patch suggests). Other checks have to do with aligned memory which I > don't believe is needed or other dedup requirement that don't apply. > > [PATCH 04/28] vfs: strengthen checking of file range inputs to > generic_remap_checks > -- these checks apply to the code once we fall back to the > do_splice(). it looks to me that perhaps exporting > generic_access_check_limits()/generic_write_check_limits so that they > can be used by the copy_file_range when it get pulled in. > > Also, can you elaborate one what "security issues" are present in this > API? Is it "stripping setuid bits" and so something like calling > file_remove_priv() that should be done when the fallback to the > do_splice_direct() happens? > > As for the atime, wouldn't the ->copy_file_range() be updating the > file attributes? I guess for the fallback case, the attributes need to > be updated. > > If those are checks/issues needed to be addressed and would then get > the generic copy_file_range() in, I could give a go at a patch (or 2). > > > > > I can not volunteer > > > to provide this comprehensive check. > > > > Why not? > > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com
On Tue, Oct 30, 2018 at 09:40:09AM -0400, Olga Kornievskaia wrote: > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > > --- > > > > > fs/read_write.c | 3 +++ > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > > index fb4ffca..b3b304e 100644 > > > > > --- a/fs/read_write.c > > > > > +++ b/fs/read_write.c > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > > } > > > > > } > > > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > > + return -EINVAL; > > > > > + > > > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > > checks in generic_write_checks() apply, right? And the same security > > > > issues like stripping setuid bits, etc? And we need to touch > > > > atime on the source file, too? > > > > > > Yes sound like needed checks. > > > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > > merge another ~30 patch series to fix all the stuff missing from the > > > > clone/dedupe file range operations that make them safe and robust. > > > > It seems like copy_file_range is all the checks it needs, too? > > > > > > Are you proposing to not do this check now in favor of the proper work > > > that will do all of those checks you listed above? > > > > No, I'm saying that if you're adding one check, there's a whole heap > > of checks that still need to be added, *especially* if this is going > > to fall back to page cache copy between superblocks that may have > > different limits and constraints. > > > > There's security issues in this API. They need to be fixed before we > > allow it to do more and potentially expose more problems due to it's > > wider capability. > > Sounds like you are arguing that enabling generic copy_file_range() > via do_splice() isn't a good thing without those checks. It's already enabled for filesystems that don't provide ->remap_file_range or ->copy_file_range. Your changes don't introduce new problems, that just made me aware that there are serious problems here, too. The problem we've got now is that filesystem that use ->remap_file_range() are going to have very different error detection and handling when Darrick's patchset is merged. They are going to reject setuid files, obey rlimit, LFS, superblock max sizes, etc. while ->copy_file_range() and do_splice_direct() methods are not. That really needs to be fixed. > I'm totally > OK with removing this functionality. As I mentioned earlier I added it > as I thought it was beneficial and I assumed that do_splice() was a > generic enough API to handle what was needed. That doesn't fix the problems that are already there. :/ > > > I can not volunteer > > > to provide this comprehensive check. > > > > Why not? > > I don't have the expertise in the VFS code. Just like all the other people without VFS expertise who fix problems introduced in code written by people without VFS expertise. I'm certainly no VFS expert, either, I've just seen, found and helped fix a bunch of serious problems in the VFS dedupe/reflink code that were uncovered by data corruptions on XFS filesystems. That's part of being a filesystem developer - the VFS is "common property" of all the filesystems and everyone has to do their bit to maintain it.... Cheers, Dave.
On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote: > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > > --- > > > > > fs/read_write.c | 3 +++ > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > > index fb4ffca..b3b304e 100644 > > > > > --- a/fs/read_write.c > > > > > +++ b/fs/read_write.c > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > > } > > > > > } > > > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > > + return -EINVAL; > > > > > + > > > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > > checks in generic_write_checks() apply, right? And the same security > > > > issues like stripping setuid bits, etc? And we need to touch > > > > atime on the source file, too? > > > > > > Yes sound like needed checks. > > > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > > merge another ~30 patch series to fix all the stuff missing from the > > > > clone/dedupe file range operations that make them safe and robust. > > > > It seems like copy_file_range is all the checks it needs, too? > > > > > > Are you proposing to not do this check now in favor of the proper work > > > that will do all of those checks you listed above? > > > > No, I'm saying that if you're adding one check, there's a whole heap > > of checks that still need to be added, *especially* if this is going > > to fall back to page cache copy between superblocks that may have > > different limits and constraints. > > > > There's security issues in this API. They need to be fixed before we > > allow it to do more and potentially expose more problems due to it's > > wider capability. > > Before I totally give up on this feature, can you help me understand > your concerns with allowing the generic copy_file_range via > do_splice(). it's not do_splice_direct() i'm concerned about. It's /writing data without adequate checks/ that I'm concerned about. ->copy_file_range() also writes data, so it needs to undergo the same safety checks as well. > I have mentioned I'm not a VFS expert thus I come from just looking at > the available documentation and the code. > > I don't see any restrictions on the files being passed in the > do_splice_direct(). There are no restrictions that they must be from > the same filesystem or file system type. But perhaps this not the > concern you had but more about checking validity of arguments? > > I have looked at Dave Wong's, if I'm not mistaken these 2 are the > relevant patches: > [PATCH 02/28] vfs: check file ranges before cloning files > -- a couple but not all checks apply to copy_file_range() . Yes, of course - clone/dedupe have different constraints, but the core checks are still needed for copy_file_range(). For example, the man page says: EINVAL Requested range extends beyond the end of the source file; or the flags argument is not 0. Your patch above doesn't actually check that - it only checks if the pos_in is beyond EOF. It needs to check if pos_in + len is beyond EOF. After checking for wraps, of course. > [PATCH 04/28] vfs: strengthen checking of file range inputs to > generic_remap_checks > -- these checks apply to the code once we fall back to the > do_splice(). man page says: EFBIG An attempt was made to write a file that exceeds the implementation-defined maximum file size or the process's file size limit, or to write at a position past the maximum allowed offset. These conditions apply to the destination file regards of the method used to copy the data. That's what the generic methods now check for clone/dedupe, and need to be used here, too. e.g. In the case of NFS, if the user ion the NFS client is not allowed to copy a file (e.g. say user RLIMIT restrictions) then that should not be bypassed just because the NFS client can do a server side copy. The restriction has been placed on the local user calling copy_file_range(), not on the server side implementation. > Also, can you elaborate one what "security issues" are present in this > API? Is it "stripping setuid bits" and so something like calling > file_remove_priv() that should be done when the fallback to the > do_splice_direct() happens? From 4.19-rc8: 7debbf015f58 xfs: update ctime and remove suid before cloning files Which then got moved into the generic remap_file_range code in Darrick's "vfs: remap helper should update destination inode metadata" patch: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d We can't assume that a server side copy is going to strip setuid bits or even update target files c/mtimes. > As for the atime, wouldn't the ->copy_file_range() be updating the > file attributes? I guess for the fallback case, the attributes need to > be updated. Generic VFS code should take care of atime updates. Hence if the source file needs an atime update, it should be in the generic code and done for all copy methods. > If those are checks/issues needed to be addressed and would then get > the generic copy_file_range() in, I could give a go at a patch (or 2). yes, those are the ones I'm aware of. BUt there may be more, I haven't really looked that closely... Cheers, Dave.
On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@fromorbit.com> wrote: > > On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote: > > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > > > --- > > > > > > fs/read_write.c | 3 +++ > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > > > index fb4ffca..b3b304e 100644 > > > > > > --- a/fs/read_write.c > > > > > > +++ b/fs/read_write.c > > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > > > } > > > > > > } > > > > > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > > > + return -EINVAL; > > > > > > + > > > > > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > > > checks in generic_write_checks() apply, right? And the same security > > > > > issues like stripping setuid bits, etc? And we need to touch > > > > > atime on the source file, too? > > > > > > > > Yes sound like needed checks. > > > > > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > > > merge another ~30 patch series to fix all the stuff missing from the > > > > > clone/dedupe file range operations that make them safe and robust. > > > > > It seems like copy_file_range is all the checks it needs, too? > > > > > > > > Are you proposing to not do this check now in favor of the proper work > > > > that will do all of those checks you listed above? > > > > > > No, I'm saying that if you're adding one check, there's a whole heap > > > of checks that still need to be added, *especially* if this is going > > > to fall back to page cache copy between superblocks that may have > > > different limits and constraints. > > > > > > There's security issues in this API. They need to be fixed before we > > > allow it to do more and potentially expose more problems due to it's > > > wider capability. > > > > Before I totally give up on this feature, can you help me understand > > your concerns with allowing the generic copy_file_range via > > do_splice(). > > it's not do_splice_direct() i'm concerned about. It's /writing data > without adequate checks/ that I'm concerned about. > ->copy_file_range() also writes data, so it needs to undergo the > same safety checks as well. Thank you Dave for clarifying and elaborating on the points. As you pointed out this concerns apply to the current code the same way as to the patch series. Those concerns should be address however I feel like they shouldn't be the responsibility of this particular patch series. Therefore, I ask for the community to either make any final comments for any changes that are needed to "version 7" patches and if no more comments arise I would like to ask for this to be added to the queue for the next kernel version. Then the next patch series would be just VFS and would add appropriate checks and then allow for the generic copy_file_range() via do_splice. > > I have mentioned I'm not a VFS expert thus I come from just looking at > > the available documentation and the code. > > > > I don't see any restrictions on the files being passed in the > > do_splice_direct(). There are no restrictions that they must be from > > the same filesystem or file system type. But perhaps this not the > > concern you had but more about checking validity of arguments? > > > > I have looked at Dave Wong's, if I'm not mistaken these 2 are the > > relevant patches: > > [PATCH 02/28] vfs: check file ranges before cloning files > > -- a couple but not all checks apply to copy_file_range() . > > Yes, of course - clone/dedupe have different constraints, but the > core checks are still needed for copy_file_range(). > > For example, the man page says: > > EINVAL > Requested range extends beyond the end of the source > file; or the flags argument is not 0. > > Your patch above doesn't actually check that - it only checks if the > pos_in is beyond EOF. It needs to check if pos_in + len is beyond > EOF. After checking for wraps, of course. There was a reason why I didn't include the "pos_in + len" check. It sparked the conversation why should "pos_in + len" be an error, when a "read" system call would just return a "short" read and EOF. So I dropped the check for "pst_in + len" to be an error. https://www.spinics.net/lists/linux-nfs/msg62627.html (probably not all the conversations but some of them) > > [PATCH 04/28] vfs: strengthen checking of file range inputs to > > generic_remap_checks > > -- these checks apply to the code once we fall back to the > > do_splice(). > > man page says: > > EFBIG > An attempt was made to write a file that exceeds the > implementation-defined maximum file size or the process's > file size limit, or to write at a position past the maximum > allowed offset. > > These conditions apply to the destination file regards of the method > used to copy the data. That's what the generic methods now check for > clone/dedupe, and need to be used here, too. Agreed and once Darrek patches are in, copy_file_range() can use them too. > e.g. In the case of NFS, if the user ion the NFS client is not > allowed to copy a file (e.g. say user RLIMIT restrictions) then that > should not be bypassed just because the NFS client can do a server > side copy. The restriction has been placed on the local user calling > copy_file_range(), not on the server side implementation. > > > Also, can you elaborate one what "security issues" are present in this > > API? Is it "stripping setuid bits" and so something like calling > > file_remove_priv() that should be done when the fallback to the > > do_splice_direct() happens? > > From 4.19-rc8: > > 7debbf015f58 xfs: update ctime and remove suid before cloning files > > Which then got moved into the generic remap_file_range code in > Darrick's "vfs: remap helper should update destination inode > metadata" patch: > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d > > We can't assume that a server side copy is going to strip setuid > bits or even update target files c/mtimes. I would like to discuss your concerns about updating attributes (c/m/atimes), why shouldn't it be a ->copy_file_range() responsibility. copy_file_rage is basically a read+write. As far as I can tell, vfs_read and vfs_write (in VFS) don't deal with updating attributes. I'm guessing it's assumed that underlying file systems are going to take care of it (unless of course I misread the code). > > As for the atime, wouldn't the ->copy_file_range() be updating the > > file attributes? I guess for the fallback case, the attributes need to > > be updated. > > Generic VFS code should take care of atime updates. Hence if the > source file needs an atime update, it should be in the generic code > and done for all copy methods. > > > If those are checks/issues needed to be addressed and would then get > > the generic copy_file_range() in, I could give a go at a patch (or 2). > > yes, those are the ones I'm aware of. BUt there may be more, I > haven't really looked that closely... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Wed, Oct 31, 2018 at 10:51:48AM -0400, Olga Kornievskaia wrote: > On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@fromorbit.com> wrote: > > > > On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote: > > > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote: > > > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote: > > > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > > > > > > > Input source offset can't be beyond the end of the file. > > > > > > > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > > > > --- > > > > > > > fs/read_write.c | 3 +++ > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > > > > > index fb4ffca..b3b304e 100644 > > > > > > > --- a/fs/read_write.c > > > > > > > +++ b/fs/read_write.c > > > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > + if (pos_in >= i_size_read(inode_in)) > > > > > > > + return -EINVAL; > > > > > > > + > > > > > > > > > > > > vfs_copy_file_range seems ot be missing a wide range of checks. > > > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the > > > > > > checks in generic_write_checks() apply, right? And the same security > > > > > > issues like stripping setuid bits, etc? And we need to touch > > > > > > atime on the source file, too? > > > > > > > > > > Yes sound like needed checks. > > > > > > > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to > > > > > > merge another ~30 patch series to fix all the stuff missing from the > > > > > > clone/dedupe file range operations that make them safe and robust. > > > > > > It seems like copy_file_range is all the checks it needs, too? > > > > > > > > > > Are you proposing to not do this check now in favor of the proper work > > > > > that will do all of those checks you listed above? > > > > > > > > No, I'm saying that if you're adding one check, there's a whole heap > > > > of checks that still need to be added, *especially* if this is going > > > > to fall back to page cache copy between superblocks that may have > > > > different limits and constraints. > > > > > > > > There's security issues in this API. They need to be fixed before we > > > > allow it to do more and potentially expose more problems due to it's > > > > wider capability. > > > > > > Before I totally give up on this feature, can you help me understand > > > your concerns with allowing the generic copy_file_range via > > > do_splice(). > > > > it's not do_splice_direct() i'm concerned about. It's /writing data > > without adequate checks/ that I'm concerned about. > > ->copy_file_range() also writes data, so it needs to undergo the > > same safety checks as well. > > Thank you Dave for clarifying and elaborating on the points. As you > pointed out this concerns apply to the current code the same way as to > the patch series. Those concerns should be address however I feel like > they shouldn't be the responsibility of this particular patch series. > Therefore, I ask for the community to either make any final comments > for any changes that are needed to "version 7" patches and if no more > comments arise I would like to ask for this to be added to the queue > for the next kernel version. > > Then the next patch series would be just VFS and would add appropriate > checks and then allow for the generic copy_file_range() via do_splice. That's fine by me. > > > > I have mentioned I'm not a VFS expert thus I come from just looking at > > > the available documentation and the code. > > > > > > I don't see any restrictions on the files being passed in the > > > do_splice_direct(). There are no restrictions that they must be from > > > the same filesystem or file system type. But perhaps this not the > > > concern you had but more about checking validity of arguments? > > > > > > I have looked at Dave Wong's, if I'm not mistaken these 2 are the > > > relevant patches: > > > [PATCH 02/28] vfs: check file ranges before cloning files > > > -- a couple but not all checks apply to copy_file_range() . > > > > Yes, of course - clone/dedupe have different constraints, but the > > core checks are still needed for copy_file_range(). > > > > For example, the man page says: > > > > EINVAL > > Requested range extends beyond the end of the source > > file; or the flags argument is not 0. > > > > Your patch above doesn't actually check that - it only checks if the > > pos_in is beyond EOF. It needs to check if pos_in + len is beyond > > EOF. After checking for wraps, of course. > > There was a reason why I didn't include the "pos_in + len" check. It > sparked the conversation why should "pos_in + len" be an error, when a > "read" system call would just return a "short" read and EOF. So I > dropped the check for "pst_in + len" to be an error. So man page patches will be required, too. :) Basically, we need to nail down the expected semantics, make sure they are correctly documented and /enforced consistently/ across all filesystems. > > > -- these checks apply to the code once we fall back to the > > > do_splice(). > > > > man page says: > > > > EFBIG > > An attempt was made to write a file that exceeds the > > implementation-defined maximum file size or the process's > > file size limit, or to write at a position past the maximum > > allowed offset. > > > > These conditions apply to the destination file regards of the method > > used to copy the data. That's what the generic methods now check for > > clone/dedupe, and need to be used here, too. > > Agreed and once Darrek patches are in, copy_file_range() can use them too. Should be in the next couple of days. > > 7debbf015f58 xfs: update ctime and remove suid before cloning files > > > > Which then got moved into the generic remap_file_range code in > > Darrick's "vfs: remap helper should update destination inode > > metadata" patch: > > > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d > > > > We can't assume that a server side copy is going to strip setuid > > bits or even update target files c/mtimes. > > I would like to discuss your concerns about updating attributes > (c/m/atimes), why shouldn't it be a ->copy_file_range() > responsibility. copy_file_rage is basically a read+write. As far as I > can tell, vfs_read and vfs_write (in VFS) don't deal with updating > attributes. You're looking at the wrong level. The VFS layer is the first multiplexing layer, allowing filesystems to select a method of handling functionality. They then make use of "generic helpers" to implement the required functionality, and they contain the required updates. ie.g. A list of generic helpers with atime update callers from my cscope index: f fs/pipe.c pipe_read 343 file_accessed(filp); h fs/readdir.c iterate_dir 56 file_accessed(file); i fs/splice.c generic_file_splice_read 311 file_accessed(in); j fs/splice.c splice_direct_to_actor 992 file_accessed(in); p mm/filemap.c generic_file_buffered_read 2299 file_accessed(filp); q mm/filemap.c generic_file_read_iter 2339 file_accessed(file); r mm/filemap.c generic_file_mmap 2736 file_accessed(file); These are effectively reference implementations of the file reading infrastructure. Filesystems often have customised implementations but they all must contain the same functioanlity and behaviour as the reference implementation. > I'm guessing it's assumed that underlying file systems are > going to take care of it (unless of course I misread the code). Only the ones that don't specifically call the generic helper to do the work. IOWs, what I'd like to see is a generic_copy_file_range() as the reference implemenation using a page cache copy. This contains all the required checks, timestamp updates, etc. If the filesystem does not supply ->copy_file_range, then generic_copy_file_range() is called, not do_splice_direct(). Indeed, a filesystem should be able to do: .copy_file_range = xfs_copy_file_range, xfs_copy_file_range(...) { trace_xfs_copy_file_range(...) return generic_copy_file_range(....); } and have everything work correctly. Cheers, Dave.
diff --git a/fs/read_write.c b/fs/read_write.c index fb4ffca..b3b304e 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, } } + if (pos_in >= i_size_read(inode_in)) + return -EINVAL; + if (file_out->f_op->copy_file_range) { ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out, len, flags);