Message ID | 20210212124354.6.Idc9c3110d708aa0df9d8fe5a6246524dc8469dae@changeid (mailing list archive) |
---|---|
State | New, archived |
Series | Add generated flag to filesystem struct to block copy_file_range |
On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> copy_file_range (which calls generic_copy_file_checks) uses the
> inode file size to adjust the copy count parameter. This breaks
> with special filesystems like procfs/sysfs/debugfs/tracefs, where
> the file size appears to be zero, but content is actually returned
> when a read operation is performed. Other issues would also
> happen on partial writes, as the function would attempt to seek
> in the input file.
>
> Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> to return -EOPNOTSUPP: applications can then retry with a more
> usual read/write based file copy (the fallback code is usually
> already present to handle older kernels).
>
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> ---
>
>  fs/read_write.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 0029ff2b0ca8..80322e89fb0a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (flags != 0)
>  		return -EINVAL;
>
> +	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> +		return -EOPNOTSUPP;

Why not declare a dummy copy_file_range_nop function that returns
EOPNOTSUPP and point all of these filesystems at it?

(Or, I guess in these days where function pointers are the enemy,
create a #define that is a cast of 0x1, and fix do_copy_file_range to
return EOPNOTSUPP if it sees that?)

--D

> +
>  	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
>  				       flags);
>  	if (unlikely(ret))
> --
> 2.30.0.478.g8a0d178c01-goog
>
On Thu, Feb 11, 2021 at 08:53:47PM -0800, Darrick J. Wong wrote:
> On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> > copy_file_range (which calls generic_copy_file_checks) uses the
> > inode file size to adjust the copy count parameter. This breaks
> > with special filesystems like procfs/sysfs/debugfs/tracefs, where
> > the file size appears to be zero, but content is actually returned
> > when a read operation is performed. Other issues would also
> > happen on partial writes, as the function would attempt to seek
> > in the input file.
> >
> > Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> > to return -EOPNOTSUPP: applications can then retry with a more
> > usual read/write based file copy (the fallback code is usually
> > already present to handle older kernels).
> >
> > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > ---
> >
> >  fs/read_write.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 0029ff2b0ca8..80322e89fb0a 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  	if (flags != 0)
> >  		return -EINVAL;
> >
> > +	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> > +		return -EOPNOTSUPP;
>
> Why not declare a dummy copy_file_range_nop function that returns
> EOPNOTSUPP and point all of these filesystems at it?
>
> (Or, I guess in these days where function pointers are the enemy,
> create a #define that is a cast of 0x1, and fix do_copy_file_range to
> return EOPNOTSUPP if it sees that?)

Oh, I see, because that doesn't help if the source file is procfs and
the dest file is (say) xfs, because the generic version will try to do
splice magic and *poof*.

I guess the other nit that I can think of at this late hour is ... what
about the other virtual filesystems like configfs and whatnot?  Should
we have a way to flag them as "this can't be the source of a CFR
request" as well?

Or is it just trace/debug/proc/sysfs that have these "zero size but
readable" speshul behaviors?

--D

>
> --D
>
> > +
> >  	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
> >  				       flags);
> >  	if (unlikely(ret))
> > --
> > 2.30.0.478.g8a0d178c01-goog
> >
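To make the objection in this reply concrete, here is a small user-space model, not kernel code. It assumes the dispatch calls a filesystem's own hook only when the source and destination files share the same hook, and otherwise takes the generic path, which is what the reply describes. The types, `generic_copy` and `model_do_copy_file_range` are hypothetical stand-ins; only `copy_file_range_nop` is a name from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file;
typedef long (*cfr_fn)(struct file *in, struct file *out, size_t len);

/* Cut-down stand-ins for the kernel structures (hypothetical). */
struct file_ops { cfr_fn copy_file_range; };
struct file { const struct file_ops *f_op; long size; };

/* The stub proposed in the thread: always refuse. */
static long copy_file_range_nop(struct file *in, struct file *out, size_t len)
{
	(void)in; (void)out; (void)len;
	return -EOPNOTSUPP;
}

/* Stand-in for the generic path: it trusts the source size. */
static long generic_copy(struct file *in, struct file *out, size_t len)
{
	(void)out;
	if (in->size <= 0)
		return 0;	/* size reads as zero: nothing is copied */
	return len < (size_t)in->size ? (long)len : in->size;
}

/* Modelled dispatch: the hook runs only if both files share it. */
long model_do_copy_file_range(struct file *in, struct file *out, size_t len)
{
	assert(in && out && in->f_op && out->f_op);
	if (out->f_op->copy_file_range &&
	    out->f_op->copy_file_range == in->f_op->copy_file_range)
		return out->f_op->copy_file_range(in, out, len);
	return generic_copy(in, out, len);
}
```

In this model, a source whose ops point at copy_file_range_nop and a destination with a different (or no) hook never reaches the stub: the call goes to the generic path and reports 0 bytes instead of -EOPNOTSUPP. The stub is only hit when both files use it.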
On Fri, Feb 12, 2021 at 12:59 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Thu, Feb 11, 2021 at 08:53:47PM -0800, Darrick J. Wong wrote:
> > On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> > > copy_file_range (which calls generic_copy_file_checks) uses the
> > > inode file size to adjust the copy count parameter. This breaks
> > > with special filesystems like procfs/sysfs/debugfs/tracefs, where
> > > the file size appears to be zero, but content is actually returned
> > > when a read operation is performed. Other issues would also
> > > happen on partial writes, as the function would attempt to seek
> > > in the input file.
> > >
> > > Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> > > to return -EOPNOTSUPP: applications can then retry with a more
> > > usual read/write based file copy (the fallback code is usually
> > > already present to handle older kernels).
> > >
> > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > ---
> > >
> > >  fs/read_write.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index 0029ff2b0ca8..80322e89fb0a 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > >  	if (flags != 0)
> > >  		return -EINVAL;
> > >
> > > +	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> > > +		return -EOPNOTSUPP;
> >
> > Why not declare a dummy copy_file_range_nop function that returns
> > EOPNOTSUPP and point all of these filesystems at it?
> >
> > (Or, I guess in these days where function pointers are the enemy,
> > create a #define that is a cast of 0x1, and fix do_copy_file_range to
> > return EOPNOTSUPP if it sees that?)

I was pondering abusing ERR_PTR(-EOPNOTSUPP) for this purpose ,-P

>
> Oh, I see, because that doesn't help if the source file is procfs and
> the dest file is (say) xfs, because the generic version will try to do
> splice magic and *poof*.

Yep. I mean, we could still add a check if the
file_in->f_op->copy_file_range == copy_file_range_nop in
do_copy_file_range...
But then we'd need to sprinkle .copy_file_range = copy_file_range_nop
in many many places (~700 as a lower bound[1]), since the file
operation structure is defined at the file level, not at the FS level,
and people are likely to forget...

[1]
$ git grep "struct file_operations.*=" | grep debug | wc -l
631
$ git grep "struct file_operations.*=" | grep trace | wc -l
84

>
> I guess the other nit that I can think of at this late hour is ... what
> about the other virtual filesystems like configfs and whatnot?  Should
> we have a way to flag them as "this can't be the source of a CFR
> request" as well?
>
> Or is it just trace/debug/proc/sysfs that have these "zero size but
> readable" speshul behaviors?

I did try to audit the other filesystems. The ones I spotted:
- devpts should be fine (only device nodes in there)
- I think pstore doesn't need the flag as it's RAM-backed and persistent.

But yes, I missed configfs, thanks for catching that. I think we need
to add the flag for that one (looks like the sizes are all 4K).

>
> --D
>
> >
> > --D
> >
> > > +
> > >  	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
> > >  				       flags);
> > >  	if (unlikely(ret))
> > > --
> > > 2.30.0.478.g8a0d178c01-goog
> > >
diff --git a/fs/read_write.c b/fs/read_write.c
index 0029ff2b0ca8..80322e89fb0a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (flags != 0)
 		return -EINVAL;
 
+	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
+		return -EOPNOTSUPP;
+
 	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
 				       flags);
 	if (unlikely(ret))
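The new check sits ahead of generic_copy_file_checks, which is where the copy length gets adjusted against the source inode size. A rough user-space model of that adjustment follows, under the assumption that the length is simply clamped to the bytes remaining before the reported end of file. `clamp_copy_len` and `clamp_examples` are hypothetical names, and the real function performs further checks that are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model (not kernel code) of the length clamp: never copy past the
 * size the source inode reports.
 */
uint64_t clamp_copy_len(uint64_t size_in, uint64_t pos_in, uint64_t count)
{
	if (pos_in >= size_in)
		return 0;	/* at or past the reported EOF */
	return count < size_in - pos_in ? count : size_in - pos_in;
}

/* Two illustrative cases. */
void clamp_examples(void)
{
	/* Regular file: 10000 bytes, offset 4096, 8192 requested. */
	assert(clamp_copy_len(10000, 4096, 8192) == 5904);

	/* Generated file: size reads as 0, so any request becomes 0. */
	assert(clamp_copy_len(0, 0, 8192) == 0);
}
```

With a reported size of zero the request collapses to zero bytes, even though a read() on the same file would return data. That is why the flag has to be tested before this step.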
copy_file_range (which calls generic_copy_file_checks) uses the
inode file size to adjust the copy count parameter. This breaks
with special filesystems like procfs/sysfs/debugfs/tracefs, where
the file size appears to be zero, but content is actually returned
when a read operation is performed. Other issues would also
happen on partial writes, as the function would attempt to seek
in the input file.

Use the newly introduced FS_GENERATED_CONTENT filesystem flag
to return -EOPNOTSUPP: applications can then retry with a more
usual read/write based file copy (the fallback code is usually
already present to handle older kernels).

Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---

 fs/read_write.c | 3 +++
 1 file changed, 3 insertions(+)
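
For reference, the application-side fallback mentioned in the commit message might look roughly like the sketch below. It is a minimal illustration, not code from any particular application. `copy_fd`, `copy_read_write`, `write_all` and `should_fall_back` are hypothetical names, and the set of errno values treated as "retry with read/write" is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc: declares copy_file_range() */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t w = write(fd, buf, len);

		if (w < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += w;
		len -= (size_t)w;
	}
	return 0;
}

/* Classic loop: reads until EOF, so it does not depend on st_size. */
static ssize_t copy_read_write(int fd_in, int fd_out)
{
	char buf[64 * 1024];
	ssize_t total = 0;

	for (;;) {
		ssize_t n = read(fd_in, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return total;
		if (write_all(fd_out, buf, (size_t)n) < 0)
			return -1;
		total += n;
	}
}

/* Errors after which a read/write retry seems reasonable (assumed set). */
static int should_fall_back(int err)
{
	return err == EOPNOTSUPP || err == ENOSYS || err == EXDEV ||
	       err == EINVAL || err == EPERM;
}

/* Copy from the current offset of fd_in to the current offset of fd_out. */
ssize_t copy_fd(int fd_in, int fd_out)
{
	ssize_t total = 0;

	assert(fd_in >= 0 && fd_out >= 0);
	for (;;) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    1024 * 1024, 0);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		if (n == 0 && total > 0)
			return total;	/* end of file after real progress */
		if (n == 0 || should_fall_back(errno)) {
			/* Continue from wherever the offsets are now. */
			ssize_t rest = copy_read_write(fd_in, fd_out);

			return rest < 0 ? -1 : total + rest;
		}
		return -1;
	}
}
```

The sketch also retries with read/write when the very first copy_file_range call returns 0. That costs one extra read() on a truly empty file, and it covers kernels that report zero bytes for a generated file instead of returning -EOPNOTSUPP.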