Message ID | 153913028015.32295.15993665528948323051.stgit@magnolia (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: fixes for serious clone/dedupe problems | expand |
On Wed, Oct 10, 2018 at 3:11 AM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > File range remapping, if allowed to run past the destination file's EOF, > is an optimization on a regular file write. Regular file writes that > extend the file length are subject to various constraints which are not > checked by range cloning. > > This is a correctness problem because we're never allowed to touch > ranges that the page cache can't support (s_maxbytes); we're not > supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't > set; and we must obey resource limits (RLIMIT_FSIZE). > > Therefore, add these checks to the new generic_remap_checks function so > that we curtail unexpected behavior. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > mm/filemap.c | 39 +++++++++++++++++++++++++++++++++++++++ > 1 file changed, 39 insertions(+) > > > diff --git a/mm/filemap.c b/mm/filemap.c > index 14041a8468ba..59056bd9c58a 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) > } > EXPORT_SYMBOL(generic_write_checks); > > +static int > +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count) > +{ > + struct inode *inode = file->f_mapping->host; > + > + /* Don't exceed the LFS limits. */ > + if (unlikely(pos + *count > MAX_NON_LFS && > + !(file->f_flags & O_LARGEFILE))) { > + if (pos >= MAX_NON_LFS) > + return -EFBIG; > + *count = min(*count, MAX_NON_LFS - (uint64_t)pos); > + } > + > + /* Don't operate on ranges the page cache doesn't support. */ > + if (unlikely(pos >= inode->i_sb->s_maxbytes)) > + return -EFBIG; > + > + *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos); > + return 0; > +} > + Sorry. I haven't explained myself properly last time. What I meant is that it hurts my eyes to see generic_write_checks() and generic_remap_check_limits() which from the line of (limit != RLIM_INFINITY) are exactly the same thing. Yes, generic_remap_check_limits() uses iov_iter_truncate(), but that's a minor semantic change - it can be easily resolved by creating a dummy iter in generic_remap_checks() instead of passing int *count. You could say that this is nit picking, but the very reason this patch set exists it because clone/dedup implementation did not use the same range checks of write to begin with, so it just seems wrong to diverge them at this point. So to be clear, I suggest that generic_write_checks() should use your generic_remap_check_limits() helper. If you disagree and others can live with this minor duplication, fine by me. Thanks, Amir.
On Wed, Oct 10, 2018 at 08:23:27AM +0300, Amir Goldstein wrote: > On Wed, Oct 10, 2018 at 3:11 AM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > File range remapping, if allowed to run past the destination file's EOF, > > is an optimization on a regular file write. Regular file writes that > > extend the file length are subject to various constraints which are not > > checked by range cloning. > > > > This is a correctness problem because we're never allowed to touch > > ranges that the page cache can't support (s_maxbytes); we're not > > supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't > > set; and we must obey resource limits (RLIMIT_FSIZE). > > > > Therefore, add these checks to the new generic_remap_checks function so > > that we curtail unexpected behavior. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > --- > > mm/filemap.c | 39 +++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 39 insertions(+) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 14041a8468ba..59056bd9c58a 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) > > } > > EXPORT_SYMBOL(generic_write_checks); > > > > +static int > > +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count) > > +{ > > + struct inode *inode = file->f_mapping->host; > > + > > + /* Don't exceed the LFS limits. */ > > + if (unlikely(pos + *count > MAX_NON_LFS && > > + !(file->f_flags & O_LARGEFILE))) { > > + if (pos >= MAX_NON_LFS) > > + return -EFBIG; > > + *count = min(*count, MAX_NON_LFS - (uint64_t)pos); > > + } > > + > > + /* Don't operate on ranges the page cache doesn't support. */ > > + if (unlikely(pos >= inode->i_sb->s_maxbytes)) > > + return -EFBIG; > > + > > + *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos); > > + return 0; > > +} > > + > > Sorry. I haven't explained myself properly last time. > What I meant is that it hurts my eyes to see generic_write_checks() and > generic_remap_check_limits() which from the line of (limit != RLIM_INFINITY) > are exactly the same thing. Yes, generic_remap_check_limits() uses > iov_iter_truncate(), but that's a minor semantic change - it can be easily > resolved by creating a dummy iter in generic_remap_checks() instead of > passing int *count. Making a fake kiocb and iterator seem like a terribly fragile idea. How about I make the common helper take a pos and *count, and generic_write_checks can translate that into iov_iter_truncate? > You could say that this is nit picking, but the very reason this patch > set exists > it because clone/dedup implementation did not use the same range checks > of write to begin with, so it just seems wrong to diverge them at this point. > > So to be clear, I suggest that generic_write_checks() should use your > generic_remap_check_limits() helper. > If you disagree and others can live with this minor duplication, fine by me. Nah, I think I misunderstood you the first time, sorry about that. --D > Thanks, > Amir.
On Wed, Oct 10, 2018 at 8:01 PM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > On Wed, Oct 10, 2018 at 08:23:27AM +0300, Amir Goldstein wrote: > > On Wed, Oct 10, 2018 at 3:11 AM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > File range remapping, if allowed to run past the destination file's EOF, > > > is an optimization on a regular file write. Regular file writes that > > > extend the file length are subject to various constraints which are not > > > checked by range cloning. > > > > > > This is a correctness problem because we're never allowed to touch > > > ranges that the page cache can't support (s_maxbytes); we're not > > > supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't > > > set; and we must obey resource limits (RLIMIT_FSIZE). > > > > > > Therefore, add these checks to the new generic_remap_checks function so > > > that we curtail unexpected behavior. > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > --- > > > mm/filemap.c | 39 +++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 39 insertions(+) > > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index 14041a8468ba..59056bd9c58a 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) > > > } > > > EXPORT_SYMBOL(generic_write_checks); > > > > > > +static int > > > +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count) > > > +{ > > > + struct inode *inode = file->f_mapping->host; > > > + > > > + /* Don't exceed the LFS limits. */ > > > + if (unlikely(pos + *count > MAX_NON_LFS && > > > + !(file->f_flags & O_LARGEFILE))) { > > > + if (pos >= MAX_NON_LFS) > > > + return -EFBIG; > > > + *count = min(*count, MAX_NON_LFS - (uint64_t)pos); > > > + } > > > + > > > + /* Don't operate on ranges the page cache doesn't support. */ > > > + if (unlikely(pos >= inode->i_sb->s_maxbytes)) > > > + return -EFBIG; > > > + > > > + *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos); > > > + return 0; > > > +} > > > + > > > > Sorry. I haven't explained myself properly last time. > > What I meant is that it hurts my eyes to see generic_write_checks() and > > generic_remap_check_limits() which from the line of (limit != RLIM_INFINITY) > > are exactly the same thing. Yes, generic_remap_check_limits() uses > > iov_iter_truncate(), but that's a minor semantic change - it can be easily > > resolved by creating a dummy iter in generic_remap_checks() instead of > > passing int *count. > > Making a fake kiocb and iterator seem like a terribly fragile idea. > > How about I make the common helper take a pos and *count, and > generic_write_checks can translate that into iov_iter_truncate? > Seems good to me. Thanks, Amir.
diff --git a/mm/filemap.c b/mm/filemap.c index 14041a8468ba..59056bd9c58a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2974,6 +2974,27 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) } EXPORT_SYMBOL(generic_write_checks); +static int +generic_remap_check_limits(struct file *file, loff_t pos, uint64_t *count) +{ + struct inode *inode = file->f_mapping->host; + + /* Don't exceed the LFS limits. */ + if (unlikely(pos + *count > MAX_NON_LFS && + !(file->f_flags & O_LARGEFILE))) { + if (pos >= MAX_NON_LFS) + return -EFBIG; + *count = min(*count, MAX_NON_LFS - (uint64_t)pos); + } + + /* Don't operate on ranges the page cache doesn't support. */ + if (unlikely(pos >= inode->i_sb->s_maxbytes)) + return -EFBIG; + + *count = min(*count, inode->i_sb->s_maxbytes - (uint64_t)pos); + return 0; +} + /* * Performs necessary checks before doing a clone. * @@ -2992,6 +3013,7 @@ int generic_remap_checks(struct file *file_in, loff_t pos_in, uint64_t bcount; loff_t size_in, size_out; loff_t bs = inode_out->i_sb->s_blocksize; + int ret; /* The start of both ranges must be aligned to an fs block. */ if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_out, bs)) @@ -3015,6 +3037,23 @@ int generic_remap_checks(struct file *file_in, loff_t pos_in, return -EINVAL; count = min(count, size_in - (uint64_t)pos_in); + /* Don't exceed RLMIT_FSIZE in the file we're writing into. */ + if (limit != RLIM_INFINITY) { + if (pos_out >= limit) { + send_sig(SIGXFSZ, current, 0); + return -EFBIG; + } + count = min(count, limit - (uint64_t)pos_out); + } + + ret = generic_remap_check_limits(file_in, pos_in, &count); + if (ret) + return ret; + + ret = generic_remap_check_limits(file_out, pos_out, &count); + if (ret) + return ret; + /* * If the user wanted us to link to the infile's EOF, round up to the * next block boundary for this check.