Message ID | 20190610174007.4818-1-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ceph: copy_file_range needs to strip setuid bits and update timestamps | expand |
On Mon, Jun 10, 2019 at 7:40 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Because ceph doesn't hold destination inode lock throughout the copy, > strip setuid bits before and after copy. > > The destination inode mtime is updated before and after the copy and the > source inode atime is updated after the copy, similar to the filesystem > ->read_iter() implementation. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > > Hi Ilya, > > Please consider applying this patch to ceph branch after merging > Darrick's copy-file-range-fixes branch from: > git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git > > The series (including this patch) was tested on ceph by > Luis Henriques using new copy_range xfstests. > > AFAIK, only fallback from ceph to generic_copy_file_range() > implementation was tested and not the actual ceph clustered > copy_file_range. Zheng, Jeff, please take a look. Thanks, Ilya
Amir Goldstein <amir73il@gmail.com> writes: > Because ceph doesn't hold destination inode lock throughout the copy, > strip setuid bits before and after copy. > > The destination inode mtime is updated before and after the copy and the > source inode atime is updated after the copy, similar to the filesystem > ->read_iter() implementation. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > > Hi Ilya, > > Please consider applying this patch to ceph branch after merging > Darrick's copy-file-range-fixes branch from: > git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git > > The series (including this patch) was tested on ceph by > Luis Henriques using new copy_range xfstests. > > AFAIK, only fallback from ceph to generic_copy_file_range() > implementation was tested and not the actual ceph clustered > copy_file_range. JFYI I've also run tests that exercise the ceph implementation, i.e. that actually do the copy offload. It's the xfstests that (AFAIR) only exercise the generic VFS copy_file_range as they never meet the requirements for this copy offload to happen (for example, the copy must be at least the same length as the files object size which is 4M by default). Cheers,
On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote: > Because ceph doesn't hold destination inode lock throughout the copy, > strip setuid bits before and after copy. > > The destination inode mtime is updated before and after the copy and the > source inode atime is updated after the copy, similar to the filesystem > ->read_iter() implementation. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > > Hi Ilya, > > Please consider applying this patch to ceph branch after merging > Darrick's copy-file-range-fixes branch from: > git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git > > The series (including this patch) was tested on ceph by > Luis Henriques using new copy_range xfstests. > > AFAIK, only fallback from ceph to generic_copy_file_range() > implementation was tested and not the actual ceph clustered > copy_file_range. > > Thanks, > Amir. > > fs/ceph/file.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index c5517ffeb11c..b04c97c7d393 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > goto out; > } > > + /* Should dst_inode lock be held throughout the copy operation? */ > + inode_lock(dst_inode); > + ret = file_modified(dst_file); > + inode_unlock(dst_inode); > + if (ret < 0) { > + dout("failed to modify dst file before copy (%zd)\n", ret); > + goto out; > + } > + I don't see anything that guarantees that the mode of the destination file is up to date at this point. file_modified() just ends up checking the mode cached in the inode. I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference to AUTH_SHARED caps on the destination inode, and then call file_modified() after we get those caps. That would also mean that we wouldn't need to do this a second time after the copy. The catch is that if we did need to issue a setattr, I'm not sure if we'd need to release those caps first. Luis, Zheng, thoughts? > /* > * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other > * clients may have dirty data in their caches. And OSDs know nothing > @@ -2099,6 +2108,14 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > out: > ceph_free_cap_flush(prealloc_cf); > > + file_accessed(src_file); > + /* To be on the safe side, try to remove privs also after copy */ > + inode_lock(dst_inode); > + err = file_modified(dst_file); > + inode_unlock(dst_inode); > + if (err < 0) > + dout("failed to modify dst file after copy (%d)\n", err); > + > return ret; > } >
Jeff Layton <jlayton@kernel.org> writes: > On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote: >> Because ceph doesn't hold destination inode lock throughout the copy, >> strip setuid bits before and after copy. >> >> The destination inode mtime is updated before and after the copy and the >> source inode atime is updated after the copy, similar to the filesystem >> ->read_iter() implementation. >> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com> >> --- >> >> Hi Ilya, >> >> Please consider applying this patch to ceph branch after merging >> Darrick's copy-file-range-fixes branch from: >> git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git >> >> The series (including this patch) was tested on ceph by >> Luis Henriques using new copy_range xfstests. >> >> AFAIK, only fallback from ceph to generic_copy_file_range() >> implementation was tested and not the actual ceph clustered >> copy_file_range. >> >> Thanks, >> Amir. >> >> fs/ceph/file.c | 17 +++++++++++++++++ >> 1 file changed, 17 insertions(+) >> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> index c5517ffeb11c..b04c97c7d393 100644 >> --- a/fs/ceph/file.c >> +++ b/fs/ceph/file.c >> @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, >> goto out; >> } >> >> + /* Should dst_inode lock be held throughout the copy operation? */ >> + inode_lock(dst_inode); >> + ret = file_modified(dst_file); >> + inode_unlock(dst_inode); >> + if (ret < 0) { >> + dout("failed to modify dst file before copy (%zd)\n", ret); >> + goto out; >> + } >> + > > I don't see anything that guarantees that the mode of the destination > file is up to date at this point. file_modified() just ends up checking > the mode cached in the inode. > > I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference > to AUTH_SHARED caps on the destination inode, and then call > file_modified() after we get those caps. That would also mean that we > wouldn't need to do this a second time after the copy. > > The catch is that if we did need to issue a setattr, I'm not sure if > we'd need to release those caps first. > > Luis, Zheng, thoughts? Hmm... I missed that. IIRC the FILE_WR caps allow to modify some metadata (such as timestamps, and file size). I suppose it doesn't allow to cache the mode, does it? If it does, fixing it would be a matter of moving the code a bit further down. If it doesn't the ceph_copy_file_range function already has this problem, as it calls file_update_time. And I wonder if other code paths have this problem too. Obviously, the chunk below will have the same problem. Cheers,
On Thu, 2019-06-13 at 16:50 +0100, Luis Henriques wrote: > Jeff Layton <jlayton@kernel.org> writes: > > > On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote: > > > Because ceph doesn't hold destination inode lock throughout the copy, > > > strip setuid bits before and after copy. > > > > > > The destination inode mtime is updated before and after the copy and the > > > source inode atime is updated after the copy, similar to the filesystem > > > ->read_iter() implementation. > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > > > > Hi Ilya, > > > > > > Please consider applying this patch to ceph branch after merging > > > Darrick's copy-file-range-fixes branch from: > > > git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git > > > > > > The series (including this patch) was tested on ceph by > > > Luis Henriques using new copy_range xfstests. > > > > > > AFAIK, only fallback from ceph to generic_copy_file_range() > > > implementation was tested and not the actual ceph clustered > > > copy_file_range. > > > > > > Thanks, > > > Amir. > > > > > > fs/ceph/file.c | 17 +++++++++++++++++ > > > 1 file changed, 17 insertions(+) > > > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > > index c5517ffeb11c..b04c97c7d393 100644 > > > --- a/fs/ceph/file.c > > > +++ b/fs/ceph/file.c > > > @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > > > goto out; > > > } > > > > > > + /* Should dst_inode lock be held throughout the copy operation? */ > > > + inode_lock(dst_inode); > > > + ret = file_modified(dst_file); > > > + inode_unlock(dst_inode); > > > + if (ret < 0) { > > > + dout("failed to modify dst file before copy (%zd)\n", ret); > > > + goto out; > > > + } > > > + > > > > I don't see anything that guarantees that the mode of the destination > > file is up to date at this point. file_modified() just ends up checking > > the mode cached in the inode. > > > > I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference > > to AUTH_SHARED caps on the destination inode, and then call > > file_modified() after we get those caps. That would also mean that we > > wouldn't need to do this a second time after the copy. > > > > The catch is that if we did need to issue a setattr, I'm not sure if > > we'd need to release those caps first. > > > > Luis, Zheng, thoughts? > > Hmm... I missed that. IIRC the FILE_WR caps allow to modify some > metadata (such as timestamps, and file size). I suppose it doesn't > allow to cache the mode, does it? No, W caps don't guarantee that the mode won't change. You need As or Ax caps for that. > If it does, fixing it would be a > matter of moving the code a bit further down. If it doesn't the > ceph_copy_file_range function already has this problem, as it calls > file_update_time. And I wonder if other code paths have this problem > too. > I think you mean file_remove_privs, but yes...the write codepath has a similar problem. file_remove_privs is called before acquiring any caps, so the same thing could happen there too. It'd be good to fix both places, but taking As cap references in the write codepath could have performance impact in some cases. OTOH, they don't change that much, so maybe that's OK. > Obviously, the chunk below will have the same problem. > Right. If however, we have this code take an As cap reference before doing the copy, then we can be sure that the mode can't change until we drop them. That way we wouldn't need the second call.
Jeff Layton <jlayton@kernel.org> writes: > On Thu, 2019-06-13 at 16:50 +0100, Luis Henriques wrote: >> Jeff Layton <jlayton@kernel.org> writes: >> >> > On Mon, 2019-06-10 at 20:40 +0300, Amir Goldstein wrote: >> > > Because ceph doesn't hold destination inode lock throughout the copy, >> > > strip setuid bits before and after copy. >> > > >> > > The destination inode mtime is updated before and after the copy and the >> > > source inode atime is updated after the copy, similar to the filesystem >> > > ->read_iter() implementation. >> > > >> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> >> > > --- >> > > >> > > Hi Ilya, >> > > >> > > Please consider applying this patch to ceph branch after merging >> > > Darrick's copy-file-range-fixes branch from: >> > > git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git >> > > >> > > The series (including this patch) was tested on ceph by >> > > Luis Henriques using new copy_range xfstests. >> > > >> > > AFAIK, only fallback from ceph to generic_copy_file_range() >> > > implementation was tested and not the actual ceph clustered >> > > copy_file_range. >> > > >> > > Thanks, >> > > Amir. >> > > >> > > fs/ceph/file.c | 17 +++++++++++++++++ >> > > 1 file changed, 17 insertions(+) >> > > >> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> > > index c5517ffeb11c..b04c97c7d393 100644 >> > > --- a/fs/ceph/file.c >> > > +++ b/fs/ceph/file.c >> > > @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, >> > > goto out; >> > > } >> > > >> > > + /* Should dst_inode lock be held throughout the copy operation? */ >> > > + inode_lock(dst_inode); >> > > + ret = file_modified(dst_file); >> > > + inode_unlock(dst_inode); >> > > + if (ret < 0) { >> > > + dout("failed to modify dst file before copy (%zd)\n", ret); >> > > + goto out; >> > > + } >> > > + >> > >> > I don't see anything that guarantees that the mode of the destination >> > file is up to date at this point. file_modified() just ends up checking >> > the mode cached in the inode. >> > >> > I wonder if we ought to fix get_rd_wr_caps() to also acquire a reference >> > to AUTH_SHARED caps on the destination inode, and then call >> > file_modified() after we get those caps. That would also mean that we >> > wouldn't need to do this a second time after the copy. >> > >> > The catch is that if we did need to issue a setattr, I'm not sure if >> > we'd need to release those caps first. >> > >> > Luis, Zheng, thoughts? >> >> Hmm... I missed that. IIRC the FILE_WR caps allow to modify some >> metadata (such as timestamps, and file size). I suppose it doesn't >> allow to cache the mode, does it? > > No, W caps don't guarantee that the mode won't change. You need As or Ax > caps for that. > >> If it does, fixing it would be a >> matter of moving the code a bit further down. If it doesn't the >> ceph_copy_file_range function already has this problem, as it calls >> file_update_time. And I wonder if other code paths have this problem >> too. >> > > I think you mean file_remove_privs, but yes...the write codepath has a > similar problem. file_remove_privs is called before acquiring any caps, > so the same thing could happen there too. > > It'd be good to fix both places, but taking As cap references in the > write codepath could have performance impact in some cases. OTOH, they > don't change that much, so maybe that's OK. > >> Obviously, the chunk below will have the same problem. >> > > Right. If however, we have this code take an As cap reference before > doing the copy, then we can be sure that the mode can't change until we > drop them. That way we wouldn't need the second call. So, do you think the patch below would be enough? It's totally untested, but I wanted to know if that would be acceptable before running some tests on it. Cheers,
On Fri, 2019-06-14 at 09:52 +0100, Luis Henriques wrote: > So, do you think the patch below would be enough? It's totally > untested, but I wanted to know if that would be acceptable before > running some tests on it. > > Cheers, > -- > Luis > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index c5517ffeb11c..f6b0683dd8dc 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -1949,6 +1949,21 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > goto out; > } > > + ret = ceph_do_getattr(dst_inode, CEPH_CAP_AUTH_SHARED, false); > + if (ret < 0) { > + dout("failed to get auth caps on dst file (%zd)\n", ret); > + goto out; > + } > + I think this is still racy. You could lose As caps before file_modified is called. IMO, this code should hold a reference to As caps until the c_f_r operation is complete. That may get tricky however if you do need to issue a setattr to change the mode, as the MDS may try to recall As caps at that point. You won't be able to release them until you drop the reference, so will that deadlock? I'm not sure. > + /* Should dst_inode lock be held throughout the copy operation? */ > + inode_lock(dst_inode); > + ret = file_modified(dst_file); > + inode_unlock(dst_inode); > + if (ret < 0) { > + dout("failed to modify dst file before copy (%zd)\n", ret); > + goto out; > + } > + > /* > * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other > * clients may have dirty data in their caches. And OSDs know nothing
On Fri, 2019-06-14 at 07:43 -0400, Jeff Layton wrote: > On Fri, 2019-06-14 at 09:52 +0100, Luis Henriques wrote: > > So, do you think the patch below would be enough? It's totally > > untested, but I wanted to know if that would be acceptable before > > running some tests on it. > > > > Cheers, > > -- > > Luis > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > index c5517ffeb11c..f6b0683dd8dc 100644 > > --- a/fs/ceph/file.c > > +++ b/fs/ceph/file.c > > @@ -1949,6 +1949,21 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, > > goto out; > > } > > > > + ret = ceph_do_getattr(dst_inode, CEPH_CAP_AUTH_SHARED, false); > > + if (ret < 0) { > > + dout("failed to get auth caps on dst file (%zd)\n", ret); > > + goto out; > > + } > > + > > I think this is still racy. You could lose As caps before file_modified > is called. IMO, this code should hold a reference to As caps until the > c_f_r operation is complete. > > That may get tricky however if you do need to issue a setattr to change > the mode, as the MDS may try to recall As caps at that point. You won't > be able to release them until you drop the reference, so will that > deadlock? I'm not sure. > That said...in many (most?) cases the client will already have As caps on the file from the permission check during open, and mode changes (and AUTH cap revokes) are relatively rare. So, the race I'm talking about probably almost never happens in practice. But...privilege escalation attacks often involve setuid changes, so I think we really ought to be careful here. In any case, if holding a reference to As is not feasible, then I we could take the original version of the patch, and maybe pair it with the getattr above. It's not ideal, but it's better than nothing. > > + /* Should dst_inode lock be held throughout the copy operation? */ > > + inode_lock(dst_inode); > > + ret = file_modified(dst_file); > > + inode_unlock(dst_inode); > > + if (ret < 0) { > > + dout("failed to modify dst file before copy (%zd)\n", ret); > > + goto out; > > + } > > + > > /* > > * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other > > * clients may have dirty data in their caches. And OSDs know nothing
diff --git a/fs/ceph/file.c b/fs/ceph/file.c index c5517ffeb11c..b04c97c7d393 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1949,6 +1949,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, goto out; } + /* Should dst_inode lock be held throughout the copy operation? */ + inode_lock(dst_inode); + ret = file_modified(dst_file); + inode_unlock(dst_inode); + if (ret < 0) { + dout("failed to modify dst file before copy (%zd)\n", ret); + goto out; + } + /* * We need FILE_WR caps for dst_ci and FILE_RD for src_ci as other * clients may have dirty data in their caches. And OSDs know nothing @@ -2099,6 +2108,14 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off, out: ceph_free_cap_flush(prealloc_cf); + file_accessed(src_file); + /* To be on the safe side, try to remove privs also after copy */ + inode_lock(dst_inode); + err = file_modified(dst_file); + inode_unlock(dst_inode); + if (err < 0) + dout("failed to modify dst file after copy (%d)\n", err); + return ret; }
Because ceph doesn't hold destination inode lock throughout the copy, strip setuid bits before and after copy. The destination inode mtime is updated before and after the copy and the source inode atime is updated after the copy, similar to the filesystem ->read_iter() implementation. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- Hi Ilya, Please consider applying this patch to ceph branch after merging Darrick's copy-file-range-fixes branch from: git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git The series (including this patch) was tested on ceph by Luis Henriques using new copy_range xfstests. AFAIK, only fallback from ceph to generic_copy_file_range() implementation was tested and not the actual ceph clustered copy_file_range. Thanks, Amir. fs/ceph/file.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)