Message ID | 1473856994-27463-5-git-send-email-amir73il@gmail.com (mailing list archive)
---|---
State | New, archived
On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Use vfs_copy_file_range() helper instead of calling do_splice_direct()
> when copying up file data.
> When copying up within the same fs, which supports copy_file_range(),
> the fs implementation can be more efficient than do_splice_direct().
> The vfs_copy_file_range() helper falls back to do_splice_direct() if
> it cannot use the file system's copy_file_range() implementation.
>
> A previous change added a vfs_clone_file_range() call before
> the data copy loop, so this change is only effective for filesystems
> that support copy_file_range() and *do not* support clone_file_range().
> At the moment, there are no such filesystems in the kernel that
> can be used as overlayfs upper, so I tested this change by disabling
> the vfs_clone_file_range() call.
>
> Tested correct behavior when lower and upper are on:
> 1. same ext4 (copy)
> 2. same xfs + reflink patches + mkfs.xfs (copy)
> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>
> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> sparse files) takes less than 1 second in the xfs reflink setup vs.
> 25 seconds on the rest of the setups.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c | 10 +++++-----
>  fs/read_write.c        |  1 +
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index ba039f8..a6d6bac 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -146,7 +146,6 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>  	/* Can't clone, so now we try to copy the data */
>  	error = 0;
>
> -	/* FIXME: copy up sparse files efficiently */
>  	while (len) {
>  		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
>  		long bytes;
> @@ -159,15 +158,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>  			break;
>  		}
>
> -		bytes = do_splice_direct(old_file, &old_pos,
> -					 new_file, &new_pos,
> -					 this_len, SPLICE_F_MOVE);
> +		bytes = vfs_copy_file_range(old_file, old_pos,
> +					    new_file, new_pos,
> +					    this_len, 0);
>  		if (bytes <= 0) {
>  			error = bytes;
>  			break;
>  		}
> -		WARN_ON(old_pos != new_pos);
>
> +		old_pos += bytes;
> +		new_pos += bytes;
>  		len -= bytes;
>  	}
>  out:
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 6975fe8..dfc083a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1515,6 +1515,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	    file_out->f_op->copy_file_range)
>  		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>  						      pos_out, len, flags);
> +	/* FIXME: copy sparse file range efficiently */
>  	if (ret == -EOPNOTSUPP)
>  		ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
>  				       len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
> --
> 2.7.4
>

Guys, I just hit a lockdep warning on nesting mnt_want_write() with this
patch, because overlay already holds the write lock when getting to
ovl_copy_up_data(). I don't know how I missed it before.

Is it really a problem to nest this lock?
Should I factor out another set of _locked helpers from the
vfs_{copy,clone}_file_range helpers?
===========
[  675.814754] =============================================
[  675.814755] [ INFO: possible recursive locking detected ]
[  675.814757] 4.8.0-rc5+ #5 Tainted: G        W  O
[  675.814758] ---------------------------------------------
[  675.814759] xfs_io/15241 is trying to acquire lock:
[  675.814760]  (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0
[  675.814767] but task is already holding lock:
[  675.814768]  (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0
[  675.814772] other info that might help us debug this:
[  675.814773]  Possible unsafe locking scenario:
[  675.814774]        CPU0
[  675.814775]        ----
[  675.814775]   lock(sb_writers#7);
[  675.814777]   lock(sb_writers#7);
[  675.814779]  *** DEADLOCK ***
[  675.814780]  May be due to missing lock nesting notation
[  675.814782] 4 locks held by xfs_io/15241:
[  675.814783]  #0:  (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0
[  675.814787]  #1:  (&type->s_vfs_rename_key){+.+.+.}, at: [<ffffffffa227f992>] lock_rename+0x32/0x100
[  675.814791]  #2:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffffa227fa40>] lock_rename+0xe0/0x100
[  675.814796]  #3:  (&type->i_mutex_dir_key#2/5){+.+.+.}, at: [<ffffffffa227fa55>] lock_rename+0xf5/0x100
[  675.814800] stack backtrace:
[  675.814802] CPU: 2 PID: 15241 Comm: xfs_io Tainted: G        W  O 4.8.0-rc5+ #5
[  675.814804] Hardware name: Dell Inc. Latitude E7450/0D8H72, BIOS A13 05/17/2016
[  675.814805]  0000000000000086 00000000029bfc05 ffff9cd454ab78b0 ffffffffa2474d83
[  675.814808]  ffffffffa3a7b6f0 ffff9cd4685f0000 ffff9cd454ab7980 ffffffffa20ed54d
[  675.814810]  0000000000000296 0000000400000246 ffff9cd400000000 ffffffffa3392101
[  675.814813] Call Trace:
[  675.814816]  [<ffffffffa2474d83>] dump_stack+0x85/0xc2
[  675.814820]  [<ffffffffa20ed54d>] __lock_acquire+0x148d/0x14f0
[  675.814822]  [<ffffffffa22ec170>] ? dquot_file_open+0x40/0x50
[  675.814825]  [<ffffffffa20eda70>] lock_acquire+0x100/0x1f0
[  675.814826]  [<ffffffffa2277144>] ? __sb_start_write+0xb4/0xf0
[  675.814829]  [<ffffffffa20e6b1f>] percpu_down_read+0x4f/0xa0
[  675.814830]  [<ffffffffa2277144>] ? __sb_start_write+0xb4/0xf0
[  675.814832]  [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0
[  675.814834]  [<ffffffffa23cdd2d>] ? security_file_permission+0x3d/0xc0
[  675.814837]  [<ffffffffa229ba88>] mnt_want_write_file+0x28/0x60
[  675.814839]  [<ffffffffa2274dd2>] vfs_copy_file_range+0xc2/0x270
[  675.814844]  [<ffffffffc0c797f6>] ovl_copy_up_one+0x4d6/0x710 [overlay]
[  675.814847]  [<ffffffffc0c79b16>] ovl_copy_up+0xe6/0x118 [overlay]
[  675.814850]  [<ffffffffc0c75d6d>] ovl_open_maybe_copy_up+0x8d/0xd0 [overlay]
[  675.814852]  [<ffffffffc0c73303>] ovl_d_real+0xd3/0x130 [overlay]
[  675.814854]  [<ffffffffa227209f>] vfs_open+0x6f/0x80
[  675.814856]  [<ffffffffa2284698>] path_openat+0x2a8/0xb80
[  675.814858]  [<ffffffffa22864e1>] do_filp_open+0x91/0x100
[  675.814861]  [<ffffffffa292abe7>] ? _raw_spin_unlock+0x27/0x40
[  675.814862]  [<ffffffffa2297f39>] ? __alloc_fd+0xf9/0x210
[  675.814864]  [<ffffffffa2272494>] do_sys_open+0x124/0x210
[  675.814867]  [<ffffffffa227259e>] SyS_open+0x1e/0x20
[  675.814868]  [<ffffffffa292b540>] entry_SYSCALL_64_fastpath+0x23/0xc1
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 22, 2016 at 10:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Guys, I just hit a lockdep warning on nesting mnt_want_write() with
> this patch, because overlay already holds the write lock when getting
> to ovl_copy_up_data(). I don't know how I missed it before.
>
> Is it really a problem to nest this lock?
> Should I factor out another set of _locked helpers from the
> vfs_{copy,clone}_file_range helpers?
>

vfs_{copy,clone}_file_range should call sb_start_write() instead of
mnt_want_write_file(); the checks against FMODE_WRITE + S_ISREG ensure
that the __mnt_want_write() part is already held on the file.

That still leaves the lockdep warning. We could do
__vfs_{copy,clone}_file_range() variants without the
sb_start_write()/sb_end_write() and add the non-underscore variants as
static inline to fs.h that do call sb_start/end_write.

Thanks,
Miklos
On Thu, Sep 22, 2016 at 5:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Sep 22, 2016 at 10:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Guys, I just hit a lockdep warning on nesting mnt_want_write() with
>> this patch, because overlay already holds the write lock when getting
>> to ovl_copy_up_data(). I don't know how I missed it before.
>>
>> Is it really a problem to nest this lock?
>> Should I factor out another set of _locked helpers from the
>> vfs_{copy,clone}_file_range helpers?
>>
>
> vfs_{copy,clone}_file_range should call sb_start_write() instead of
> mnt_want_write_file(), the checks against FMODE_WRITE + S_ISREG ensure
> that the __mnt_want_write() part is already held on the file.

Wait a minute, I think you got the solution but mixed it up a bit.
IIUC, vfs_{clone,copy} should call __mnt_want_write_file()
instead of mnt_want_write_file(), which will skip sb_start_write()
because the file is already open for write, so no lockdep warning
and no problem.

Am I missing something?

> That still leaves the lockdep warning. We could do
> __vfs_{copy,clone}_file_range() variants without the
> sb_start_write()/sb_end_write() and add the non-underscore variants as
> static inline to fs.h that do call sb_start/end_write.
>
> Thanks,
> Miklos
On Thu, Sep 22, 2016 at 6:44 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 5:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Sep 22, 2016 at 10:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Guys, I just hit a lockdep warning on nesting mnt_want_write() with
>>> this patch, because overlay already holds the write lock when getting
>>> to ovl_copy_up_data(). I don't know how I missed it before.
>>>
>>> Is it really a problem to nest this lock?
>>> Should I factor out another set of _locked helpers from the
>>> vfs_{copy,clone}_file_range helpers?
>>>
>>
>> vfs_{copy,clone}_file_range should call sb_start_write() instead of
>> mnt_want_write_file(), the checks against FMODE_WRITE + S_ISREG ensure
>> that the __mnt_want_write() part is already held on the file.
>
> Wait a minute, I think you got the solution but mixed it up a bit.
> IIUC, vfs_{clone,copy} should call __mnt_want_write_file()
> instead of mnt_want_write_file(), which will skip sb_start_write()
> because the file is already open for write, so no lockdep warning
> and no problem.
>
> Am I missing something?
>

Ah.. I got it. Anyway, a short survey of fs/namei.c shows that the
custom is to have mnt_want_write() in either the syscall or the do_XXX
helper and not in the vfs_XXX helper, so I will conform to this
standard.

>> That still leaves the lockdep warning. We could do
>> __vfs_{copy,clone}_file_range() variants without the
>> sb_start_write()/sb_end_write() and add the non-underscore variants as
>> static inline to fs.h that do call sb_start/end_write.
>>
>> Thanks,
>> Miklos