diff mbox series

vfs: allow using kernel buffer during fiemap operation

Message ID bc30483b-7f9b-df4e-7143-8646aeb4b5a2@I-love.SAKURA.ne.jp (mailing list archive)
State Mainlined, archived
Headers show
Series vfs: allow using kernel buffer during fiemap operation | expand

Commit Message

Tetsuo Handa April 17, 2023, 2:03 p.m. UTC
syzbot is reporting circular locking dependency between ntfs_file_mmap()
(which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency)
and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock =>
mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap
interface") implemented fiemap_fill_next_extent() using copy_to_user()
where direct mm->mmap_lock dependency is inevitable.

Since ntfs3 does not want to release ni->ni_lock and/or ni->file.run_lock
in order to make sure that "struct ATTRIB" does not change during
ioctl(FS_IOC_FIEMAP) request, let's make it possible to call
fiemap_fill_next_extent() with filesystem locks held.

This patch adds fiemap_fill_next_kernel_extent() which spools
"struct fiemap_extent" to dynamically allocated kernel buffer, and
fiemap_copy_kernel_extent() which copies spooled "struct fiemap_extent"
to userspace buffer after filesystem locks are released.

Reported-by: syzbot <syzbot+96cee7d33ca3f87eee86@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=96cee7d33ca3f87eee86
Reported-by: syzbot <syzbot+c300ab283ba3bc072439@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=c300ab283ba3bc072439
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 fs/ioctl.c             | 52 ++++++++++++++++++++++++++++++++++++------
 fs/ntfs3/file.c        |  4 ++++
 fs/ntfs3/frecord.c     | 10 ++++----
 include/linux/fiemap.h | 24 +++++++++++++++++--
 4 files changed, 76 insertions(+), 14 deletions(-)

Comments

Al Viro April 20, 2023, 9 p.m. UTC | #1
On Mon, Apr 17, 2023 at 11:03:26PM +0900, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency between ntfs_file_mmap()
> (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency)
> and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock =>
> mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap
> interface") implemented fiemap_fill_next_extent() using copy_to_user()
> where direct mm->mmap_lock dependency is inevitable.
> 
> Since ntfs3 does not want to release ni->ni_lock and/or ni->file.run_lock
> in order to make sure that "struct ATTRIB" does not change during
> ioctl(FS_IOC_FIEMAP) request, let's make it possible to call
> fiemap_fill_next_extent() with filesystem locks held.
> 
> This patch adds fiemap_fill_next_kernel_extent() which spools
> "struct fiemap_extent" to dynamically allocated kernel buffer, and
> fiemap_copy_kernel_extent() which copies spooled "struct fiemap_extent"
> to userspace buffer after filesystem locks are released.

Ugh...  I'm pretty certain that this is a wrong approach.  What is going
on in ntfs_file_mmap() that requires that kind of locking?

AFAICS, that's the part that looks odd...  Details, please.
Al Viro April 20, 2023, 9:11 p.m. UTC | #2
On Thu, Apr 20, 2023 at 10:00:45PM +0100, Al Viro wrote:
> On Mon, Apr 17, 2023 at 11:03:26PM +0900, Tetsuo Handa wrote:
> > syzbot is reporting circular locking dependency between ntfs_file_mmap()
> > (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency)
> > and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock =>
> > mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap
> > interface") implemented fiemap_fill_next_extent() using copy_to_user()
> > where direct mm->mmap_lock dependency is inevitable.
> > 
> > Since ntfs3 does not want to release ni->ni_lock and/or ni->file.run_lock
> > in order to make sure that "struct ATTRIB" does not change during
> > ioctl(FS_IOC_FIEMAP) request, let's make it possible to call
> > fiemap_fill_next_extent() with filesystem locks held.
> > 
> > This patch adds fiemap_fill_next_kernel_extent() which spools
> > "struct fiemap_extent" to dynamically allocated kernel buffer, and
> > fiemap_copy_kernel_extent() which copies spooled "struct fiemap_extent"
> > to userspace buffer after filesystem locks are released.
> 
> Ugh...  I'm pretty certain that this is a wrong approach.  What is going
> on in ntfs_file_mmap() that requires that kind of locking?
> 
> AFAICS, that's the part that looks odd...  Details, please.

                if (ni->i_valid < to) {
                        if (!inode_trylock(inode)) {
                                err = -EAGAIN;
                                goto out;
                        }
                        err = ntfs_extend_initialized_size(file, ni,
                                                           ni->i_valid, to);
                        inode_unlock(inode);
                        if (err)
                                goto out;
                }

See that inode_trylock() there?  That's another sign of the same
problem; it's just that their internal locks (taken by
ntfs_extend_initialized_size()) are taken without the same
kind of papering over the problem.

'to' here is guaranteed to be under the i_size_read(inode); is
that some kind of delayed allocation?  Or lazy extending
truncate(), perhaps?  I'm not familiar with ntfs codebase (or
ntfs layout, for that matter), so could somebody describe what
exactly is going on in that code?

Frankly, my impression is that this stuff is done in the wrong
place...
diff mbox series

Patch

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5b2481cd4750..60ddc2760932 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -112,11 +112,10 @@  static int ioctl_fibmap(struct file *filp, int __user *p)
 #define SET_UNKNOWN_FLAGS	(FIEMAP_EXTENT_DELALLOC)
 #define SET_NO_UNMOUNTED_IO_FLAGS	(FIEMAP_EXTENT_DATA_ENCRYPTED)
 #define SET_NOT_ALIGNED_FLAGS	(FIEMAP_EXTENT_DATA_TAIL|FIEMAP_EXTENT_DATA_INLINE)
-int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
-			    u64 phys, u64 len, u32 flags)
+int do_fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
+			       u64 phys, u64 len, u32 flags, bool is_kernel)
 {
 	struct fiemap_extent extent;
-	struct fiemap_extent __user *dest = fieinfo->fi_extents_start;
 
 	/* only count the extents */
 	if (fieinfo->fi_extents_max == 0) {
@@ -140,16 +139,55 @@  int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
 	extent.fe_length = len;
 	extent.fe_flags = flags;
 
-	dest += fieinfo->fi_extents_mapped;
-	if (copy_to_user(dest, &extent, sizeof(extent)))
-		return -EFAULT;
+	if (!is_kernel) {
+		struct fiemap_extent __user *dest = fieinfo->fi_extents_start;
+
+		dest += fieinfo->fi_extents_mapped;
+		if (copy_to_user(dest, &extent, sizeof(extent)))
+			return -EFAULT;
+	} else {
+		struct fiemap_extent_list *entry = kmalloc(sizeof(*entry), GFP_NOFS);
+
+		if (!entry)
+			return -ENOMEM;
+		memmove(&entry->extent, &extent, sizeof(extent));
+		list_add_tail(&entry->list, &fieinfo->fi_extents_list);
+	}
 
 	fieinfo->fi_extents_mapped++;
 	if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
 		return 1;
 	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
 }
-EXPORT_SYMBOL(fiemap_fill_next_extent);
+EXPORT_SYMBOL(do_fiemap_fill_next_extent);
+
+int fiemap_copy_kernel_extent(struct fiemap_extent_info *fieinfo, int err)
+{
+	struct fiemap_extent __user *dest;
+	struct fiemap_extent_list *entry, *tmp;
+	unsigned int len = 0;
+
+	list_for_each_entry(entry, &fieinfo->fi_extents_list, list)
+		len++;
+	if (!len)
+		return err;
+	fieinfo->fi_extents_mapped -= len;
+	dest = fieinfo->fi_extents_start + fieinfo->fi_extents_mapped;
+	list_for_each_entry(entry, &fieinfo->fi_extents_list, list) {
+		if (copy_to_user(dest, &entry->extent, sizeof(entry->extent))) {
+			err = -EFAULT;
+			break;
+		}
+		dest++;
+		fieinfo->fi_extents_mapped++;
+	}
+	list_for_each_entry_safe(entry, tmp, &fieinfo->fi_extents_list, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+	return err;
+}
+EXPORT_SYMBOL(fiemap_copy_kernel_extent);
 
 /**
  * fiemap_prep - check validity of requested flags for fiemap
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index e9bdc1ff08c9..1a3e28f71599 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -1145,12 +1145,16 @@  int ntfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	if (err)
 		return err;
 
+	INIT_LIST_HEAD(&fieinfo->fi_extents_list);
+
 	ni_lock(ni);
 
 	err = ni_fiemap(ni, fieinfo, start, len);
 
 	ni_unlock(ni);
 
+	err = fiemap_copy_kernel_extent(fieinfo, err);
+
 	return err;
 }
 
diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c
index f1df52dfab74..b70f9dfb71ab 100644
--- a/fs/ntfs3/frecord.c
+++ b/fs/ntfs3/frecord.c
@@ -1941,8 +1941,7 @@  int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
 	}
 
 	if (!attr || !attr->non_res) {
-		err = fiemap_fill_next_extent(
-			fieinfo, 0, 0,
+		err = fiemap_fill_next_kernel_extent(fieinfo, 0, 0,
 			attr ? le32_to_cpu(attr->res.data_size) : 0,
 			FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_LAST |
 				FIEMAP_EXTENT_MERGED);
@@ -2037,8 +2036,8 @@  int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
 			if (vbo + dlen >= end)
 				flags |= FIEMAP_EXTENT_LAST;
 
-			err = fiemap_fill_next_extent(fieinfo, vbo, lbo, dlen,
-						      flags);
+			err = fiemap_fill_next_kernel_extent(fieinfo, vbo, lbo,
+							     dlen, flags);
 			if (err < 0)
 				break;
 			if (err == 1) {
@@ -2058,7 +2057,8 @@  int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
 		if (vbo + bytes >= end)
 			flags |= FIEMAP_EXTENT_LAST;
 
-		err = fiemap_fill_next_extent(fieinfo, vbo, lbo, bytes, flags);
+		err = fiemap_fill_next_kernel_extent(fieinfo, vbo, lbo, bytes,
+						     flags);
 		if (err < 0)
 			break;
 		if (err == 1) {
diff --git a/include/linux/fiemap.h b/include/linux/fiemap.h
index c50882f19235..10cb33ed80a9 100644
--- a/include/linux/fiemap.h
+++ b/include/linux/fiemap.h
@@ -5,17 +5,37 @@ 
 #include <uapi/linux/fiemap.h>
 #include <linux/fs.h>
 
+struct fiemap_extent_list {
+	struct list_head list;
+	struct fiemap_extent extent;
+};
+
 struct fiemap_extent_info {
 	unsigned int fi_flags;		/* Flags as passed from user */
 	unsigned int fi_extents_mapped;	/* Number of mapped extents */
 	unsigned int fi_extents_max;	/* Size of fiemap_extent array */
 	struct fiemap_extent __user *fi_extents_start; /* Start of
 							fiemap_extent array */
+	struct list_head fi_extents_list; /* List of fiemap_extent_list */
 };
 
 int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 *len, u32 supported_flags);
-int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
-			    u64 phys, u64 len, u32 flags);
+int do_fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
+			       u64 phys, u64 len, u32 flags, bool is_kernel);
+
+static inline int fiemap_fill_next_extent(struct fiemap_extent_info *info,
+					  u64 logical, u64 phys, u64 len, u32 flags)
+{
+	return do_fiemap_fill_next_extent(info, logical, phys, len, flags, false);
+}
+
+static inline int fiemap_fill_next_kernel_extent(struct fiemap_extent_info *info,
+						 u64 logical, u64 phys, u64 len, u32 flags)
+{
+	return do_fiemap_fill_next_extent(info, logical, phys, len, flags, true);
+}
+
+int fiemap_copy_kernel_extent(struct fiemap_extent_info *info, int err);
 
 #endif /* _LINUX_FIEMAP_H 1 */