diff mbox series

[v8,09/19] fsnotify: generate pre-content permission event on truncate

Message ID 23af8201db6ac2efdea94f09ab067d81ba5de7a7.1731684329.git.josef@toxicpanda.com (mailing list archive)
State New
Headers show
Series fanotify: add pre-content hooks | expand

Commit Message

Josef Bacik Nov. 15, 2024, 3:30 p.m. UTC
From: Amir Goldstein <amir73il@gmail.com>

Generate FS_PRE_ACCESS event before truncate, without sb_writers held.

Move the security hooks also before sb_start_write() to conform with
other security hooks (e.g. in write, fallocate).

The event will have a range info of the page surrounding the new size
to provide an opportunity to fill the conetnt at the end of file before
truncating to non-page aligned size.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/open.c                | 31 +++++++++++++++++++++----------
 include/linux/fsnotify.h | 20 ++++++++++++++++++++
 2 files changed, 41 insertions(+), 10 deletions(-)

Comments

Jan Kara Nov. 20, 2024, 3:23 p.m. UTC | #1
On Fri 15-11-24 10:30:22, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> Generate FS_PRE_ACCESS event before truncate, without sb_writers held.
> 
> Move the security hooks also before sb_start_write() to conform with
> other security hooks (e.g. in write, fallocate).
> 
> The event will have a range info of the page surrounding the new size
> to provide an opportunity to fill the conetnt at the end of file before
> truncating to non-page aligned size.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

I was thinking about this. One small issue is that similarly as the
filesystems may do RMW of tail page during truncate, they will do RMW of
head & tail pages on hole punch or zero range so we should have some
strategically sprinkled fsnotify_truncate_perm() calls there as well.
That's easy enough to fix.

But there's another problem which I'm more worried about: If we have
a file 64k large, user punches 12k..20k and then does read for 0..64k, then
how does HSM daemon in userspace know what data to fill in? When we'll have
modify pre-content event, daemon can watch it and since punch will send modify
for 12k-20k, the daemon knows the local (empty) page cache is the source of
truth. But without modify event this is just a recipe for data corruption
AFAICT.

So it seems the current setting with access pre-content event has only chance
to work reliably in read-only mode? So we should probably refuse writeable
open if file is being watched for pre-content events and similarly refuse
truncate?

								Honza
Amir Goldstein Nov. 20, 2024, 3:57 p.m. UTC | #2
On Wed, Nov 20, 2024 at 4:23 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 15-11-24 10:30:22, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > Generate FS_PRE_ACCESS event before truncate, without sb_writers held.
> >
> > Move the security hooks also before sb_start_write() to conform with
> > other security hooks (e.g. in write, fallocate).
> >
> > The event will have a range info of the page surrounding the new size
> > to provide an opportunity to fill the conetnt at the end of file before
> > truncating to non-page aligned size.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> I was thinking about this. One small issue is that similarly as the
> filesystems may do RMW of tail page during truncate, they will do RMW of
> head & tail pages on hole punch or zero range so we should have some
> strategically sprinkled fsnotify_truncate_perm() calls there as well.
> That's easy enough to fix.

fallocate already has fsnotify_file_area_perm() hook.
What is missing?

>
> But there's another problem which I'm more worried about: If we have
> a file 64k large, user punches 12k..20k and then does read for 0..64k, then
> how does HSM daemon in userspace know what data to fill in? When we'll have
> modify pre-content event, daemon can watch it and since punch will send modify
> for 12k-20k, the daemon knows the local (empty) page cache is the source of
> truth. But without modify event this is just a recipe for data corruption
> AFAICT.
>
> So it seems the current setting with access pre-content event has only chance
> to work reliably in read-only mode? So we should probably refuse writeable
> open if file is being watched for pre-content events and similarly refuse
> truncate?

I am confused. not sure I understand the problem.

In the events that you specific, punch hole WILL generate a FS_PRE_ACCESS
event for 12k-20k.

When HSM gets a FS_PRE_ACCESS event for 12k-20k it MUST fill the content
and keep to itself that 12k-20k is the source of truth from now on.

The extra FS_PRE_ACCESS event on 0..64k absolutely does not change that.
IOW, a FS_PRE_ACCESS event on 0..64k definitely does NOT mean that
HSM NEEDS to fill content in 0..64k, it just means that it MAY needs
to fill content
if it hasn't done that for a range before the event.

To reiterate this important point, it is HSM responsibility to maintain the
 "content filled map" per file is its own way, under no circumstances it is
assumed that fiemap or page cache state has anything to do with the
"content filled map".

The *only* thing that HSM can assume if that if its "content filled map"
is empty for some range, then page cache is NOT yet populated in that
range and that also relies on how HSM and mount are being initialized
and exposed to users.

Did I misunderstand your concern?

Thanks,
Amir.
Jan Kara Nov. 20, 2024, 4:16 p.m. UTC | #3
On Wed 20-11-24 16:57:30, Amir Goldstein wrote:
> On Wed, Nov 20, 2024 at 4:23 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 15-11-24 10:30:22, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > Generate FS_PRE_ACCESS event before truncate, without sb_writers held.
> > >
> > > Move the security hooks also before sb_start_write() to conform with
> > > other security hooks (e.g. in write, fallocate).
> > >
> > > The event will have a range info of the page surrounding the new size
> > > to provide an opportunity to fill the conetnt at the end of file before
> > > truncating to non-page aligned size.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > I was thinking about this. One small issue is that similarly as the
> > filesystems may do RMW of tail page during truncate, they will do RMW of
> > head & tail pages on hole punch or zero range so we should have some
> > strategically sprinkled fsnotify_truncate_perm() calls there as well.
> > That's easy enough to fix.
> 
> fallocate already has fsnotify_file_area_perm() hook.
> What is missing?

Sorry, I've missed that in the patch that was adding it.

> > But there's another problem which I'm more worried about: If we have
> > a file 64k large, user punches 12k..20k and then does read for 0..64k, then
> > how does HSM daemon in userspace know what data to fill in? When we'll have
> > modify pre-content event, daemon can watch it and since punch will send modify
> > for 12k-20k, the daemon knows the local (empty) page cache is the source of
> > truth. But without modify event this is just a recipe for data corruption
> > AFAICT.
> >
> > So it seems the current setting with access pre-content event has only chance
> > to work reliably in read-only mode? So we should probably refuse writeable
> > open if file is being watched for pre-content events and similarly refuse
> > truncate?
> 
> I am confused. not sure I understand the problem.
> 
> In the events that you specific, punch hole WILL generate a FS_PRE_ACCESS
> event for 12k-20k.
> 
> When HSM gets a FS_PRE_ACCESS event for 12k-20k it MUST fill the content
> and keep to itself that 12k-20k is the source of truth from now on.

Ah, right. I've got confused and didn't realize we'll be sending FS_PRE_ACCESS
for 12k-20k. Thanks for clarification!

> The extra FS_PRE_ACCESS event on 0..64k absolutely does not change that.
> IOW, a FS_PRE_ACCESS event on 0..64k definitely does NOT mean that
> HSM NEEDS to fill content in 0..64k, it just means that it MAY needs
> to fill content
> if it hasn't done that for a range before the event.
> 
> To reiterate this important point, it is HSM responsibility to maintain the
>  "content filled map" per file is its own way, under no circumstances it is
> assumed that fiemap or page cache state has anything to do with the
> "content filled map".
> 
> The *only* thing that HSM can assume if that if its "content filled map"
> is empty for some range, then page cache is NOT yet populated in that
> range and that also relies on how HSM and mount are being initialized
> and exposed to users.

OK, understood and makes sense.

								Honza
diff mbox series

Patch

diff --git a/fs/open.c b/fs/open.c
index 1a9483872e1f..d11d373dca80 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -81,14 +81,18 @@  long vfs_truncate(const struct path *path, loff_t length)
 	if (!S_ISREG(inode->i_mode))
 		return -EINVAL;
 
-	error = mnt_want_write(path->mnt);
-	if (error)
-		goto out;
-
 	idmap = mnt_idmap(path->mnt);
 	error = inode_permission(idmap, inode, MAY_WRITE);
 	if (error)
-		goto mnt_drop_write_and_out;
+		return error;
+
+	error = fsnotify_truncate_perm(path, length);
+	if (error)
+		return error;
+
+	error = mnt_want_write(path->mnt);
+	if (error)
+		return error;
 
 	error = -EPERM;
 	if (IS_APPEND(inode))
@@ -114,7 +118,7 @@  long vfs_truncate(const struct path *path, loff_t length)
 	put_write_access(inode);
 mnt_drop_write_and_out:
 	mnt_drop_write(path->mnt);
-out:
+
 	return error;
 }
 EXPORT_SYMBOL_GPL(vfs_truncate);
@@ -175,11 +179,18 @@  long do_ftruncate(struct file *file, loff_t length, int small)
 	/* Check IS_APPEND on real upper inode */
 	if (IS_APPEND(file_inode(file)))
 		return -EPERM;
-	sb_start_write(inode->i_sb);
+
 	error = security_file_truncate(file);
-	if (!error)
-		error = do_truncate(file_mnt_idmap(file), dentry, length,
-				    ATTR_MTIME | ATTR_CTIME, file);
+	if (error)
+		return error;
+
+	error = fsnotify_truncate_perm(&file->f_path, length);
+	if (error)
+		return error;
+
+	sb_start_write(inode->i_sb);
+	error = do_truncate(file_mnt_idmap(file), dentry, length,
+			    ATTR_MTIME | ATTR_CTIME, file);
 	sb_end_write(inode->i_sb);
 
 	return error;
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index ce189b4778a5..08893429a818 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -217,6 +217,21 @@  static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 	return fsnotify_path(&file->f_path, FS_ACCESS_PERM);
 }
 
+/*
+ * fsnotify_truncate_perm - permission hook before file truncate
+ */
+static inline int fsnotify_truncate_perm(const struct path *path, loff_t length)
+{
+	struct inode *inode = d_inode(path->dentry);
+
+	if (!(inode->i_sb->s_iflags & SB_I_ALLOW_HSM) ||
+	    !fsnotify_sb_has_priority_watchers(inode->i_sb,
+					       FSNOTIFY_PRIO_PRE_CONTENT))
+		return 0;
+
+	return fsnotify_pre_content(path, &length, 0);
+}
+
 /*
  * fsnotify_file_perm - permission hook before file access (unknown range)
  */
@@ -255,6 +270,11 @@  static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 	return 0;
 }
 
+static inline int fsnotify_truncate_perm(const struct path *path, loff_t length)
+{
+	return 0;
+}
+
 static inline int fsnotify_file_perm(struct file *file, int perm_mask)
 {
 	return 0;