Message ID | 23af8201db6ac2efdea94f09ab067d81ba5de7a7.1731684329.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fanotify: add pre-content hooks | expand |
On Fri 15-11-24 10:30:22, Josef Bacik wrote: > From: Amir Goldstein <amir73il@gmail.com> > > Generate FS_PRE_ACCESS event before truncate, without sb_writers held. > > Move the security hooks also before sb_start_write() to conform with > other security hooks (e.g. in write, fallocate). > > The event will have a range info of the page surrounding the new size > to provide an opportunity to fill the conetnt at the end of file before > truncating to non-page aligned size. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> I was thinking about this. One small issue is that similarly as the filesystems may do RMW of tail page during truncate, they will do RMW of head & tail pages on hole punch or zero range so we should have some strategically sprinkled fsnotify_truncate_perm() calls there as well. That's easy enough to fix. But there's another problem which I'm more worried about: If we have a file 64k large, user punches 12k..20k and then does read for 0..64k, then how does HSM daemon in userspace know what data to fill in? When we'll have modify pre-content event, daemon can watch it and since punch will send modify for 12k-20k, the daemon knows the local (empty) page cache is the source of truth. But without modify event this is just a recipe for data corruption AFAICT. So it seems the current setting with access pre-content event has only chance to work reliably in read-only mode? So we should probably refuse writeable open if file is being watched for pre-content events and similarly refuse truncate? Honza
On Wed, Nov 20, 2024 at 4:23 PM Jan Kara <jack@suse.cz> wrote: > > On Fri 15-11-24 10:30:22, Josef Bacik wrote: > > From: Amir Goldstein <amir73il@gmail.com> > > > > Generate FS_PRE_ACCESS event before truncate, without sb_writers held. > > > > Move the security hooks also before sb_start_write() to conform with > > other security hooks (e.g. in write, fallocate). > > > > The event will have a range info of the page surrounding the new size > > to provide an opportunity to fill the conetnt at the end of file before > > truncating to non-page aligned size. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > I was thinking about this. One small issue is that similarly as the > filesystems may do RMW of tail page during truncate, they will do RMW of > head & tail pages on hole punch or zero range so we should have some > strategically sprinkled fsnotify_truncate_perm() calls there as well. > That's easy enough to fix. fallocate already has fsnotify_file_area_perm() hook. What is missing? > > But there's another problem which I'm more worried about: If we have > a file 64k large, user punches 12k..20k and then does read for 0..64k, then > how does HSM daemon in userspace know what data to fill in? When we'll have > modify pre-content event, daemon can watch it and since punch will send modify > for 12k-20k, the daemon knows the local (empty) page cache is the source of > truth. But without modify event this is just a recipe for data corruption > AFAICT. > > So it seems the current setting with access pre-content event has only chance > to work reliably in read-only mode? So we should probably refuse writeable > open if file is being watched for pre-content events and similarly refuse > truncate? I am confused. not sure I understand the problem. In the events that you specific, punch hole WILL generate a FS_PRE_ACCESS event for 12k-20k. When HSM gets a FS_PRE_ACCESS event for 12k-20k it MUST fill the content and keep to itself that 12k-20k is the source of truth from now on. The extra FS_PRE_ACCESS event on 0..64k absolutely does not change that. IOW, a FS_PRE_ACCESS event on 0..64k definitely does NOT mean that HSM NEEDS to fill content in 0..64k, it just means that it MAY needs to fill content if it hasn't done that for a range before the event. To reiterate this important point, it is HSM responsibility to maintain the "content filled map" per file is its own way, under no circumstances it is assumed that fiemap or page cache state has anything to do with the "content filled map". The *only* thing that HSM can assume if that if its "content filled map" is empty for some range, then page cache is NOT yet populated in that range and that also relies on how HSM and mount are being initialized and exposed to users. Did I misunderstand your concern? Thanks, Amir.
On Wed 20-11-24 16:57:30, Amir Goldstein wrote: > On Wed, Nov 20, 2024 at 4:23 PM Jan Kara <jack@suse.cz> wrote: > > > > On Fri 15-11-24 10:30:22, Josef Bacik wrote: > > > From: Amir Goldstein <amir73il@gmail.com> > > > > > > Generate FS_PRE_ACCESS event before truncate, without sb_writers held. > > > > > > Move the security hooks also before sb_start_write() to conform with > > > other security hooks (e.g. in write, fallocate). > > > > > > The event will have a range info of the page surrounding the new size > > > to provide an opportunity to fill the conetnt at the end of file before > > > truncating to non-page aligned size. > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > I was thinking about this. One small issue is that similarly as the > > filesystems may do RMW of tail page during truncate, they will do RMW of > > head & tail pages on hole punch or zero range so we should have some > > strategically sprinkled fsnotify_truncate_perm() calls there as well. > > That's easy enough to fix. > > fallocate already has fsnotify_file_area_perm() hook. > What is missing? Sorry, I've missed that in the patch that was adding it. > > But there's another problem which I'm more worried about: If we have > > a file 64k large, user punches 12k..20k and then does read for 0..64k, then > > how does HSM daemon in userspace know what data to fill in? When we'll have > > modify pre-content event, daemon can watch it and since punch will send modify > > for 12k-20k, the daemon knows the local (empty) page cache is the source of > > truth. But without modify event this is just a recipe for data corruption > > AFAICT. > > > > So it seems the current setting with access pre-content event has only chance > > to work reliably in read-only mode? So we should probably refuse writeable > > open if file is being watched for pre-content events and similarly refuse > > truncate? > > I am confused. not sure I understand the problem. > > In the events that you specific, punch hole WILL generate a FS_PRE_ACCESS > event for 12k-20k. > > When HSM gets a FS_PRE_ACCESS event for 12k-20k it MUST fill the content > and keep to itself that 12k-20k is the source of truth from now on. Ah, right. I've got confused and didn't realize we'll be sending FS_PRE_ACCESS for 12k-20k. Thanks for clarification! > The extra FS_PRE_ACCESS event on 0..64k absolutely does not change that. > IOW, a FS_PRE_ACCESS event on 0..64k definitely does NOT mean that > HSM NEEDS to fill content in 0..64k, it just means that it MAY needs > to fill content > if it hasn't done that for a range before the event. > > To reiterate this important point, it is HSM responsibility to maintain the > "content filled map" per file is its own way, under no circumstances it is > assumed that fiemap or page cache state has anything to do with the > "content filled map". > > The *only* thing that HSM can assume if that if its "content filled map" > is empty for some range, then page cache is NOT yet populated in that > range and that also relies on how HSM and mount are being initialized > and exposed to users. OK, understood and makes sense. Honza
diff --git a/fs/open.c b/fs/open.c index 1a9483872e1f..d11d373dca80 100644 --- a/fs/open.c +++ b/fs/open.c @@ -81,14 +81,18 @@ long vfs_truncate(const struct path *path, loff_t length) if (!S_ISREG(inode->i_mode)) return -EINVAL; - error = mnt_want_write(path->mnt); - if (error) - goto out; - idmap = mnt_idmap(path->mnt); error = inode_permission(idmap, inode, MAY_WRITE); if (error) - goto mnt_drop_write_and_out; + return error; + + error = fsnotify_truncate_perm(path, length); + if (error) + return error; + + error = mnt_want_write(path->mnt); + if (error) + return error; error = -EPERM; if (IS_APPEND(inode)) @@ -114,7 +118,7 @@ long vfs_truncate(const struct path *path, loff_t length) put_write_access(inode); mnt_drop_write_and_out: mnt_drop_write(path->mnt); -out: + return error; } EXPORT_SYMBOL_GPL(vfs_truncate); @@ -175,11 +179,18 @@ long do_ftruncate(struct file *file, loff_t length, int small) /* Check IS_APPEND on real upper inode */ if (IS_APPEND(file_inode(file))) return -EPERM; - sb_start_write(inode->i_sb); + error = security_file_truncate(file); - if (!error) - error = do_truncate(file_mnt_idmap(file), dentry, length, - ATTR_MTIME | ATTR_CTIME, file); + if (error) + return error; + + error = fsnotify_truncate_perm(&file->f_path, length); + if (error) + return error; + + sb_start_write(inode->i_sb); + error = do_truncate(file_mnt_idmap(file), dentry, length, + ATTR_MTIME | ATTR_CTIME, file); sb_end_write(inode->i_sb); return error; diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index ce189b4778a5..08893429a818 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -217,6 +217,21 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask, return fsnotify_path(&file->f_path, FS_ACCESS_PERM); } +/* + * fsnotify_truncate_perm - permission hook before file truncate + */ +static inline int fsnotify_truncate_perm(const struct path *path, loff_t length) +{ + struct inode *inode = d_inode(path->dentry); + + if (!(inode->i_sb->s_iflags & SB_I_ALLOW_HSM) || + !fsnotify_sb_has_priority_watchers(inode->i_sb, + FSNOTIFY_PRIO_PRE_CONTENT)) + return 0; + + return fsnotify_pre_content(path, &length, 0); +} + /* * fsnotify_file_perm - permission hook before file access (unknown range) */ @@ -255,6 +270,11 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask, return 0; } +static inline int fsnotify_truncate_perm(const struct path *path, loff_t length) +{ + return 0; +} + static inline int fsnotify_file_perm(struct file *file, int perm_mask) { return 0;