Message ID | 1a378ca2df2ce30e5aecf7145223906a427d9037.1721931241.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fanotify: add pre-content hooks | expand |
On Thu 25-07-24 14:19:45, Josef Bacik wrote: > From: Amir Goldstein <amir73il@gmail.com> > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events. > > This information is meant to be used by hierarchical storage managers > that want to fill partial content of files on first access to range. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/notify/fanotify/fanotify.h | 8 +++++++ > fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++ > include/uapi/linux/fanotify.h | 7 ++++++ > 3 files changed, 53 insertions(+) > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > index 93598b7d5952..7f06355afa1f 100644 > --- a/fs/notify/fanotify/fanotify.h > +++ b/fs/notify/fanotify/fanotify.h > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask) > mask & FANOTIFY_PERM_EVENTS; > } > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event) > +{ > + if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS)) > + return false; > + > + return FANOTIFY_PERM(event)->ppos; > +} Now I'm a bit confused. Can we have legally NULL ppos for an event from FANOTIFY_PRE_CONTENT_EVENTS? > +static size_t copy_range_info_to_user(struct fanotify_event *event, > + char __user *buf, int count) > +{ > + struct fanotify_perm_event *pevent = FANOTIFY_PERM(event); > + struct fanotify_event_info_range info = { }; > + size_t info_len = FANOTIFY_RANGE_INFO_LEN; > + > + if (WARN_ON_ONCE(info_len > count)) > + return -EFAULT; > + > + if (WARN_ON_ONCE(!pevent->ppos)) > + return 0; I think we should be returning some error here. Maybe EINVAL? Otherwise fanotify_event_len() will return different length than we actually generate and that could lead to strange failures later. > + > + info.hdr.info_type = FAN_EVENT_INFO_TYPE_RANGE; > + info.hdr.len = info_len; > + info.offset = *(pevent->ppos); > + info.count = pevent->count; > + > + if (copy_to_user(buf, &info, info_len)) > + return -EFAULT; > + > + return info_len; > +} > + > static int copy_info_records_to_user(struct fanotify_event *event, > struct fanotify_info *info, > unsigned int info_mode, int pidfd, ... > @@ -191,6 +192,12 @@ struct fanotify_event_info_error { > __u32 error_count; > }; > > +struct fanotify_event_info_range { > + struct fanotify_event_info_header hdr; > + __u32 count; > + __u64 offset; > +}; Just for the sake of future-proofing, I'd make 'count' u64. I understand it means growing fanotify_event_info_range by 8 bytes and we currently never do reads or writes larger than 2 GB but I don't think saving bytes like this is really a good tradeoff these days... Honza
On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote: > > On Thu 25-07-24 14:19:45, Josef Bacik wrote: > > From: Amir Goldstein <amir73il@gmail.com> > > > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events. > > > > This information is meant to be used by hierarchical storage managers > > that want to fill partial content of files on first access to range. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > fs/notify/fanotify/fanotify.h | 8 +++++++ > > fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++ > > include/uapi/linux/fanotify.h | 7 ++++++ > > 3 files changed, 53 insertions(+) > > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > > index 93598b7d5952..7f06355afa1f 100644 > > --- a/fs/notify/fanotify/fanotify.h > > +++ b/fs/notify/fanotify/fanotify.h > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask) > > mask & FANOTIFY_PERM_EVENTS; > > } > > > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event) > > +{ > > + if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS)) > > + return false; > > + > > + return FANOTIFY_PERM(event)->ppos; > > +} > > Now I'm a bit confused. Can we have legally NULL ppos for an event from > FANOTIFY_PRE_CONTENT_EVENTS? > Sorry for the very late reply... The short answer is that NULL FANOTIFY_PERM(event)->ppos simply means that fanotify_alloc_perm_event() was called with NULL range, which is the very common case of legacy permission events. The long answer is a bit convoluted, so bare with me. The long answer is to the question whether fsnotify_file_range() can be called with a NULL ppos. This shouldn't be possible AFAIK for regular files and directories, unless some fs that is marked with FS_ALLOW_HSM opens a regular file with FMODE_STREAM, which should not be happening IMO, but then the assertion belongs inside fsnotify_file_range(). However, there was another way to get NULL ppos before I added the patch "fsnotify: generate pre-content permission event on open" Which made this "half intentional" change: static inline int fsnotify_file_perm(struct file *file, int perm_mask) { - return fsnotify_file_area_perm(file, perm_mask, NULL, 0); + return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0); } In order to implement: "The event will have a range info of (0..0) to provide an opportunity to fill the entire file content on open." The problem is that do_open() was not the only caller of fsnotify_file_perm(). There is another call from iterate_dir() and the change above causes FS_PRE_ACCESS events on readdir to report the directory f_pos - Do we want that? I think we do, but HSM should be able to tell the difference between opendir() and readdir(), because my HSM only wants to fill dir content on the latter. I think that we need to decide if we want to allow pre-content events with no range reported (e.g. for readdir()) or if pre-content events must report a range, can report (0..-1) or something for "entire range". Thanks, Amir.
On Thu 24-10-24 12:06:35, Amir Goldstein wrote: > On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote: > > On Thu 25-07-24 14:19:45, Josef Bacik wrote: > > > From: Amir Goldstein <amir73il@gmail.com> > > > > > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info > > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events. > > > > > > This information is meant to be used by hierarchical storage managers > > > that want to fill partial content of files on first access to range. > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > fs/notify/fanotify/fanotify.h | 8 +++++++ > > > fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++ > > > include/uapi/linux/fanotify.h | 7 ++++++ > > > 3 files changed, 53 insertions(+) > > > > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > > > index 93598b7d5952..7f06355afa1f 100644 > > > --- a/fs/notify/fanotify/fanotify.h > > > +++ b/fs/notify/fanotify/fanotify.h > > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask) > > > mask & FANOTIFY_PERM_EVENTS; > > > } > > > > > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event) > > > +{ > > > + if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS)) > > > + return false; > > > + > > > + return FANOTIFY_PERM(event)->ppos; > > > +} > > > > Now I'm a bit confused. Can we have legally NULL ppos for an event from > > FANOTIFY_PRE_CONTENT_EVENTS? > > > > Sorry for the very late reply... > > The short answer is that NULL FANOTIFY_PERM(event)->ppos > simply means that fanotify_alloc_perm_event() was called with NULL > range, which is the very common case of legacy permission events. > > The long answer is a bit convoluted, so bare with me. > The long answer is to the question whether fsnotify_file_range() can > be called with a NULL ppos. > > This shouldn't be possible AFAIK for regular files and directories, > unless some fs that is marked with FS_ALLOW_HSM opens a regular > file with FMODE_STREAM, which should not be happening IMO, > but then the assertion belongs inside fsnotify_file_range(). > > However, there was another way to get NULL ppos before I added the patch > "fsnotify: generate pre-content permission event on open" > > Which made this "half intentional" change: > static inline int fsnotify_file_perm(struct file *file, int perm_mask) > { > - return fsnotify_file_area_perm(file, perm_mask, NULL, 0); > + return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0); > } > > In order to implement: > "The event will have a range info of (0..0) to provide an opportunity > to fill the entire file content on open." > > The problem is that do_open() was not the only caller of fsnotify_file_perm(). > There is another call from iterate_dir() and the change above causes > FS_PRE_ACCESS events on readdir to report the directory f_pos - > Do we want that? I think we do, but HSM should be able to tell the > difference between opendir() and readdir(), because my HSM only > wants to fill dir content on the latter. Well, I'm not so sure we want to report fpos on opendir / readdir(). For directories fpos is an opaque cookie that is filesystem dependent and you are not even allowed to carry it from open to open. It is valid only within that one open-close session if I remember right. So userspace HSM cannot do much with it and in my opinion reporting it to userspace is a recipe for abuse... I'm undecided whether we want to allow pre-access events without range or enforce 0-0 range. I don't think there's a big practical difference. Honza
On Thu, Oct 24, 2024 at 6:35 PM Jan Kara <jack@suse.cz> wrote: > > On Thu 24-10-24 12:06:35, Amir Goldstein wrote: > > On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote: > > > On Thu 25-07-24 14:19:45, Josef Bacik wrote: > > > > From: Amir Goldstein <amir73il@gmail.com> > > > > > > > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info > > > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events. > > > > > > > > This information is meant to be used by hierarchical storage managers > > > > that want to fill partial content of files on first access to range. > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > fs/notify/fanotify/fanotify.h | 8 +++++++ > > > > fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++ > > > > include/uapi/linux/fanotify.h | 7 ++++++ > > > > 3 files changed, 53 insertions(+) > > > > > > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > > > > index 93598b7d5952..7f06355afa1f 100644 > > > > --- a/fs/notify/fanotify/fanotify.h > > > > +++ b/fs/notify/fanotify/fanotify.h > > > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask) > > > > mask & FANOTIFY_PERM_EVENTS; > > > > } > > > > > > > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event) > > > > +{ > > > > + if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS)) > > > > + return false; > > > > + > > > > + return FANOTIFY_PERM(event)->ppos; > > > > +} > > > > > > Now I'm a bit confused. Can we have legally NULL ppos for an event from > > > FANOTIFY_PRE_CONTENT_EVENTS? > > > > > > > Sorry for the very late reply... > > > > The short answer is that NULL FANOTIFY_PERM(event)->ppos > > simply means that fanotify_alloc_perm_event() was called with NULL > > range, which is the very common case of legacy permission events. > > > > The long answer is a bit convoluted, so bare with me. > > The long answer is to the question whether fsnotify_file_range() can > > be called with a NULL ppos. > > > > This shouldn't be possible AFAIK for regular files and directories, > > unless some fs that is marked with FS_ALLOW_HSM opens a regular > > file with FMODE_STREAM, which should not be happening IMO, > > but then the assertion belongs inside fsnotify_file_range(). > > > > However, there was another way to get NULL ppos before I added the patch > > "fsnotify: generate pre-content permission event on open" > > > > Which made this "half intentional" change: > > static inline int fsnotify_file_perm(struct file *file, int perm_mask) > > { > > - return fsnotify_file_area_perm(file, perm_mask, NULL, 0); > > + return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0); > > } > > > > In order to implement: > > "The event will have a range info of (0..0) to provide an opportunity > > to fill the entire file content on open." > > > > The problem is that do_open() was not the only caller of fsnotify_file_perm(). > > There is another call from iterate_dir() and the change above causes > > FS_PRE_ACCESS events on readdir to report the directory f_pos - > > Do we want that? I think we do, but HSM should be able to tell the > > difference between opendir() and readdir(), because my HSM only > > wants to fill dir content on the latter. > > Well, I'm not so sure we want to report fpos on opendir / readdir(). For > directories fpos is an opaque cookie that is filesystem dependent and you > are not even allowed to carry it from open to open. It is valid only within > that one open-close session if I remember right. So userspace HSM cannot do > much with it and in my opinion reporting it to userspace is a recipe for > abuse... > > I'm undecided whether we want to allow pre-access events without range or > enforce 0-0 range. I don't think there's a big practical difference. > So there is a practical difference. My HSM wants to fill dir content on readdir() and not on opendir(). Other HSMs may want to fill dir content on opendir(). It could do that if opendir() (as does open()) reports range [0..0] and readdir() reports no range. I agree that it is somewhat ugly. I am willing to take other semantics as long as HSM can tell the difference between opendir() and readdir(). Thanks, Amir.
On Thu 24-10-24 18:49:02, Amir Goldstein wrote: > On Thu, Oct 24, 2024 at 6:35 PM Jan Kara <jack@suse.cz> wrote: > > > > On Thu 24-10-24 12:06:35, Amir Goldstein wrote: > > > On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote: > > > > On Thu 25-07-24 14:19:45, Josef Bacik wrote: > > > > > From: Amir Goldstein <amir73il@gmail.com> > > > > > > > > > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info > > > > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events. > > > > > > > > > > This information is meant to be used by hierarchical storage managers > > > > > that want to fill partial content of files on first access to range. > > > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > --- > > > > > fs/notify/fanotify/fanotify.h | 8 +++++++ > > > > > fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++ > > > > > include/uapi/linux/fanotify.h | 7 ++++++ > > > > > 3 files changed, 53 insertions(+) > > > > > > > > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > > > > > index 93598b7d5952..7f06355afa1f 100644 > > > > > --- a/fs/notify/fanotify/fanotify.h > > > > > +++ b/fs/notify/fanotify/fanotify.h > > > > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask) > > > > > mask & FANOTIFY_PERM_EVENTS; > > > > > } > > > > > > > > > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event) > > > > > +{ > > > > > + if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS)) > > > > > + return false; > > > > > + > > > > > + return FANOTIFY_PERM(event)->ppos; > > > > > +} > > > > > > > > Now I'm a bit confused. Can we have legally NULL ppos for an event from > > > > FANOTIFY_PRE_CONTENT_EVENTS? > > > > > > > > > > Sorry for the very late reply... > > > > > > The short answer is that NULL FANOTIFY_PERM(event)->ppos > > > simply means that fanotify_alloc_perm_event() was called with NULL > > > range, which is the very common case of legacy permission events. > > > > > > The long answer is a bit convoluted, so bare with me. > > > The long answer is to the question whether fsnotify_file_range() can > > > be called with a NULL ppos. > > > > > > This shouldn't be possible AFAIK for regular files and directories, > > > unless some fs that is marked with FS_ALLOW_HSM opens a regular > > > file with FMODE_STREAM, which should not be happening IMO, > > > but then the assertion belongs inside fsnotify_file_range(). > > > > > > However, there was another way to get NULL ppos before I added the patch > > > "fsnotify: generate pre-content permission event on open" > > > > > > Which made this "half intentional" change: > > > static inline int fsnotify_file_perm(struct file *file, int perm_mask) > > > { > > > - return fsnotify_file_area_perm(file, perm_mask, NULL, 0); > > > + return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0); > > > } > > > > > > In order to implement: > > > "The event will have a range info of (0..0) to provide an opportunity > > > to fill the entire file content on open." > > > > > > The problem is that do_open() was not the only caller of fsnotify_file_perm(). > > > There is another call from iterate_dir() and the change above causes > > > FS_PRE_ACCESS events on readdir to report the directory f_pos - > > > Do we want that? I think we do, but HSM should be able to tell the > > > difference between opendir() and readdir(), because my HSM only > > > wants to fill dir content on the latter. > > > > Well, I'm not so sure we want to report fpos on opendir / readdir(). For > > directories fpos is an opaque cookie that is filesystem dependent and you > > are not even allowed to carry it from open to open. It is valid only within > > that one open-close session if I remember right. So userspace HSM cannot do > > much with it and in my opinion reporting it to userspace is a recipe for > > abuse... > > > > I'm undecided whether we want to allow pre-access events without range or > > enforce 0-0 range. I don't think there's a big practical difference. > > > > So there is a practical difference. > My HSM wants to fill dir content on readdir() and not on opendir(). > Other HSMs may want to fill dir content on opendir(). > It could do that if opendir() (as does open()) reports range [0..0] > and readdir() reports no range. OK, this looks reasonably consistent so ack from me. Honza
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 93598b7d5952..7f06355afa1f 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask) mask & FANOTIFY_PERM_EVENTS; } +static inline bool fanotify_event_has_access_range(struct fanotify_event *event) +{ + if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS)) + return false; + + return FANOTIFY_PERM(event)->ppos; +} + static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse) { return container_of(fse, struct fanotify_event, fse); diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 5ece186d5c50..c3c8b2ea80b6 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -123,6 +123,8 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init; sizeof(struct fanotify_event_info_pidfd) #define FANOTIFY_ERROR_INFO_LEN \ (sizeof(struct fanotify_event_info_error)) +#define FANOTIFY_RANGE_INFO_LEN \ + (sizeof(struct fanotify_event_info_range)) static int fanotify_fid_info_len(int fh_len, int name_len) { @@ -182,6 +184,9 @@ static size_t fanotify_event_len(unsigned int info_mode, if (info_mode & FAN_REPORT_PIDFD) event_len += FANOTIFY_PIDFD_INFO_LEN; + if (fanotify_event_has_access_range(event)) + event_len += FANOTIFY_RANGE_INFO_LEN; + return event_len; } @@ -526,6 +531,30 @@ static int copy_pidfd_info_to_user(int pidfd, return info_len; } +static size_t copy_range_info_to_user(struct fanotify_event *event, + char __user *buf, int count) +{ + struct fanotify_perm_event *pevent = FANOTIFY_PERM(event); + struct fanotify_event_info_range info = { }; + size_t info_len = FANOTIFY_RANGE_INFO_LEN; + + if (WARN_ON_ONCE(info_len > count)) + return -EFAULT; + + if (WARN_ON_ONCE(!pevent->ppos)) + return 0; + + info.hdr.info_type = FAN_EVENT_INFO_TYPE_RANGE; + info.hdr.len = info_len; + info.offset = *(pevent->ppos); + info.count = pevent->count; + + if (copy_to_user(buf, &info, info_len)) + return -EFAULT; + + return info_len; +} + static int copy_info_records_to_user(struct fanotify_event *event, struct fanotify_info *info, unsigned int info_mode, int pidfd, @@ -647,6 +676,15 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; } + if (fanotify_event_has_access_range(event)) { + ret = copy_range_info_to_user(event, buf, count); + if (ret < 0) + return ret; + buf += ret; + count -= ret; + total_bytes += ret; + } + return total_bytes; } diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index c8dacedf73b9..7c92d0f6bf71 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -145,6 +145,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_DFID 3 #define FAN_EVENT_INFO_TYPE_PIDFD 4 #define FAN_EVENT_INFO_TYPE_ERROR 5 +#define FAN_EVENT_INFO_TYPE_RANGE 6 /* Special info types for FAN_RENAME */ #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10 @@ -191,6 +192,12 @@ struct fanotify_event_info_error { __u32 error_count; }; +struct fanotify_event_info_range { + struct fanotify_event_info_header hdr; + __u32 count; + __u64 offset; +}; + /* * User space may need to record additional information about its decision. * The extra information type records what kind of information is included.