Message ID | 20231207123825.4011620-4-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Prepare for fsnotify pre-content permission events | expand |
On Thu 07-12-23 14:38:24, Amir Goldstein wrote: > filesystem may be modified in the context of fanotify permission events > (e.g. by HSM service), so assert that sb freeze protection is not held. > > If the assertion fails, then the following deadlock would be possible: > > CPU0 CPU1 CPU2 > ------------------------------------------------------------------------- > file_start_write()#0 > ... > fsnotify_perm() > fanotify_get_response() => (read event and fill file) > ... > ... freeze_super() > ... sb_wait_write() > ... > vfs_write() > file_start_write()#1 > > This example demonstrates a use case of an hierarchical storage management > (HSM) service that uses fanotify permission events to fill the content of > a file before access, while a 3rd process starts fsfreeze. > > This creates a circular dependeny: > file_start_write()#0 => fanotify_get_response => > file_start_write()#1 => > sb_wait_write() => > file_end_write()#0 > > Where file_end_write()#0 can never be called and none of the threads can > make progress. > > The assertion is checked for both MAY_READ and MAY_WRITE permission > hooks in preparation for a pre-modify permission event. > > The assertion is not checked for an open permission event, because > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not > safe to write to filesystem in the content of an open permission event. ^^^^^ context BTW, isn't this a bit inconvenient? I mean filling file contents on open looks quite natural... Do you plan to fill files only on individual read / write events? I was under the impression simple HSM handlers would be doing it on open. > Signed-off-by: Amir Goldstein <amir73il@gmail.com> Anyway this particular change looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > include/linux/fsnotify.h | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h > index 926bb4461b9e..0a9d6a8a747a 100644 > --- a/include/linux/fsnotify.h > +++ b/include/linux/fsnotify.h > @@ -107,6 +107,13 @@ static inline int fsnotify_file_perm(struct file *file, int perm_mask) > { > __u32 fsnotify_mask = FS_ACCESS_PERM; > > + /* > + * filesystem may be modified in the context of permission events > + * (e.g. by HSM filling a file on access), so sb freeze protection > + * must not be held. > + */ > + lockdep_assert_once(file_write_not_started(file)); > + > if (!(perm_mask & MAY_READ)) > return 0; > > -- > 2.34.1 >
On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@suse.cz> wrote: > > On Thu 07-12-23 14:38:24, Amir Goldstein wrote: > > filesystem may be modified in the context of fanotify permission events > > (e.g. by HSM service), so assert that sb freeze protection is not held. > > > > If the assertion fails, then the following deadlock would be possible: > > > > CPU0 CPU1 CPU2 > > ------------------------------------------------------------------------- > > file_start_write()#0 > > ... > > fsnotify_perm() > > fanotify_get_response() => (read event and fill file) > > ... > > ... freeze_super() > > ... sb_wait_write() > > ... > > vfs_write() > > file_start_write()#1 > > > > This example demonstrates a use case of an hierarchical storage management > > (HSM) service that uses fanotify permission events to fill the content of > > a file before access, while a 3rd process starts fsfreeze. > > > > This creates a circular dependeny: > > file_start_write()#0 => fanotify_get_response => > > file_start_write()#1 => > > sb_wait_write() => > > file_end_write()#0 > > > > Where file_end_write()#0 can never be called and none of the threads can > > make progress. > > > > The assertion is checked for both MAY_READ and MAY_WRITE permission > > hooks in preparation for a pre-modify permission event. > > > > The assertion is not checked for an open permission event, because > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not > > safe to write to filesystem in the content of an open permission event. > ^^^^^ context > > BTW, isn't this a bit inconvenient? I mean filling file contents on open > looks quite natural... Do you plan to fill files only on individual read / > write events? I was under the impression simple HSM handlers would be doing > it on open. > Naive HSMs perhaps... The problem with filling on open is that the pattern open();fstat();close() is quite common with applications and we found open() to be a sub-optimal predicate for near future read(). Filling the file on first read() access or directory on first readdir() access does a better job in "swapping in" the correct files. A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event. that's not any more or less simple than filling it on an OPEN_PERM event. Another point that could get lost when reading to above deadlock is that filling the file content before open(O_TRUNC) would be really dumb, because swap in is costly and you are going to throw away the data. If we really wanted to provide HSM with a safe way to fill files on open, we would probably need to report the open flags with the open event. I actually think that reporting the open flags would be nice even with an async open event. Thanks, Amir.
On Fri 08-12-23 23:02:35, Amir Goldstein wrote: > On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@suse.cz> wrote: > > > > On Thu 07-12-23 14:38:24, Amir Goldstein wrote: > > > filesystem may be modified in the context of fanotify permission events > > > (e.g. by HSM service), so assert that sb freeze protection is not held. > > > > > > If the assertion fails, then the following deadlock would be possible: > > > > > > CPU0 CPU1 CPU2 > > > ------------------------------------------------------------------------- > > > file_start_write()#0 > > > ... > > > fsnotify_perm() > > > fanotify_get_response() => (read event and fill file) > > > ... > > > ... freeze_super() > > > ... sb_wait_write() > > > ... > > > vfs_write() > > > file_start_write()#1 > > > > > > This example demonstrates a use case of an hierarchical storage management > > > (HSM) service that uses fanotify permission events to fill the content of > > > a file before access, while a 3rd process starts fsfreeze. > > > > > > This creates a circular dependeny: > > > file_start_write()#0 => fanotify_get_response => > > > file_start_write()#1 => > > > sb_wait_write() => > > > file_end_write()#0 > > > > > > Where file_end_write()#0 can never be called and none of the threads can > > > make progress. > > > > > > The assertion is checked for both MAY_READ and MAY_WRITE permission > > > hooks in preparation for a pre-modify permission event. > > > > > > The assertion is not checked for an open permission event, because > > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not > > > safe to write to filesystem in the content of an open permission event. > > ^^^^^ context > > > > BTW, isn't this a bit inconvenient? I mean filling file contents on open > > looks quite natural... Do you plan to fill files only on individual read / > > write events? I was under the impression simple HSM handlers would be doing > > it on open. > > > > Naive HSMs perhaps... The problem with filling on open is that the pattern > open();fstat();close() is quite common with applications and we found open() > to be a sub-optimal predicate for near future read(). > > Filling the file on first read() access or directory on first > readdir() access does > a better job in "swapping in" the correct files. > A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event. > that's not any more or less simple than filling it on an OPEN_PERM event. > > Another point that could get lost when reading to above deadlock is that > filling the file content before open(O_TRUNC) would be really dumb, > because swap in is costly and you are going to throw away the data. > If we really wanted to provide HSM with a safe way to fill files on open, > we would probably need to report the open flags with the open event. > I actually think that reporting the open flags would be nice even with > an async open event. OK, thanks for explanation! Honza
On Mon, Dec 11, 2023 at 12:30 PM Jan Kara <jack@suse.cz> wrote: > > On Fri 08-12-23 23:02:35, Amir Goldstein wrote: > > On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@suse.cz> wrote: > > > > > > On Thu 07-12-23 14:38:24, Amir Goldstein wrote: > > > > filesystem may be modified in the context of fanotify permission events > > > > (e.g. by HSM service), so assert that sb freeze protection is not held. > > > > > > > > If the assertion fails, then the following deadlock would be possible: > > > > > > > > CPU0 CPU1 CPU2 > > > > ------------------------------------------------------------------------- > > > > file_start_write()#0 > > > > ... > > > > fsnotify_perm() > > > > fanotify_get_response() => (read event and fill file) > > > > ... > > > > ... freeze_super() > > > > ... sb_wait_write() > > > > ... > > > > vfs_write() > > > > file_start_write()#1 > > > > > > > > This example demonstrates a use case of an hierarchical storage management > > > > (HSM) service that uses fanotify permission events to fill the content of > > > > a file before access, while a 3rd process starts fsfreeze. > > > > > > > > This creates a circular dependeny: > > > > file_start_write()#0 => fanotify_get_response => > > > > file_start_write()#1 => > > > > sb_wait_write() => > > > > file_end_write()#0 > > > > > > > > Where file_end_write()#0 can never be called and none of the threads can > > > > make progress. > > > > > > > > The assertion is checked for both MAY_READ and MAY_WRITE permission > > > > hooks in preparation for a pre-modify permission event. > > > > > > > > The assertion is not checked for an open permission event, because > > > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not > > > > safe to write to filesystem in the content of an open permission event. > > > ^^^^^ context > > > > > > BTW, isn't this a bit inconvenient? I mean filling file contents on open > > > looks quite natural... Do you plan to fill files only on individual read / > > > write events? I was under the impression simple HSM handlers would be doing > > > it on open. > > > > > > > Naive HSMs perhaps... The problem with filling on open is that the pattern > > open();fstat();close() is quite common with applications and we found open() > > to be a sub-optimal predicate for near future read(). > > > > Filling the file on first read() access or directory on first > > readdir() access does > > a better job in "swapping in" the correct files. > > A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event. > > that's not any more or less simple than filling it on an OPEN_PERM event. > > > > Another point that could get lost when reading to above deadlock is that > > filling the file content before open(O_TRUNC) would be really dumb, > > because swap in is costly and you are going to throw away the data. > > If we really wanted to provide HSM with a safe way to fill files on open, > > we would probably need to report the open flags with the open event. > > I actually think that reporting the open flags would be nice even with > > an async open event. > > OK, thanks for explanation! Anyway, I will try to find a solution also for the OPEN_PERM deadlock. I have some sketches, but ->atomic_open() complicates things... Thanks, Amir.
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 926bb4461b9e..0a9d6a8a747a 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -107,6 +107,13 @@ static inline int fsnotify_file_perm(struct file *file, int perm_mask) { __u32 fsnotify_mask = FS_ACCESS_PERM; + /* + * filesystem may be modified in the context of permission events + * (e.g. by HSM filling a file on access), so sb freeze protection + * must not be held. + */ + lockdep_assert_once(file_write_not_started(file)); + if (!(perm_mask & MAY_READ)) return 0;
filesystem may be modified in the context of fanotify permission events (e.g. by HSM service), so assert that sb freeze protection is not held. If the assertion fails, then the following deadlock would be possible: CPU0 CPU1 CPU2 ------------------------------------------------------------------------- file_start_write()#0 ... fsnotify_perm() fanotify_get_response() => (read event and fill file) ... ... freeze_super() ... sb_wait_write() ... vfs_write() file_start_write()#1 This example demonstrates a use case of an hierarchical storage management (HSM) service that uses fanotify permission events to fill the content of a file before access, while a 3rd process starts fsfreeze. This creates a circular dependeny: file_start_write()#0 => fanotify_get_response => file_start_write()#1 => sb_wait_write() => file_end_write()#0 Where file_end_write()#0 can never be called and none of the threads can make progress. The assertion is checked for both MAY_READ and MAY_WRITE permission hooks in preparation for a pre-modify permission event. The assertion is not checked for an open permission event, because do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not safe to write to filesystem in the content of an open permission event. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- include/linux/fsnotify.h | 7 +++++++ 1 file changed, 7 insertions(+)