| Message ID | 20250217133228.24405-3-luis@igalia.com (mailing list archive) |
|---|---|
| State | New |
| Series | fuse: allow notify_inval for all inodes |
On Mon, Feb 17, 2025 at 01:32:28PM +0000, Luis Henriques wrote:
> Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately.
>
> This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidating all the inodes, it also shrinks the sb dcache.

You still haven't justified why we should be exposing this functionality in a low level filesystem ioctl, out of sight of the VFS.

User driven VFS cache invalidation has long been considered a DOS-in-waiting, hence we don't allow user APIs to invalidate caches like this. This is one of the reasons that /proc/sys/vm/drop_caches requires root access - it's system debug and problem triage functionality, not a production system interface....

Every other situation where filesystems invalidate VFS caches is during mount, remount or unmount operations. Without actually explaining how this functionality is controlled and how user abuse is not possible (e.g. explain the permission model and/or how only root can run this operation), it is not really possible to determine whether we should unconditionally allow VFS cache invalidation outside of the existing operation scope....

Finally, given that the VFS can only do best-effort invalidation and won't provide FUSE (or any other filesystem) with any cache invalidation guarantees outside of specific mount and unmount contexts, I'm not convinced that this is actually worth anything...

-Dave.
On Mon, 17 Feb 2025 at 14:32, Luis Henriques <luis@igalia.com> wrote:
>
> Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately.
>
> This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidating all the inodes, it also shrinks the sb dcache.

What is your use case for dropping all inodes?

There's a reason for doing cache shrinking that I know of, and that's about resources held by the fuse server (e.g. open files) for each cached inode. In that case what's really needed is something that tells the fuse client (the kernel, or in the case of virtiofs, the guest kernel) to get rid of the N least recently used inodes.

> Signed-off-by: Luis Henriques <luis@igalia.com>
> ---
>  fs/fuse/inode.c           | 34 ++++++++++++++++++++++++++++++++++
>  include/uapi/linux/fuse.h |  3 +++
>  2 files changed, 37 insertions(+)
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index e9db2cb8c150..64fa0806e97d 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -547,6 +547,37 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid,
>  	return NULL;
>  }
>
> +static int fuse_reverse_inval_all(struct fuse_conn *fc)
> +{
> +	struct fuse_mount *fm;
> +	struct inode *inode;
> +
> +	inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm);
> +	if (!inode || !fm)
> +		return -ENOENT;
> +	iput(inode);
> +
> +	/* Remove all possible active references to cached inodes */
> +	shrink_dcache_sb(fm->sb);
> +
> +	/* Remove all unreferenced inodes from cache */
> +	invalidate_inodes(fm->sb);

After a dcache shrink, this is unnecessary. See

	.drop_inode = generic_delete_inode,

in fs/fuse/inode.c

Thanks,
Miklos
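[Editor's note: to unpack Miklos's point above -- fuse opts out of caching unreferenced inodes, so once shrink_dcache_sb() drops the dentry references, the inodes get evicted on their final iput() anyway. The sketch below is a simplified view of that mechanism under that assumption, not the kernel's verbatim code (the real iput_final() in fs/inode.c also handles lazytime writeback, I_DONTCACHE and more):]

```c
/* fs/inode.c: the helper fuse installs as ->drop_inode. */
int generic_delete_inode(struct inode *inode)
{
	return 1;	/* non-zero: never keep this inode cached */
}

/* Simplified sketch of what happens on the last iput() of an inode. */
static void iput_final_sketch(struct inode *inode)
{
	const struct super_operations *op = inode->i_sb->s_op;
	int drop = op->drop_inode ? op->drop_inode(inode)
				  : generic_drop_inode(inode);

	if (drop)
		evict(inode);	/* evicted immediately: nothing is left
				 * behind for invalidate_inodes() to find */
	/* otherwise the inode would linger on the sb LRU for reclaim */
}
```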
On Tue, 18 Feb 2025 at 01:55, Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Feb 17, 2025 at 01:32:28PM +0000, Luis Henriques wrote:
> > Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately.
> >
> > This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidating all the inodes, it also shrinks the sb dcache.
>
> You still haven't justified why we should be exposing this functionality in a low level filesystem ioctl, out of sight of the VFS.
>
> User driven VFS cache invalidation has long been considered a DOS-in-waiting, hence we don't allow user APIs to invalidate caches like this. This is one of the reasons that /proc/sys/vm/drop_caches requires root access - it's system debug and problem triage functionality, not a production system interface....
>
> Every other situation where filesystems invalidate VFS caches is during mount, remount or unmount operations. Without actually explaining how this functionality is controlled and how user abuse is not possible (e.g. explain the permission model and/or how only root can run this operation), it is not really possible to determine whether we should unconditionally allow VFS cache invalidation outside of the existing operation scope....

I think you are grabbing the wrong end of the stick here.

This is not about an arbitrary user being able to control caching behavior of a fuse filesystem. It's about the filesystem itself being able to control caching behavior.

I'm not arguing for the validity of this particular patch, just saying that something like this could be valid. And as explained in my other reply there's actually a real problem out there waiting for a solution.

Thanks,
Miklos

> Finally, given that the VFS can only do best-effort invalidation and won't provide FUSE (or any other filesystem) with any cache invalidation guarantees outside of specific mount and unmount contexts, I'm not convinced that this is actually worth anything...
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
On Tue, Feb 18 2025, Miklos Szeredi wrote:

> On Tue, 18 Feb 2025 at 01:55, Dave Chinner <david@fromorbit.com> wrote:
>>
>> On Mon, Feb 17, 2025 at 01:32:28PM +0000, Luis Henriques wrote:
>> > Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately.
>> >
>> > This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidating all the inodes, it also shrinks the sb dcache.
>>
>> You still haven't justified why we should be exposing this functionality in a low level filesystem ioctl, out of sight of the VFS.
>>
>> User driven VFS cache invalidation has long been considered a DOS-in-waiting, hence we don't allow user APIs to invalidate caches like this. This is one of the reasons that /proc/sys/vm/drop_caches requires root access - it's system debug and problem triage functionality, not a production system interface....
>>
>> Every other situation where filesystems invalidate VFS caches is during mount, remount or unmount operations. Without actually explaining how this functionality is controlled and how user abuse is not possible (e.g. explain the permission model and/or how only root can run this operation), it is not really possible to determine whether we should unconditionally allow VFS cache invalidation outside of the existing operation scope....
>
> I think you are grabbing the wrong end of the stick here.
>
> This is not about an arbitrary user being able to control caching behavior of a fuse filesystem. It's about the filesystem itself being able to control caching behavior.
>
> I'm not arguing for the validity of this particular patch, just saying that something like this could be valid. And as explained in my other reply there's actually a real problem out there waiting for a solution.

The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.

The obvious problem with the current solution (i.e. walking through all the inodes) is that it is slow. And if new snapshot generations succeed one another fast enough, memory usage becomes a major issue -- enough to need a helper daemon that monitors memory and does a full remount when it passes some predefined threshold.

Obviously, I'm open to other solutions, including the one suggested by Miklos in his other reply -- to get rid of the N LRU inodes. I'm not sure how that could be implemented yet, but I can have a look into that if you think that's the right interface.

Cheers,
On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:

> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.

Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.

Thanks,
Miklos
On Tue, Feb 18 2025, Miklos Szeredi wrote:

> On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:
>
>> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.
>
> Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.

OK, so I think I'm missing some context here. And, obviously, I also lack some knowledge of the filesystem itself. But, if I understand it correctly, the concept of 'inode' in CVMFS is very loose: when a new snapshot generation is available (you mentioned 'epoch', which is, I guess, the same thing) the inodes are all renewed -- the inode numbers aren't kept between generations/epochs.

Do you have any links for such discussions, or any details on how this proposal is being implemented? This would probably be done mostly in user-space I guess, but it would still need a way to get rid of the unused inodes from old snapshots, right? (Inodes from old snapshots still in use would obviously be kept around.)

Cheers,
On Tue, 18 Feb 2025 at 12:51, Luis Henriques <luis@igalia.com> wrote:
>
> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>
> > On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:
> >
> >> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.
> >
> > Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.
>
> OK, so I think I'm missing some context here. And, obviously, I also lack some knowledge of the filesystem itself. But, if I understand it correctly, the concept of 'inode' in CVMFS is very loose: when a new snapshot generation is available (you mentioned 'epoch', which is, I guess, the same thing) the inodes are all renewed -- the inode numbers aren't kept between generations/epochs.
>
> Do you have any links for such discussions, or any details on how this proposal is being implemented? This would probably be done mostly in user-space I guess, but it would still need a way to get rid of the unused inodes from old snapshots, right? (Inodes from old snapshots still in use would obviously be kept around.)

I don't have links. Adding Valentin Volkl and Laura Promberger to the Cc list, maybe they can help with clarification.

As far as I understand it would work by incrementing fc->epoch on FUSE_INVALIDATE_ALL. When an object is looked up/created, the current epoch is copied to e.g. dentry->d_time. fuse_dentry_revalidate() then compares d_time with fc->epoch and forces an invalidate on mismatch.

Only problem with this is that it seems very CVMFS specific, but I guess so is your proposal.

Implementing the LRU purge is more generally useful, but I'm not sure if that helps CVMFS, since it would only get rid of unused objects.

Thanks,
Miklos
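[Editor's note: a minimal sketch of the epoch scheme Miklos outlines, assuming a hypothetical atomic fc->epoch counter and reusing dentry->d_time as the per-dentry stamp -- these names come from the discussion, not from merged code:]

```c
/* Server-triggered: bump the connection-wide epoch; every dentry
 * stamped with an older value becomes stale. */
static void fuse_bump_epoch(struct fuse_conn *fc)
{
	atomic_inc(&fc->epoch);
}

/* At lookup/create time: stamp the dentry with the epoch that was
 * current when the object was instantiated. */
static void fuse_stamp_dentry(struct fuse_conn *fc, struct dentry *dentry)
{
	dentry->d_time = atomic_read(&fc->epoch);
}

/* In ->d_revalidate(): an old stamp forces a fresh lookup, so the
 * server gets to revalidate (or replace) the object lazily, on next
 * access, instead of the kernel walking the whole cache eagerly. */
static bool fuse_dentry_epoch_valid(struct fuse_conn *fc,
				    struct dentry *dentry)
{
	return dentry->d_time >= atomic_read(&fc->epoch);
}
```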
On Tue, Feb 18 2025, Miklos Szeredi wrote:

> On Tue, 18 Feb 2025 at 12:51, Luis Henriques <luis@igalia.com> wrote:
>>
>> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>>
>> > On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:
>> >
>> >> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.
>> >
>> > Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.
>>
>> OK, so I think I'm missing some context here. And, obviously, I also lack some knowledge of the filesystem itself. But, if I understand it correctly, the concept of 'inode' in CVMFS is very loose: when a new snapshot generation is available (you mentioned 'epoch', which is, I guess, the same thing) the inodes are all renewed -- the inode numbers aren't kept between generations/epochs.
>>
>> Do you have any links for such discussions, or any details on how this proposal is being implemented? This would probably be done mostly in user-space I guess, but it would still need a way to get rid of the unused inodes from old snapshots, right? (Inodes from old snapshots still in use would obviously be kept around.)
>
> I don't have links. Adding Valentin Volkl and Laura Promberger to the Cc list, maybe they can help with clarification.
>
> As far as I understand it would work by incrementing fc->epoch on FUSE_INVALIDATE_ALL. When an object is looked up/created, the current epoch is copied to e.g. dentry->d_time. fuse_dentry_revalidate() then compares d_time with fc->epoch and forces an invalidate on mismatch.

OK, so hopefully Valentin or Laura will be able to help by providing some more details. But, from your description, we would still require this FUSE_INVALIDATE_ALL operation to exist in order to increment the epoch. And this new operation could do that *and* also already invalidate those unused objects.

> Only problem with this is that it seems very CVMFS specific, but I guess so is your proposal.
>
> Implementing the LRU purge is more generally useful, but I'm not sure if that helps CVMFS, since it would only get rid of unused objects.

The LRU inodes purge can indeed work for me as well, because my patch is also only getting rid of unused objects, right? Any inode still being referenced will be kept around.

So, based on your reply, let me try to summarize a possible alternative solution, that I think would be useful for CVMFS but also generic enough for other filesystems:

- Add a new operation, FUSE_INVAL_LRU_INODES, which would get rid of, at most, 'N' unused inodes.

- This operation would have an argument 'N' with the maximum number of inodes to invalidate.

- In addition, it would also increment this new fuse_connection attribute 'epoch', to be used in the dentry revalidation as you suggested above.

- This 'N' could also be set to a pre-#define'd value that would mean *all* (unused) inodes.

Does this make sense? Would something like this be acceptable?

Cheers,
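[Editor's note: for concreteness, the proposal above might look something like this on the wire. Every name here is hypothetical -- nothing like it exists upstream -- and Dave pushes back on the 'N' parameter in the next message:]

```c
/* Hypothetical uapi sketch for the FUSE_INVAL_LRU_INODES proposal. */

/* Assumed convention: a special count meaning "all unused inodes". */
#define FUSE_INVAL_LRU_ALL	(~0ULL)

struct fuse_notify_inval_lru_out {
	uint64_t	count;		/* invalidate at most this many
					 * unused (LRU) inodes */
	uint64_t	padding;
};

/* Assumed kernel-side side effect: fc->epoch is also incremented, so
 * surviving dentries get revalidated lazily, as described above. */
```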
On Tue, Feb 18, 2025 at 03:26:24PM +0100, Miklos Szeredi wrote:
> On Tue, 18 Feb 2025 at 12:51, Luis Henriques <luis@igalia.com> wrote:
> >
> > On Tue, Feb 18 2025, Miklos Szeredi wrote:
> >
> > > On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:
> > >
> > >> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.
> > >
> > > Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.
> >
> > OK, so I think I'm missing some context here. And, obviously, I also lack some knowledge of the filesystem itself. But, if I understand it correctly, the concept of 'inode' in CVMFS is very loose: when a new snapshot generation is available (you mentioned 'epoch', which is, I guess, the same thing) the inodes are all renewed -- the inode numbers aren't kept between generations/epochs.
> >
> > Do you have any links for such discussions, or any details on how this proposal is being implemented? This would probably be done mostly in user-space I guess, but it would still need a way to get rid of the unused inodes from old snapshots, right? (Inodes from old snapshots still in use would obviously be kept around.)
>
> I don't have links. Adding Valentin Volkl and Laura Promberger to the Cc list, maybe they can help with clarification.
>
> As far as I understand it would work by incrementing fc->epoch on FUSE_INVALIDATE_ALL. When an object is looked up/created, the current epoch is copied to e.g. dentry->d_time. fuse_dentry_revalidate() then compares d_time with fc->epoch and forces an invalidate on mismatch.

That was what I was going to suggest before I got this far in the thread. i.e. the fs itself keeps a current snapshot ID, and it keeps track of what snapshot ID each inode was instantiated for. On access to the inode it checks the IDs and if they don't match, the inode needs revalidation and then the fs can take appropriate action.

This also takes care of the active inode reference problem that invalidation has, in that it can revoke access to inodes that are no longer valid as soon as they are accessed by userspace.

It is also entirely internal to the filesystem (as I suggested such functionality should be earlier in the thread) and does not require walking the VFS to find all cached inodes...

-Dave.
On Tue, Feb 18, 2025 at 10:15:26AM +0100, Miklos Szeredi wrote:
> On Tue, 18 Feb 2025 at 01:55, Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Feb 17, 2025 at 01:32:28PM +0000, Luis Henriques wrote:
> > > Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately.
> > >
> > > This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidating all the inodes, it also shrinks the sb dcache.
> >
> > You still haven't justified why we should be exposing this functionality in a low level filesystem ioctl, out of sight of the VFS.
> >
> > User driven VFS cache invalidation has long been considered a DOS-in-waiting, hence we don't allow user APIs to invalidate caches like this. This is one of the reasons that /proc/sys/vm/drop_caches requires root access - it's system debug and problem triage functionality, not a production system interface....
> >
> > Every other situation where filesystems invalidate VFS caches is during mount, remount or unmount operations. Without actually explaining how this functionality is controlled and how user abuse is not possible (e.g. explain the permission model and/or how only root can run this operation), it is not really possible to determine whether we should unconditionally allow VFS cache invalidation outside of the existing operation scope....
>
> I think you are grabbing the wrong end of the stick here.
>
> This is not about an arbitrary user being able to control caching behavior of a fuse filesystem.

Except the filesystem is under control of some user, yes?

Nobody has explained why such functionality is actually safe to expose to user controlled filesystems from a security and permissions perspective. At minimum, this needs to be explained in the commit message so when it gets abused years down the track we have a record of why we thought this was a safe thing to do...

> It's about the filesystem itself being able to control caching behavior.

I'll give the same response regardless of the filesystem - walking the list of all VFS cached inodes to invalidate them -does not work- to invalidate the entire VFS inode cache. That guarantee is only provided in unmount and mount contexts once we have guaranteed that there are no remaining active references to any cached inode. i.e. this is functionality that -does not work- as the filesystem might want it to work.

revoke() is what is needed to do full VFS cache invalidations, but that does not exist....

> I'm not arguing for the validity of this particular patch, just saying that something like this could be valid. And as explained in my other reply there's actually a real problem out there waiting for a solution.

And I've already pointed out that the solution lies within that filesystem implementation, not the VFS. i.e. if a filesystem needs to invalidate all existing inode objects it has instantiated, regardless of whether they are actively referenced or not by other VFS objects, then the filesystem itself needs to implement the invalidation mechanism itself to ensure persistently referenced VFS inodes are correctly handled....

-Dave.
On Tue, Feb 18, 2025 at 06:11:17PM +0000, Luis Henriques wrote:
> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>
> > On Tue, 18 Feb 2025 at 12:51, Luis Henriques <luis@igalia.com> wrote:
> >>
> >> On Tue, Feb 18 2025, Miklos Szeredi wrote:
> >>
> >> > On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:
> >> >
> >> >> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.
> >> >
> >> > Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.
> >>
> >> OK, so I think I'm missing some context here. And, obviously, I also lack some knowledge of the filesystem itself. But, if I understand it correctly, the concept of 'inode' in CVMFS is very loose: when a new snapshot generation is available (you mentioned 'epoch', which is, I guess, the same thing) the inodes are all renewed -- the inode numbers aren't kept between generations/epochs.
> >>
> >> Do you have any links for such discussions, or any details on how this proposal is being implemented? This would probably be done mostly in user-space I guess, but it would still need a way to get rid of the unused inodes from old snapshots, right? (Inodes from old snapshots still in use would obviously be kept around.)
> >
> > I don't have links. Adding Valentin Volkl and Laura Promberger to the Cc list, maybe they can help with clarification.
> >
> > As far as I understand it would work by incrementing fc->epoch on FUSE_INVALIDATE_ALL. When an object is looked up/created, the current epoch is copied to e.g. dentry->d_time. fuse_dentry_revalidate() then compares d_time with fc->epoch and forces an invalidate on mismatch.
>
> OK, so hopefully Valentin or Laura will be able to help by providing some more details. But, from your description, we would still require this FUSE_INVALIDATE_ALL operation to exist in order to increment the epoch. And this new operation could do that *and* also already invalidate those unused objects.

I think you are still looking at this from the wrong direction.

Invalidation is -not the operation- that is being requested. The CVMFS fuse server needs to update some global state in the kernel side fuse mount (i.e. the snapshot ID/epoch), and the need to evict cached inodes from previous IDs is a CVMFS implementation optimisation related to changing the global state.

> > Only problem with this is that it seems very CVMFS specific, but I guess so is your proposal.
> >
> > Implementing the LRU purge is more generally useful, but I'm not sure if that helps CVMFS, since it would only get rid of unused objects.
>
> The LRU inodes purge can indeed work for me as well, because my patch is also only getting rid of unused objects, right? Any inode still being referenced will be kept around.
>
> So, based on your reply, let me try to summarize a possible alternative solution, that I think would be useful for CVMFS but also generic enough for other filesystems:
>
> - Add a new operation, FUSE_INVAL_LRU_INODES, which would get rid of, at most, 'N' unused inodes.
>
> - This operation would have an argument 'N' with the maximum number of inodes to invalidate.
>
> - In addition, it would also increment this new fuse_connection attribute 'epoch', to be used in the dentry revalidation as you suggested above.

As per above: invalidation is an implementation optimisation for the CVMFS epoch update. Invalidation, OTOH, does not imply that any fuse mount/connection global state (e.g. the epoch) needs to change... i.e. the operation should be FUSE_UPDATE_EPOCH, not FUSE_INVAL_LRU_INODES...

> - This 'N' could also be set to a pre-#define'd value that would mean *all* (unused) inodes.

Saying "only invalidate N inodes" makes no sense to me - it is fundamentally impossible for userspace to get right. Either the epoch update should evict all unreferenced inodes immediately, or it should leave them all behind to be purged by memory pressure or other periodic garbage collection mechanisms.

-Dave.
On Wed, Feb 19 2025, Dave Chinner wrote:

> On Tue, Feb 18, 2025 at 06:11:17PM +0000, Luis Henriques wrote:
>> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>>
>> > On Tue, 18 Feb 2025 at 12:51, Luis Henriques <luis@igalia.com> wrote:
>> >>
>> >> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>> >>
>> >> > On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@igalia.com> wrote:
>> >> >
>> >> >> The problem I'm trying to solve is that, if a filesystem wants to ask the kernel to get rid of all inodes, it has to request the kernel to forget each one, individually. The specific filesystem I'm looking at is CVMFS, which is a read-only filesystem that needs to be able to update the full set of filesystem objects when a new generation snapshot becomes available.
>> >> >
>> >> > Yeah, we talked about this use case. As I remember there was a proposal to set an epoch, marking all objects for "revalidate needed", which I think is a better solution to the CVMFS problem than just getting rid of unused objects.
>> >>
>> >> OK, so I think I'm missing some context here. And, obviously, I also lack some knowledge of the filesystem itself. But, if I understand it correctly, the concept of 'inode' in CVMFS is very loose: when a new snapshot generation is available (you mentioned 'epoch', which is, I guess, the same thing) the inodes are all renewed -- the inode numbers aren't kept between generations/epochs.
>> >>
>> >> Do you have any links for such discussions, or any details on how this proposal is being implemented? This would probably be done mostly in user-space I guess, but it would still need a way to get rid of the unused inodes from old snapshots, right? (Inodes from old snapshots still in use would obviously be kept around.)
>> >
>> > I don't have links. Adding Valentin Volkl and Laura Promberger to the Cc list, maybe they can help with clarification.
>> >
>> > As far as I understand it would work by incrementing fc->epoch on FUSE_INVALIDATE_ALL. When an object is looked up/created, the current epoch is copied to e.g. dentry->d_time. fuse_dentry_revalidate() then compares d_time with fc->epoch and forces an invalidate on mismatch.
>>
>> OK, so hopefully Valentin or Laura will be able to help by providing some more details. But, from your description, we would still require this FUSE_INVALIDATE_ALL operation to exist in order to increment the epoch. And this new operation could do that *and* also already invalidate those unused objects.
>
> I think you are still looking at this from the wrong direction.
>
> Invalidation is -not the operation- that is being requested. The CVMFS fuse server needs to update some global state in the kernel side fuse mount (i.e. the snapshot ID/epoch), and the need to evict cached inodes from previous IDs is a CVMFS implementation optimisation related to changing the global state.
>
>> > Only problem with this is that it seems very CVMFS specific, but I guess so is your proposal.
>> >
>> > Implementing the LRU purge is more generally useful, but I'm not sure if that helps CVMFS, since it would only get rid of unused objects.
>>
>> The LRU inodes purge can indeed work for me as well, because my patch is also only getting rid of unused objects, right? Any inode still being referenced will be kept around.
>>
>> So, based on your reply, let me try to summarize a possible alternative solution, that I think would be useful for CVMFS but also generic enough for other filesystems:
>>
>> - Add a new operation, FUSE_INVAL_LRU_INODES, which would get rid of, at most, 'N' unused inodes.
>>
>> - This operation would have an argument 'N' with the maximum number of inodes to invalidate.
>>
>> - In addition, it would also increment this new fuse_connection attribute 'epoch', to be used in the dentry revalidation as you suggested above.
>
> As per above: invalidation is an implementation optimisation for the CVMFS epoch update. Invalidation, OTOH, does not imply that any fuse mount/connection global state (e.g. the epoch) needs to change... i.e. the operation should be FUSE_UPDATE_EPOCH, not FUSE_INVAL_LRU_INODES...
>
>> - This 'N' could also be set to a pre-#define'd value that would mean *all* (unused) inodes.
>
> Saying "only invalidate N inodes" makes no sense to me - it is fundamentally impossible for userspace to get right. Either the epoch update should evict all unreferenced inodes immediately, or it should leave them all behind to be purged by memory pressure or other periodic garbage collection mechanisms.

So, below there's a patch that is totally untested (not even compile-tested). It's unlikely to be fully correct, but I just wanted to make sure I got the main idea right.

What I'm trying to do there is to initialize this new 'epoch' counter, both in the fuse connection and in every new dentry. Then, ->d_revalidate() simply invalidates a dentry if the epochs don't match. Finally, there's the new fuse notify operation to increment the epoch and shrink the dcache (I dropped the call to {evict,invalidate}_inodes() as Miklos suggested elsewhere).

Does this look reasonable? (I may be missing other places where epoch should be checked or initialized.)

Cheers,
On Wed, 19 Feb 2025 at 12:23, Luis Henriques <luis@igalia.com> wrote:

> +static int fuse_notify_update_epoch(struct fuse_conn *fc)
> +{
> +	struct fuse_mount *fm;
> +	struct inode *inode;
> +
> +	inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm);
> +	if (!inode || !fm)
> +		return -ENOENT;
> +
> +	iput(inode);
> +	atomic_inc(&fc->epoch);
> +	shrink_dcache_sb(fm->sb);

This is just an optimization and could be racy, kicking out valid cache (harmlessly of course). I'd leave it out of the first version.

There could be more than one fuse_mount instance. Wondering if epoch should be per-fm not per-fc...

> @@ -204,6 +204,12 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
>  	int ret;
>
>  	inode = d_inode_rcu(entry);
> +	if (inode) {
> +		fm = get_fuse_mount(inode);
> +		if (entry->d_time < atomic_read(&fm->fc->epoch))
> +			goto invalid;
> +	}

Negative dentries need to be invalidated too.

> @@ -446,6 +452,12 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
>  		goto out_err;
>
>  	entry = newent ? newent : entry;
> +	if (inode) {
> +		struct fuse_mount *fm = get_fuse_mount(inode);
> +		entry->d_time = atomic_read(&fm->fc->epoch);
> +	} else {
> +		entry->d_time = 0;
> +	}

Again, should do the same for positive and negative dentries.

Need to read out fc->epoch before sending the request to the server, otherwise we might get a stale dentry with an updated epoch.

This also needs to be done in fuse_create_open(), create_new_entry() and fuse_direntplus_link().

Thanks,
Miklos
On Wed, Feb 19 2025, Miklos Szeredi wrote:

> On Wed, 19 Feb 2025 at 12:23, Luis Henriques <luis@igalia.com> wrote:
>
>> +static int fuse_notify_update_epoch(struct fuse_conn *fc)
>> +{
>> +	struct fuse_mount *fm;
>> +	struct inode *inode;
>> +
>> +	inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm);
>> +	if (!inode || !fm)
>> +		return -ENOENT;
>> +
>> +	iput(inode);
>> +	atomic_inc(&fc->epoch);
>> +	shrink_dcache_sb(fm->sb);
>
> This is just an optimization and could be racy, kicking out valid cache (harmlessly of course). I'd leave it out of the first version.

OK, will do.

> There could be more than one fuse_mount instance. Wondering if epoch should be per-fm not per-fc...

Good question. Because the cache is shared among the several fuse_mount instances, the epoch may eventually affect all of them even if it's a per-fm attribute. But on the other hand, different mounts could focus on different sets of filesystem subtrees so... yeah, I'll probably leave it in fc for now while thinking about it some more.

>> @@ -204,6 +204,12 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
>>  	int ret;
>>
>>  	inode = d_inode_rcu(entry);
>> +	if (inode) {
>> +		fm = get_fuse_mount(inode);
>> +		if (entry->d_time < atomic_read(&fm->fc->epoch))
>> +			goto invalid;
>> +	}
>
> Negative dentries need to be invalidated too.

Ack.

>> @@ -446,6 +452,12 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
>>  		goto out_err;
>>
>>  	entry = newent ? newent : entry;
>> +	if (inode) {
>> +		struct fuse_mount *fm = get_fuse_mount(inode);
>> +		entry->d_time = atomic_read(&fm->fc->epoch);
>> +	} else {
>> +		entry->d_time = 0;
>> +	}
>
> Again, should do the same for positive and negative dentries.
>
> Need to read out fc->epoch before sending the request to the server, otherwise we might get a stale dentry with an updated epoch.

Ah, good point.

> This also needs to be done in fuse_create_open(), create_new_entry() and fuse_direntplus_link().

Yeah, I suspected there were a few other places where this would be required. I'll look closer into that.

Thanks a lot for your feedback, Miklos. I'll work on this new approach, so that I can send a real patch soon.

Cheers,
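[Editor's note: folding Miklos's review points into the earlier epoch sketch, the lookup path might end up shaped like this. fuse_send_lookup() is a hypothetical stand-in for the real request path; only the epoch handling is the point here:]

```c
/* Hypothetical sketch: sample fc->epoch *before* the LOOKUP request is
 * sent, so that a concurrent epoch bump leaves the new dentry stamped
 * with the old value -- it then fails the next revalidation instead of
 * being wrongly treated as fresh. */
static struct dentry *fuse_lookup_epoch_sketch(struct inode *dir,
					       struct dentry *entry)
{
	struct fuse_conn *fc = get_fuse_conn(dir);
	int epoch = atomic_read(&fc->epoch);	/* before the request */
	struct dentry *newent;

	newent = fuse_send_lookup(dir, entry);	/* hypothetical helper */
	if (IS_ERR(newent))
		return newent;

	/* Stamp positive and negative dentries alike, so an epoch bump
	 * also kills stale negative entries. */
	entry = newent ? newent : entry;
	entry->d_time = epoch;

	return newent;
}
```

[The same stamping would be needed in fuse_create_open(), create_new_entry() and fuse_direntplus_link(), as Miklos notes above.]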
```diff
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index e9db2cb8c150..64fa0806e97d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -547,6 +547,37 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid,
 	return NULL;
 }
 
+static int fuse_reverse_inval_all(struct fuse_conn *fc)
+{
+	struct fuse_mount *fm;
+	struct inode *inode;
+
+	inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm);
+	if (!inode || !fm)
+		return -ENOENT;
+	iput(inode);
+
+	/* Remove all possible active references to cached inodes */
+	shrink_dcache_sb(fm->sb);
+
+	/* Remove all unreferenced inodes from cache */
+	invalidate_inodes(fm->sb);
+
+	return 0;
+}
+
+/*
+ * Notify to invalidate inodes cache. It can be called with @nodeid set to
+ * either:
+ *
+ * - An inode number - Any pending writebacks within the range [@offset @len]
+ *   will be triggered and the inode will be invalidated. To invalidate the
+ *   whole cache, @offset has to be set to '0' and @len needs to be <= '0';
+ *   if @offset is negative, only the inode attributes are invalidated.
+ *
+ * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated
+ *   and the whole dcache is shrunk.
+ */
 int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid,
 			     loff_t offset, loff_t len)
 {
@@ -555,6 +586,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid,
 	pgoff_t pg_start;
 	pgoff_t pg_end;
 
+	if (nodeid == FUSE_INVAL_ALL_INODES)
+		return fuse_reverse_inval_all(fc);
+
 	inode = fuse_ilookup(fc, nodeid, NULL);
 	if (!inode)
 		return -ENOENT;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 5e0eb41d967e..e5852b63f99f 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -669,6 +669,9 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_CODE_MAX,
 };
 
+/* The nodeid to request to invalidate all inodes */
+#define FUSE_INVAL_ALL_INODES 0
+
 /* The read buffer is required to be at least 8k, but may be much larger */
 #define FUSE_MIN_READ_BUFFER 8192
```
Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately.

This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidating all the inodes, it also shrinks the sb dcache.

Signed-off-by: Luis Henriques <luis@igalia.com>
---
 fs/fuse/inode.c           | 34 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/fuse.h |  3 +++
 2 files changed, 37 insertions(+)
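[Editor's note: to show how a server might use the proposed nodeid, here is a hypothetical sketch built on libfuse's existing per-inode notification call. fuse_lowlevel_notify_inval_inode() is real libfuse 3 API, but FUSE_INVAL_ALL_INODES exists only in the patch above, so this only works against a kernel carrying that patch:]

```c
#define FUSE_USE_VERSION 31
#include <fuse_lowlevel.h>

/* From the patch above; not in released kernel headers. Nodeid 0 is
 * otherwise unused (FUSE_ROOT_ID is 1), which is why the patch can
 * repurpose it. */
#ifndef FUSE_INVAL_ALL_INODES
#define FUSE_INVAL_ALL_INODES 0
#endif

/* Ask the kernel to shrink the dcache and drop every unused inode of
 * this connection in a single notification.  Offset/length are not
 * meaningful for the "all inodes" case, so pass zeros. */
static int invalidate_all_inodes(struct fuse_session *se)
{
	return fuse_lowlevel_notify_inval_inode(se, FUSE_INVAL_ALL_INODES,
						0, 0);
}
```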