Message ID | 20250318194111.19419-1-James.Bottomley@HansenPartnership.com (mailing list archive) |
---|---|
Headers | show |
Series | create simple libfs directory iterator and make efivarfs use it | expand |
On Tue, Mar 18, 2025 at 03:41:08PM -0400, James Bottomley wrote: > [Note this is built on top of the previous patch to populate path.mnt] > > This turned out to be much simpler than I feared. The first patch > breaks out the core of the current dcache_readdir() into an internal > function with a callback (there should be no functional change). The > second adds a new API, simple_iterate_call(), which loops over the > dentries in the next level and executes a callback for each one and > the third which removes all the efivarfs superblock and mnt crud and > replaces it with this simple callback interface. I think the > diffstats of the third patch demonstrate how much nicer it is for us: I suspect that you are making it too generic for its own good. dcache_readdir() needs to cope with the situation when there are fuckloads of opened-and-unliked files in there. That's why we play those games with cursors under if (need_resched()) there. That's not the case for efivarfs. There you really want just "grab a reference to the next positive, drop the reference we were given" and that's it. IOW, find_next_child() instead of scan_positives(). Export that and it becomes just a simple loop - child = NULL; while ((child = find_next_child(parent, child)) != NULL) { struct inode *inode = d_inode(child); struct efivar_entry *entry = efivar_entry(inode); err = efivar_entry_size(entry, &size); inode_lock(inode); i_size_write(inode, err ? 0 : size + sizeof(__u32)); inode_unlock(inode); if (err) simple_recursive_removal(child, NULL); } and that's it. No callbacks, no cursors, no iterators - just an export of helper already there.
On Tue, Mar 18, 2025 at 11:45:05PM +0000, Al Viro wrote: > and it becomes just a simple loop - > child = NULL; > while ((child = find_next_child(parent, child)) != NULL) { that being root, obviously. And we might want a better function name than that...
On Tue, 2025-03-18 at 23:45 +0000, Al Viro wrote: [...] > I suspect that you are making it too generic for its own good. > > dcache_readdir() needs to cope with the situation when there are > fuckloads of opened-and-unliked files in there. That's why we > play those games with cursors under if (need_resched()) there. Yes, but those games will never really activate for the efivarfs code because we expect to be mostly positive. What I was aiming for was to make sure there's no duplication of the subtle code, so someone can't forget to update it in two places, so doing a callback approach seemed natural. > That's not the case for efivarfs. There you really want just > "grab a reference to the next positive, drop the reference we > were given" and that's it. > > IOW, find_next_child() instead of scan_positives(). Export that > and it becomes just a simple loop - > child = NULL; > while ((child = find_next_child(parent, child)) != NULL) { > struct inode *inode = d_inode(child); > struct efivar_entry *entry = efivar_entry(inode); > > err = efivar_entry_size(entry, &size); > > inode_lock(inode); > i_size_write(inode, err ? 0 : size + sizeof(__u32)); > inode_unlock(inode); > > if (err) > simple_recursive_removal(child, NULL); > } > and that's it. No callbacks, no cursors, no iterators - just an > export of helper already there. So we don't mind the callbacks; we have to do it for efivar_init() lower down anyway. However, I did take a cut at doing this based on simple positive (see below). I assume you won't like the way I have to allocate and toss a cursor for each iteration, nor the fact that there's still the cond_resched() in there? I think I can fix that but I'd have to slice apart simple_positive(), which is a bigger undertaking. > And we might want a better function name than that... I went for simple_next_child() ... Regards, James --- fs/libfs.c | 41 +++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 2 ++ 2 files changed, 43 insertions(+) diff --git a/fs/libfs.c b/fs/libfs.c index 8444f5cc4064..86f29cc2b85a 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -146,6 +146,47 @@ static struct dentry *scan_positives(struct dentry *cursor, return found; } +/** + * simple_next_child - get next child of the parent directory + * + * @parent: the directory to scan + * @child: last child (or NULL to start at the beginning) + * + * returns the next positive child in sequence with the reference + * elevated but if a child is passed in will drop that + * reference. Returns NULL on either error or when directory has been + * fully scanned. + * + * The intended use is as an iterator because all the references will + * be dropped by the end of the while loop: + * + * child = NULL + * while(child = simple_next_child(parent, child)) { + * // do something + * } + */ +struct dentry *simple_next_child(struct dentry *parent, struct dentry *child) +{ + struct hlist_node **p; + struct dentry *cursor = d_alloc_cursor(parent); + + if (!cursor) { + dput(child); + return NULL; + } + + if (child) + p = &child->d_sib.next; + else + p = &parent->d_children.first; + + child = scan_positives(cursor, p, 1, child); + dput(cursor); + + return child; +} +EXPORT_SYMBOL(simple_next_child); + loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence) { struct dentry *dentry = file->f_path.dentry; diff --git a/include/linux/fs.h b/include/linux/fs.h index 2788df98080f..dd84d1c3b8af 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3531,6 +3531,8 @@ extern int simple_rename(struct mnt_idmap *, struct inode *, unsigned int); extern void simple_recursive_removal(struct dentry *, void (*callback)(struct dentry *)); +extern struct dentry *simple_next_child(struct dentry *parent, + struct dentry *child); extern int noop_fsync(struct file *, loff_t, loff_t, int); extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter); extern int simple_empty(struct dentry *);
On Wed, 2025-03-19 at 12:46 -0400, James Bottomley wrote: > I assume you won't like the way I have to allocate and toss a cursor > for each iteration, nor the fact that there's still the > cond_resched() in there? I think I can fix that but I'd have to > slice apart simple_positive(), which is a bigger undertaking. So this is what I think that would look like: it still exports the same simple_next_child() function but doesn't need to allocate cursors or do any of the cond_resched() stuff. The down side is that the slice is a lot less easily verified than the other two patches, so this one's going to need a lot of code inspection to make sure I got it right. Regards, James --- fs/libfs.c | 85 +++++++++++++++++++++++++++++++++++++++------- include/linux/fs.h | 2 ++ 2 files changed, 74 insertions(+), 13 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index 8444f5cc4064..3a32910f44dc 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -101,6 +101,34 @@ int dcache_dir_close(struct inode *inode, struct file *file) } EXPORT_SYMBOL(dcache_dir_close); +/* + * helper for positive scanning. Requires a nested while loop which should + * be broken if this returns false + */ +static bool internal_scan_positive(struct hlist_node **p, struct dentry *d, + struct dentry **found, loff_t *count) +{ + while (*p) { + p = &d->d_sib.next; + + // we must at least skip cursors, to avoid livelocks + if (d->d_flags & DCACHE_DENTRY_CURSOR) + continue; + + if (simple_positive(d) && !--*count) { + spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED); + if (simple_positive(d)) + *found = dget_dlock(d); + spin_unlock(&d->d_lock); + if (likely(*found)) + return true; + *count = 1; + } + return false; + } + return true; +} + /* parent is locked at least shared */ /* * Returns an element of siblings' list. @@ -118,19 +146,9 @@ static struct dentry *scan_positives(struct dentry *cursor, spin_lock(&dentry->d_lock); while (*p) { struct dentry *d = hlist_entry(*p, struct dentry, d_sib); - p = &d->d_sib.next; - // we must at least skip cursors, to avoid livelocks - if (d->d_flags & DCACHE_DENTRY_CURSOR) - continue; - if (simple_positive(d) && !--count) { - spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED); - if (simple_positive(d)) - found = dget_dlock(d); - spin_unlock(&d->d_lock); - if (likely(found)) - break; - count = 1; - } + + if (internal_scan_positive(p, d, &found, &count)) + break; if (need_resched()) { if (!hlist_unhashed(&cursor->d_sib)) __hlist_del(&cursor->d_sib); @@ -146,6 +164,47 @@ static struct dentry *scan_positives(struct dentry *cursor, return found; } +/** + * simple_next_child - get next child of the parent directory + * + * @parent: the directory to scan + * @child: last child (or NULL to start at the beginning) + * + * returns the next positive child in sequence with the reference + * elevated but if a child is passed in will drop that + * reference. Returns NULL on either error or when directory has been + * fully scanned. + * + * The intended use is as an iterator because all the references will + * be dropped by the end of the while loop: + * + * child = NULL + * while(child = simple_next_child(parent, child)) { + * // do something + * } + */ +struct dentry *simple_next_child(struct dentry *parent, struct dentry *child) +{ + struct hlist_node **p; + loff_t count = 1; + struct dentry *found = NULL; + + if (child) + p = &child->d_sib.next; + else + p = &parent->d_children.first; + + while (*p) { + struct dentry *d = hlist_entry(*p, struct dentry, d_sib); + + if (internal_scan_positive(p, d, &found, &count)) + break; + } + dput(child); + return found; +} +EXPORT_SYMBOL(simple_next_child); + loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence) { struct dentry *dentry = file->f_path.dentry; diff --git a/include/linux/fs.h b/include/linux/fs.h index 2788df98080f..dd84d1c3b8af 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3531,6 +3531,8 @@ extern int simple_rename(struct mnt_idmap *, struct inode *, unsigned int); extern void simple_recursive_removal(struct dentry *, void (*callback)(struct dentry *)); +extern struct dentry *simple_next_child(struct dentry *parent, + struct dentry *child); extern int noop_fsync(struct file *, loff_t, loff_t, int); extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter); extern int simple_empty(struct dentry *);