mbox series

[RFC,0/3] create simple libfs directory iterator and make efivarfs use it

Message ID 20250318194111.19419-1-James.Bottomley@HansenPartnership.com (mailing list archive)
Headers show
Series create simple libfs directory iterator and make efivarfs use it | expand

Message

James Bottomley March 18, 2025, 7:41 p.m. UTC
[Note this is built on top of the previous patch to populate path.mnt]

This turned out to be much simpler than I feared.  The first patch
breaks out the core of the current dcache_readdir() into an internal
function with a callback (there should be no functional change).  The
second adds a new API, simple_iterate_call(), which loops over the
dentries in the next level and executes a callback for each one and
the third which removes all the efivarfs superblock and mnt crud and
replaces it with this simple callback interface.  I think the
diffstats of the third patch demonstrate how much nicer it is for us:

 1 file changed, 7 insertions(+), 96 deletions(-)

Regards,

James

---

James Bottomley (3):
  libfs: rework dcache_readdir to use an internal function with callback
  libfs: add simple directory iteration function with callback
  efivarfs: replace iterate_dir with libfs function simple_iterate_call

 fs/efivarfs/super.c | 103 +++-----------------------------------------
 fs/libfs.c          |  74 +++++++++++++++++++++++++------
 include/linux/fs.h  |   2 +
 3 files changed, 71 insertions(+), 108 deletions(-)

Comments

Al Viro March 18, 2025, 11:45 p.m. UTC | #1
On Tue, Mar 18, 2025 at 03:41:08PM -0400, James Bottomley wrote:
> [Note this is built on top of the previous patch to populate path.mnt]
> 
> This turned out to be much simpler than I feared.  The first patch
> breaks out the core of the current dcache_readdir() into an internal
> function with a callback (there should be no functional change).  The
> second adds a new API, simple_iterate_call(), which loops over the
> dentries in the next level and executes a callback for each one and
> the third which removes all the efivarfs superblock and mnt crud and
> replaces it with this simple callback interface.  I think the
> diffstats of the third patch demonstrate how much nicer it is for us:

I suspect that you are making it too generic for its own good.

dcache_readdir() needs to cope with the situation when there are
fuckloads of opened-and-unliked files in there.  That's why we
play those games with cursors under if (need_resched()) there.
That's not the case for efivarfs.  There you really want just
"grab a reference to the next positive, drop the reference we
were given" and that's it.

IOW, find_next_child() instead of scan_positives().  Export that
and it becomes just a simple loop -
	child = NULL;
	while ((child = find_next_child(parent, child)) != NULL) {
		struct inode *inode = d_inode(child);
		struct efivar_entry *entry = efivar_entry(inode);

		err = efivar_entry_size(entry, &size);

		inode_lock(inode);
		i_size_write(inode, err ? 0 : size + sizeof(__u32));
		inode_unlock(inode);

		if (err)
			simple_recursive_removal(child, NULL);
	}
and that's it.  No callbacks, no cursors, no iterators - just an
export of helper already there.
Al Viro March 18, 2025, 11:49 p.m. UTC | #2
On Tue, Mar 18, 2025 at 11:45:05PM +0000, Al Viro wrote:
> and it becomes just a simple loop -
> 	child = NULL;
> 	while ((child = find_next_child(parent, child)) != NULL) {
					that being root, obviously.

And we might want a better function name than that...
James Bottomley March 19, 2025, 4:46 p.m. UTC | #3
On Tue, 2025-03-18 at 23:45 +0000, Al Viro wrote:
[...]
> I suspect that you are making it too generic for its own good.
> 
> dcache_readdir() needs to cope with the situation when there are
> fuckloads of opened-and-unliked files in there.  That's why we
> play those games with cursors under if (need_resched()) there.

Yes, but those games will never really activate for the efivarfs code
because we expect to be mostly positive.  What I was aiming for was to
make sure there's no duplication of the subtle code,  so someone can't
forget to update it in two places, so doing a callback approach seemed
natural.

> That's not the case for efivarfs.  There you really want just
> "grab a reference to the next positive, drop the reference we
> were given" and that's it.
> 
> IOW, find_next_child() instead of scan_positives().  Export that
> and it becomes just a simple loop -
> 	child = NULL;
> 	while ((child = find_next_child(parent, child)) != NULL) {
> 		struct inode *inode = d_inode(child);
> 		struct efivar_entry *entry = efivar_entry(inode);
> 
> 		err = efivar_entry_size(entry, &size);
> 
> 		inode_lock(inode);
> 		i_size_write(inode, err ? 0 : size + sizeof(__u32));
> 		inode_unlock(inode);
> 
> 		if (err)
> 			simple_recursive_removal(child, NULL);
> 	}
> and that's it.  No callbacks, no cursors, no iterators - just an
> export of helper already there.

So we don't mind the callbacks; we have to do it for efivar_init()
lower down anyway.  However, I did take a cut at doing this based on
simple positive (see below).  I assume you won't like the way I have to
allocate and toss a cursor for each iteration, nor the fact that
there's still the cond_resched() in there?  I think I can fix that but
I'd have to slice apart simple_positive(), which is a bigger
undertaking.

> And we might want a better function name than that...

I went for simple_next_child() ...

Regards,

James

---

 fs/libfs.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 8444f5cc4064..86f29cc2b85a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -146,6 +146,47 @@ static struct dentry *scan_positives(struct dentry *cursor,
 	return found;
 }
 
+/**
+ * simple_next_child - get next child of the parent directory
+ *
+ * @parent: the directory to scan
+ * @child: last child (or NULL to start at the beginning)
+ *
+ * returns the next positive child in sequence with the reference
+ * elevated but if a child is passed in will drop that
+ * reference. Returns NULL on either error or when directory has been
+ * fully scanned.
+ *
+ * The intended use is as an iterator because all the references will
+ * be dropped by the end of the while loop:
+ *
+ *     child = NULL
+ *     while(child = simple_next_child(parent, child)) {
+ *          // do something
+ *     }
+ */
+struct dentry *simple_next_child(struct dentry *parent, struct dentry *child)
+{
+	struct hlist_node **p;
+	struct dentry *cursor = d_alloc_cursor(parent);
+
+	if (!cursor) {
+		dput(child);
+		return NULL;
+	}
+
+	if (child)
+		p = &child->d_sib.next;
+	else
+		p = &parent->d_children.first;
+
+	child = scan_positives(cursor, p, 1, child);
+	dput(cursor);
+
+	return child;
+}
+EXPORT_SYMBOL(simple_next_child);
+
 loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
 {
 	struct dentry *dentry = file->f_path.dentry;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2788df98080f..dd84d1c3b8af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3531,6 +3531,8 @@ extern int simple_rename(struct mnt_idmap *, struct inode *,
 			 unsigned int);
 extern void simple_recursive_removal(struct dentry *,
                               void (*callback)(struct dentry *));
+extern struct dentry *simple_next_child(struct dentry *parent,
+					struct dentry *child);
 extern int noop_fsync(struct file *, loff_t, loff_t, int);
 extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 extern int simple_empty(struct dentry *);
James Bottomley March 19, 2025, 6:45 p.m. UTC | #4
On Wed, 2025-03-19 at 12:46 -0400, James Bottomley wrote:
> I assume you won't like the way I have to allocate and toss a cursor
> for each iteration, nor the fact that there's still the
> cond_resched() in there?  I think I can fix that but I'd have to
> slice apart simple_positive(), which is a bigger undertaking.

So this is what I think that would look like: it still exports the same
simple_next_child() function but doesn't need to allocate cursors or do
any of the cond_resched() stuff.  The down side is that the slice is a
lot less easily verified than the other two patches, so this one's
going to need a lot of code inspection to make sure I got it right.

Regards,

James

---

 fs/libfs.c         | 85 +++++++++++++++++++++++++++++++++++++++-------
 include/linux/fs.h |  2 ++
 2 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 8444f5cc4064..3a32910f44dc 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -101,6 +101,34 @@ int dcache_dir_close(struct inode *inode, struct
file *file)
 }
 EXPORT_SYMBOL(dcache_dir_close);
 
+/*
+ * helper for positive scanning.  Requires a nested while loop which should
+ * be broken if this returns false
+ */
+static bool internal_scan_positive(struct hlist_node **p, struct dentry *d,
+				   struct dentry **found, loff_t *count)
+{
+	while (*p) {
+		p = &d->d_sib.next;
+
+		// we must at least skip cursors, to avoid livelocks
+		if (d->d_flags & DCACHE_DENTRY_CURSOR)
+			continue;
+
+		if (simple_positive(d) && !--*count) {
+			spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
+			if (simple_positive(d))
+				*found = dget_dlock(d);
+			spin_unlock(&d->d_lock);
+			if (likely(*found))
+				return true;
+			*count = 1;
+		}
+		return false;
+	}
+	return true;
+}
+
 /* parent is locked at least shared */
 /*
  * Returns an element of siblings' list.
@@ -118,19 +146,9 @@ static struct dentry *scan_positives(struct dentry *cursor,
 	spin_lock(&dentry->d_lock);
 	while (*p) {
 		struct dentry *d = hlist_entry(*p, struct dentry, d_sib);
-		p = &d->d_sib.next;
-		// we must at least skip cursors, to avoid livelocks
-		if (d->d_flags & DCACHE_DENTRY_CURSOR)
-			continue;
-		if (simple_positive(d) && !--count) {
-			spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
-			if (simple_positive(d))
-				found = dget_dlock(d);
-			spin_unlock(&d->d_lock);
-			if (likely(found))
-				break;
-			count = 1;
-		}
+
+		if (internal_scan_positive(p, d, &found, &count))
+			break;
 		if (need_resched()) {
 			if (!hlist_unhashed(&cursor->d_sib))
 				__hlist_del(&cursor->d_sib);
@@ -146,6 +164,47 @@ static struct dentry *scan_positives(struct dentry *cursor,
 	return found;
 }
 
+/**
+ * simple_next_child - get next child of the parent directory
+ *
+ * @parent: the directory to scan
+ * @child: last child (or NULL to start at the beginning)
+ *
+ * returns the next positive child in sequence with the reference
+ * elevated but if a child is passed in will drop that
+ * reference. Returns NULL on either error or when directory has been
+ * fully scanned.
+ *
+ * The intended use is as an iterator because all the references will
+ * be dropped by the end of the while loop:
+ *
+ *     child = NULL
+ *     while(child = simple_next_child(parent, child)) {
+ *          // do something
+ *     }
+ */
+struct dentry *simple_next_child(struct dentry *parent, struct dentry *child)
+{
+	struct hlist_node **p;
+	loff_t count = 1;
+	struct dentry *found = NULL;
+
+	if (child)
+		p = &child->d_sib.next;
+	else
+		p = &parent->d_children.first;
+
+	while (*p) {
+		struct dentry *d = hlist_entry(*p, struct dentry, d_sib);
+
+		if (internal_scan_positive(p, d, &found, &count))
+			break;
+	}
+	dput(child);
+	return found;
+}
+EXPORT_SYMBOL(simple_next_child);
+
 loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
 {
 	struct dentry *dentry = file->f_path.dentry;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2788df98080f..dd84d1c3b8af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3531,6 +3531,8 @@ extern int simple_rename(struct mnt_idmap *, struct inode *,
 			 unsigned int);
 extern void simple_recursive_removal(struct dentry *,
                               void (*callback)(struct dentry *));
+extern struct dentry *simple_next_child(struct dentry *parent,
+					struct dentry *child);
 extern int noop_fsync(struct file *, loff_t, loff_t, int);
 extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 extern int simple_empty(struct dentry *);