diff mbox series

[RFC,v2,6/7] fs: introduce a usage count into the superblock

Message ID 20210414134737.2366971-7-yi.zhang@huawei.com (mailing list archive)
State New, archived
Headers show
Series ext4, jbd2: fix 3 issues about bdev_try_to_free_page() | expand

Commit Message

Zhang Yi April 14, 2021, 1:47 p.m. UTC
Commit <87d8fe1ee6b8> ("add releasepage hooks to block devices which can
be used by file systems") introduce a hook that used by ext4 filesystem
to release journal buffers, but it doesn't add corresponding concurrency
protection that ->bdev_try_to_free_page() could be raced by umount
filesystem concurrently. This patch add a usage count on superblock that
filesystem can use it to prevent above race and make invoke
->bdev_try_to_free_page() safe.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 include/linux/fs.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Comments

Christoph Hellwig April 15, 2021, 2:40 p.m. UTC | #1
On Wed, Apr 14, 2021 at 09:47:36PM +0800, Zhang Yi wrote:
> Commit <87d8fe1ee6b8> ("add releasepage hooks to block devices which can
> be used by file systems") introduce a hook that used by ext4 filesystem
> to release journal buffers, but it doesn't add corresponding concurrency
> protection that ->bdev_try_to_free_page() could be raced by umount
> filesystem concurrently. This patch add a usage count on superblock that
> filesystem can use it to prevent above race and make invoke
> ->bdev_try_to_free_page() safe.

We already have two refcounts in the superblock: s_active which counts
the active refernce, and s_count which prevents the data structures
from beeing freed.  I don't think we need yet another one.
Zhang Yi April 16, 2021, 8 a.m. UTC | #2
Hi, Christoph

On 2021/4/15 22:40, Christoph Hellwig wrote:
> On Wed, Apr 14, 2021 at 09:47:36PM +0800, Zhang Yi wrote:
>> Commit <87d8fe1ee6b8> ("add releasepage hooks to block devices which can
>> be used by file systems") introduce a hook that used by ext4 filesystem
>> to release journal buffers, but it doesn't add corresponding concurrency
>> protection that ->bdev_try_to_free_page() could be raced by umount
>> filesystem concurrently. This patch add a usage count on superblock that
>> filesystem can use it to prevent above race and make invoke
>> ->bdev_try_to_free_page() safe.
> 
> We already have two refcounts in the superblock: s_active which counts
> the active refernce, and s_count which prevents the data structures
> from beeing freed.  I don't think we need yet another one.
> .
> 

Thanks you for your response. I checked the s_count and s_active refcounts,
but it seems that we could not use these two refcounts directly.

For the s_active. If we get s_active refcount in blkdev_releasepage(), we may
put the final refcount when doing umount concurrently and have to do resource
recycling, but we got page locked here and lead to deadlock. Maybe we could do
async resource recycling through kworker, but I think it's not a good way.

For the s_count, it is operated under the global sb_lock now, so get this
refcount will serialize page release and affect performance. Besides, It's
semantics are different from the 'usage count' by private fileststem, and we
have to cooperate with sb->s_umount mutex lock to close the above race.

So I introduce another refcount. Am I missing something or any suggestions?

Thanks,
Yi.
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec8f3ddf4a6a..3c6a5c08c2df 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -41,6 +41,7 @@ 
 #include <linux/stddef.h>
 #include <linux/mount.h>
 #include <linux/cred.h>
+#include <linux/percpu-refcount.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1547,6 +1548,13 @@  struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	/*
+	 * Count users who are using the super_block, used to protect
+	 * umount filesystem concurrently with others.
+	 */
+	struct percpu_ref	s_usage_counter;
+	wait_queue_head_t	s_usage_waitq;
 } __randomize_layout;
 
 /* Helper functions so that in most cases filesystems will
@@ -1765,6 +1773,27 @@  static inline bool sb_start_intwrite_trylock(struct super_block *sb)
 bool inode_owner_or_capable(struct user_namespace *mnt_userns,
 			    const struct inode *inode);
 
+static inline void sb_usage_counter_release(struct percpu_ref *ref)
+{
+	struct super_block *sb;
+
+	sb = container_of(ref, struct super_block, s_usage_counter);
+	wake_up(&sb->s_usage_waitq);
+}
+
+static inline int sb_usage_counter_init(struct super_block *sb)
+{
+	init_waitqueue_head(&sb->s_usage_waitq);
+	return percpu_ref_init(&sb->s_usage_counter, sb_usage_counter_release,
+			       0, GFP_KERNEL);
+}
+
+static inline void sb_usage_counter_wait(struct super_block *sb)
+{
+	percpu_ref_kill(&sb->s_usage_counter);
+	wait_event(sb->s_usage_waitq, percpu_ref_is_zero(&sb->s_usage_counter));
+}
+
 /*
  * VFS helper functions..
  */