From patchwork Wed Oct 2 01:33:18 2024
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13819220
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org
Subject: [PATCH 1/7] vfs: replace invalidate_inodes() with evict_inodes()
Date: Wed, 2 Oct 2024 11:33:18 +1000
Message-ID: <20241002014017.3801899-2-david@fromorbit.com>
In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com>
References: <20241002014017.3801899-1-david@fromorbit.com>

From: Dave Chinner

As of commit e127b9bccdb0 ("fs: simplify invalidate_inodes"),
invalidate_inodes() is functionally identical to evict_inodes().
Replace calls to invalidate_inodes() with a call to evict_inodes()
and kill the former.
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
---
 fs/inode.c    | 40 ----------------------------------------
 fs/internal.h |  1 -
 fs/super.c    |  2 +-
 3 files changed, 1 insertion(+), 42 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 471ae4a31549..0a53d8c34203 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -827,46 +827,6 @@ void evict_inodes(struct super_block *sb)
 }
 EXPORT_SYMBOL_GPL(evict_inodes);
 
-/**
- * invalidate_inodes - attempt to free all inodes on a superblock
- * @sb: superblock to operate on
- *
- * Attempts to free all inodes (including dirty inodes) for a given superblock.
- */
-void invalidate_inodes(struct super_block *sb)
-{
-	struct inode *inode, *next;
-	LIST_HEAD(dispose);
-
-again:
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
-		spin_lock(&inode->i_lock);
-		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		if (atomic_read(&inode->i_count)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		inode->i_state |= I_FREEING;
-		inode_lru_list_del(inode);
-		spin_unlock(&inode->i_lock);
-		list_add(&inode->i_lru, &dispose);
-		if (need_resched()) {
-			spin_unlock(&sb->s_inode_list_lock);
-			cond_resched();
-			dispose_list(&dispose);
-			goto again;
-		}
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-
-	dispose_list(&dispose);
-}
-
 /*
  * Isolate the inode from the LRU in preparation for freeing it.
  *
diff --git a/fs/internal.h b/fs/internal.h
index 8c1b7acbbe8f..37749b429e80 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -207,7 +207,6 @@ bool in_group_or_capable(struct mnt_idmap *idmap,
  * fs-writeback.c
  */
 extern long get_nr_dirty_inodes(void);
-void invalidate_inodes(struct super_block *sb);
 
 /*
  * dcache.c
diff --git a/fs/super.c b/fs/super.c
index 1db230432960..a16e6a6342e0 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1417,7 +1417,7 @@ static void fs_bdev_mark_dead(struct block_device *bdev, bool surprise)
 	if (!surprise)
 		sync_filesystem(sb);
 	shrink_dcache_sb(sb);
-	invalidate_inodes(sb);
+	evict_inodes(sb);
 	if (sb->s_op->shutdown)
 		sb->s_op->shutdown(sb);
 

From patchwork Wed Oct 2 01:33:19 2024
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13819223
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org
Subject: [PATCH 2/7] vfs: add inode iteration superblock method
Date: Wed, 2 Oct 2024 11:33:19 +1000
Message-ID: <20241002014017.3801899-3-david@fromorbit.com>
In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com>
References: <20241002014017.3801899-1-david@fromorbit.com>

From: Dave Chinner

Add a new superblock method for iterating all cached inodes in the
inode cache.
This will be used to replace the explicit sb->s_inodes iteration: the
caller supplies a callback function and a private data pointer, and
the private data pointer is passed to the callback along with each
inode that is iterated.

There are two iteration functions provided. The first is the
interface that everyone should be using: it provides a valid,
unlocked and referenced inode on which any inode operation (including
blocking operations) is allowed. The iterator infrastructure is
responsible for lifecycle management, hence the subsystem callback
only needs to implement the operation it wants to perform on all
inodes.

The second iterator interface is the unsafe variant for internal VFS
use only. It simply iterates all VFS inodes without guaranteeing any
state or taking references. This iteration is done under an RCU read
lock (and the superblock inode list lock) to ensure that the VFS inode
is not freed from under the callback, so the callback must not block.
If the operation needs to block, the callback must first guarantee
that the inode will not get freed and then defer the blocking work
until the iteration has been exited, returning INO_ITER_ABORT if it
needs to get back to the caller early. This unsafe iteration
mechanism is needed for operations that need tight control over the
state of the inodes they operate on.

This mechanism allows the existing sb->s_inodes iteration models to
be maintained, while providing a generic implementation for iterating
all cached inodes on the superblock.
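A minimal userspace model of the referenced iteration contract may help when reviewing the callback conversions. It is a single-threaded sketch under stated assumptions, not kernel code: model_inode, model_sb, lock_take()/lock_drop(), model_iter_inodes() and visit_fn() are hypothetical names, the "lock" is a flag that asserts on double lock or double unlock, and count stands in for i_count. It shows the lifecycle that super_iter_inodes() centralises: pin the inode, drop the list lock so the callback may block, and release the previous pin only after the lock has been dropped. It also makes one subtlety explicit: after the callback returns, the list lock is not held, so the INO_ITER_ABORT and negative-error exits must bypass the final unlock. In super_iter_inodes() as posted, those break paths fall through to the trailing spin_unlock(&sb->s_inode_list_lock) with the lock already dropped; the model routes them to a separate out_unlocked label instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins: names mirror the patch, not kernel API. */
#define I_NEW               (1u << 0)
#define I_FREEING           (1u << 1)
#define INO_ITER_DONE       0
#define INO_ITER_ABORT      1
#define INO_ITER_REFERENCED (1u << 0)

/* Single-threaded "spinlock" that asserts on double lock or double unlock. */
struct model_lock { bool held; };

static void lock_take(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void lock_drop(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct model_inode {
	unsigned int state;         /* I_NEW / I_FREEING bits, like i_state */
	int count;                  /* reference count, like i_count */
	struct model_inode *next;   /* link on the superblock inode list */
};

struct model_sb {
	struct model_lock list_lock;  /* like s_inode_list_lock */
	struct model_inode *inodes;   /* like s_inodes */
};

typedef int (*ino_iter_fn)(struct model_inode *inode, void *priv);

static void model_iput(struct model_inode *inode)
{
	if (inode)
		inode->count--;
}

static int model_iter_inodes(struct model_sb *sb, ino_iter_fn fn,
			     void *priv, unsigned int flags)
{
	struct model_inode *inode, *old = NULL;
	int ret = 0;

	lock_take(&sb->list_lock);
	for (inode = sb->inodes; inode; inode = inode->next) {
		if (inode->state & (I_NEW | I_FREEING))  /* being set up or torn down */
			continue;
		if ((flags & INO_ITER_REFERENCED) && inode->count == 0)
			continue;

		inode->count++;             /* pin it, like __iget() */
		lock_drop(&sb->list_lock);  /* the callback may now block */
		model_iput(old);            /* release the previous pin unlocked */

		ret = fn(inode, priv);
		old = inode;
		if (ret == INO_ITER_ABORT) {
			ret = 0;
			goto out_unlocked;
		}
		if (ret < 0)
			goto out_unlocked;

		lock_take(&sb->list_lock);  /* our pin keeps ->next valid */
	}
	lock_drop(&sb->list_lock);
out_unlocked:
	model_iput(old);
	return ret;
}

/* Example callback: count visits, optionally abort or fail at the Nth inode. */
struct visit_ctx {
	int seen;
	int stop_at;   /* 0 means never abort */
	int fail_at;   /* 0 means never fail */
};

static int visit_fn(struct model_inode *inode, void *priv)
{
	struct visit_ctx *ctx = priv;

	assert(inode->count > 0);   /* the iterator holds a reference */
	ctx->seen++;
	if (ctx->fail_at && ctx->seen == ctx->fail_at)
		return -5;              /* an EIO-style negative error */
	if (ctx->stop_at && ctx->seen == ctx->stop_at)
		return INO_ITER_ABORT;
	return INO_ITER_DONE;
}
```

As in super_iter_inodes(), a negative return from the callback stops the walk and is handed back to the caller, while INO_ITER_ABORT stops the walk and returns 0.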
Signed-off-by: Dave Chinner
---
 fs/internal.h      |   2 +
 fs/super.c         | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  12 ++++++
 3 files changed, 119 insertions(+)

diff --git a/fs/internal.h b/fs/internal.h
index 37749b429e80..7039d13980c6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -127,6 +127,8 @@ struct super_block *user_get_super(dev_t, bool excl);
 void put_super(struct super_block *sb);
 extern bool mount_capable(struct fs_context *);
 int sb_init_dio_done_wq(struct super_block *sb);
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data);
 
 /*
  * Prepare superblock for changing its read-only state (i.e., either remount
diff --git a/fs/super.c b/fs/super.c
index a16e6a6342e0..20a9446d943a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -167,6 +167,111 @@ static void super_wake(struct super_block *sb, unsigned int flag)
 	wake_up_var(&sb->s_flags);
 }
 
+/**
+ * super_iter_inodes - iterate all the cached inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This function iterates all cached inodes on a superblock that are not in
+ * the process of being initialised or torn down. It will run @iter_fn() with
+ * a valid, referenced inode, so it is safe for the caller to do anything
+ * it wants with the inode except drop the reference the iterator holds.
+ *
+ */
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags)
+{
+	struct inode *inode, *old_inode = NULL;
+	int ret = 0;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		/*
+		 * Skip over zero refcount inode if the caller only wants
+		 * referenced inodes to be iterated.
+		 */
+		if ((flags & INO_ITER_REFERENCED) &&
+		    !atomic_read(&inode->i_count)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+		iput(old_inode);
+
+		ret = iter_fn(inode, private_data);
+
+		old_inode = inode;
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+
+		cond_resched();
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(old_inode);
+	return ret;
+}
+
+/**
+ * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This is almost certainly not the function you want. It is for internal VFS
+ * operations only. Please use super_iter_inodes() instead. If you must use
+ * this function, please add a comment explaining why it is necessary and the
+ * locking that makes it safe to use this function.
+ *
+ * This function iterates all cached inodes on a superblock that are attached to
+ * the superblock. It will pass each inode to @iter_fn unlocked and without
+ * having performed any existences checks on it.
+
+ * @iter_fn must perform all necessary state checks on the inode itself to
+ * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
+ * inode exists and won't be freed whilst the callback is running.
+ *
+ * @iter_fn must not block. It is run in an atomic context that is not allowed
+ * to sleep to provide the inode existence guarantees. If the callback needs to
+ * do blocking operations it needs to track the inode itself and defer those
+ * operations until after the iteration completes.
+ *
+ * @iter_fn must provide conditional reschedule checks itself. If rescheduling
+ * or deferred processing is needed, it must return INO_ITER_ABORT to return to
+ * the high level function to perform those operations. It can then restart the
+ * iteration again. The high level code must provide forwards progress
+ * guarantees if they are necessary.
+ *
+ */
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data)
+{
+	struct inode *inode;
+	int ret;
+
+	rcu_read_lock();
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		ret = iter_fn(inode, private_data);
+		if (ret == INO_ITER_ABORT)
+			break;
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	rcu_read_unlock();
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eae5b67e4a15..0a6a462c45ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2213,6 +2213,18 @@ enum freeze_holder {
 	FREEZE_MAY_NEST		= (1U << 2),
 };
 
+/* Inode iteration callback return values */
+#define INO_ITER_DONE		0
+#define INO_ITER_ABORT		1
+
+/* Inode iteration control flags */
+#define INO_ITER_REFERENCED	(1U << 0)
+#define INO_ITER_UNSAFE		(1U << 1)
+
+typedef int (*ino_iter_fn)(struct inode *inode, void *priv);
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags);
+
 struct super_operations {
 	struct inode *(*alloc_inode)(struct super_block *sb);
 	void (*destroy_inode)(struct inode *);

From patchwork Wed Oct 2 01:33:20 2024
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13819226
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org
Subject: [PATCH 3/7] vfs: convert vfs inode iterators to super_iter_inodes_unsafe()
Date: Wed, 2 Oct 2024 11:33:20 +1000
Message-ID: <20241002014017.3801899-4-david@fromorbit.com>
In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com>
References: <20241002014017.3801899-1-david@fromorbit.com>

From: Dave Chinner

Convert VFS internal superblock inode iterators that cannot use
referenced inodes to the new super_iter_inodes_unsafe() iterator.
Inode eviction needs this because it is trying to evict unreferenced
inodes and so must not take references to them, and dquot reference
removal needs it because it also has to visit I_NEW inodes, which
super_iter_inodes() skips.

The special nr_blockdev_pages() statistics code needs it as well, as
this is called from si_meminfo() and so can potentially be run from
locations where arbitrary blocking is not allowed or desirable.

New cases using this iterator need careful consideration.
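The two conversion patterns in this patch, accumulating a result through the private data pointer (nr_blockdev_pages()) and the queue, abort, dispose, rescan loop (evict_inodes()), can be sketched as a single-threaded userspace model. All names are hypothetical, need_resched() is replaced by a fixed batch size (batch_max), and disposed inodes are only marked rather than freed and unlinked, so later passes skip them via I_FREEING. As with super_iter_inodes_unsafe(), the walk takes no references and the callbacks perform every state check themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names echo the patch but are not kernel API. */
#define INO_ITER_DONE  0
#define INO_ITER_ABORT 1
#define I_FREEING      (1u << 1)

struct model_inode {
	int count;                    /* stands in for i_count */
	unsigned int state;           /* stands in for i_state */
	long nrpages;                 /* stands in for i_mapping->nrpages */
	bool evicted;
	struct model_inode *next;     /* superblock inode list */
	struct model_inode *lru_next; /* private dispose list link */
};

typedef int (*ino_iter_fn)(struct model_inode *inode, void *priv);

/* Unsafe walk: no references, no state filtering, callback must not block. */
static void model_iter_unsafe(struct model_inode *head, ino_iter_fn fn,
			      void *priv)
{
	for (struct model_inode *inode = head; inode; inode = inode->next)
		if (fn(inode, priv) == INO_ITER_ABORT)
			break;
}

/* Like bdev_pages_count(): accumulate into caller-supplied private data. */
static int pages_count_fn(struct model_inode *inode, void *priv)
{
	*(long *)priv += inode->nrpages;
	return INO_ITER_DONE;
}

struct dispose_ctx {
	struct model_inode *list;
	int queued;     /* inodes queued in the current pass */
	int batch_max;  /* hypothetical stand-in for need_resched() */
};

/* Like evict_inode_fn(): do every state check itself, then queue the inode. */
static int evict_fn(struct model_inode *inode, void *priv)
{
	struct dispose_ctx *ctx = priv;

	if (inode->count || (inode->state & I_FREEING))
		return INO_ITER_DONE;
	inode->state |= I_FREEING;
	inode->lru_next = ctx->list;
	ctx->list = inode;
	return ++ctx->queued >= ctx->batch_max ? INO_ITER_ABORT : INO_ITER_DONE;
}

/* Like the patched dispose_list(): report whether anything was disposed of. */
static bool model_dispose_list(struct dispose_ctx *ctx)
{
	if (!ctx->list)
		return false;
	while (ctx->list) {
		struct model_inode *inode = ctx->list;

		ctx->list = inode->lru_next;
		/* evict() would free it; here it stays listed, marked I_FREEING. */
		inode->evicted = true;
	}
	ctx->queued = 0;
	return true;
}

/* Like the new evict_inodes(): rescan until a pass finds nothing to evict. */
static int model_evict_inodes(struct model_inode *head, int batch_max)
{
	struct dispose_ctx ctx = { .list = NULL, .queued = 0, .batch_max = batch_max };
	int passes = 0;

	do {
		model_iter_unsafe(head, evict_fn, &ctx);
		passes++;
	} while (model_dispose_list(&ctx));
	return passes;
}
```

The loop terminates because each pass either queues at least one inode, which is then marked I_FREEING and never queued again, or queues nothing, in which case the dispose step returns false and the do/while exits.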
Signed-off-by: Dave Chinner
---
 block/bdev.c     | 24 ++++++++++++----
 fs/inode.c       | 79 +++++++++++++++++++++++++++++------------------
 fs/quota/dquot.c | 72 ++++++++++++++++++++++++++++--------------------
 3 files changed, 102 insertions(+), 73 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 33f9c4605e3a..b5a362156ca1 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -472,16 +472,28 @@ void bdev_drop(struct block_device *bdev)
 	iput(BD_INODE(bdev));
 }
 
+static int bdev_pages_count(struct inode *inode, void *data)
+{
+	long *pages = data;
+
+	*pages += inode->i_mapping->nrpages;
+	return INO_ITER_DONE;
+}
+
 long nr_blockdev_pages(void)
 {
-	struct inode *inode;
 	long ret = 0;
 
-	spin_lock(&blockdev_superblock->s_inode_list_lock);
-	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list)
-		ret += inode->i_mapping->nrpages;
-	spin_unlock(&blockdev_superblock->s_inode_list_lock);
-
+	/*
+	 * We can be called from contexts where blocking is not
+	 * desirable. The count is advisory at best, and we only
+	 * need to access the inode mapping. Hence as long as we
+	 * have an inode existence guarantee, we can safely count
+	 * the cached pages on each inode without needing reference
+	 * counted inodes.
+	 */
+	super_iter_inodes_unsafe(blockdev_superblock,
+			bdev_pages_count, &ret);
 	return ret;
 }
 
diff --git a/fs/inode.c b/fs/inode.c
index 0a53d8c34203..3f335f78c5b2 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -761,8 +761,11 @@ static void evict(struct inode *inode)
  * Dispose-list gets a local list with local inodes in it, so it doesn't
  * need to worry about list corruption and SMP locks.
  */
-static void dispose_list(struct list_head *head)
+static bool dispose_list(struct list_head *head)
 {
+	if (list_empty(head))
+		return false;
+
 	while (!list_empty(head)) {
 		struct inode *inode;
 
@@ -772,6 +775,7 @@ static void dispose_list(struct list_head *head)
 		evict(inode);
 		cond_resched();
 	}
+	return true;
 }
 
 /**
@@ -783,47 +787,50 @@ static void dispose_list(struct list_head *head)
  * so any inode reaching zero refcount during or after that call will
  * be immediately evicted.
  */
+static int evict_inode_fn(struct inode *inode, void *data)
+{
+	struct list_head *dispose = data;
+
+	spin_lock(&inode->i_lock);
+	if (atomic_read(&inode->i_count) ||
+	    (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
+		spin_unlock(&inode->i_lock);
+		return INO_ITER_DONE;
+	}
+
+	inode->i_state |= I_FREEING;
+	inode_lru_list_del(inode);
+	spin_unlock(&inode->i_lock);
+	list_add(&inode->i_lru, dispose);
+
+	/*
+	 * If we've run long enough to need rescheduling, abort the
+	 * iteration so we can return to evict_inodes() and dispose of the
+	 * inodes before collecting more inodes to evict.
+	 */
+	if (need_resched())
+		return INO_ITER_ABORT;
+	return INO_ITER_DONE;
+}
+
 void evict_inodes(struct super_block *sb)
 {
-	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-again:
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
-		if (atomic_read(&inode->i_count))
-			continue;
-
-		spin_lock(&inode->i_lock);
-		if (atomic_read(&inode->i_count)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		inode->i_state |= I_FREEING;
-		inode_lru_list_del(inode);
-		spin_unlock(&inode->i_lock);
-		list_add(&inode->i_lru, &dispose);
-
+	do {
 		/*
-		 * We can have a ton of inodes to evict at unmount time given
-		 * enough memory, check to see if we need to go to sleep for a
-		 * bit so we don't livelock.
+ * We do not want to take references to inodes whilst iterating + * because we are trying to evict unreferenced inodes from + * the cache. Hence we need to use the unsafe iteration + * mechanism and do all the required inode validity checks in + * evict_inode_fn() to safely queue unreferenced inodes for + * eviction. + * + * We repeat the iteration until it doesn't find any more + * inodes to dispose of. */ - if (need_resched()) { - spin_unlock(&sb->s_inode_list_lock); - cond_resched(); - dispose_list(&dispose); - goto again; - } - } - spin_unlock(&sb->s_inode_list_lock); - - dispose_list(&dispose); + super_iter_inodes_unsafe(sb, evict_inode_fn, &dispose); + } while (dispose_list(&dispose)); } EXPORT_SYMBOL_GPL(evict_inodes); diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index b40410cd39af..ea0bd807fed7 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -1075,41 +1075,51 @@ static int add_dquot_ref(struct super_block *sb, int type) return err; } +struct dquot_ref_data { + int type; + int reserved; +}; + +static int remove_dquot_ref_fn(struct inode *inode, void *data) +{ + struct dquot_ref_data *ref = data; + + spin_lock(&dq_data_lock); + if (!IS_NOQUOTA(inode)) { + struct dquot __rcu **dquots = i_dquot(inode); + struct dquot *dquot = srcu_dereference_check( + dquots[ref->type], &dquot_srcu, + lockdep_is_held(&dq_data_lock)); + +#ifdef CONFIG_QUOTA_DEBUG + if (unlikely(inode_get_rsv_space(inode) > 0)) + ref->reserved++; +#endif + rcu_assign_pointer(dquots[ref->type], NULL); + if (dquot) + dqput(dquot); + } + spin_unlock(&dq_data_lock); + return INO_ITER_DONE; +} + static void remove_dquot_ref(struct super_block *sb, int type) { - struct inode *inode; -#ifdef CONFIG_QUOTA_DEBUG - int reserved = 0; -#endif - - spin_lock(&sb->s_inode_list_lock); - list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - /* - * We have to scan also I_NEW inodes because they can already - * have quota pointer initialized. 
Luckily, we need to touch - * only quota pointers and these have separate locking - * (dq_data_lock). - */ - spin_lock(&dq_data_lock); - if (!IS_NOQUOTA(inode)) { - struct dquot __rcu **dquots = i_dquot(inode); - struct dquot *dquot = srcu_dereference_check( - dquots[type], &dquot_srcu, - lockdep_is_held(&dq_data_lock)); + struct dquot_ref_data ref = { + .type = type, + }; + /* + * We have to scan I_NEW inodes because they can already + * have quota pointer initialized. Luckily, we need to touch + * only quota pointers and these have separate locking + * (dq_data_lock) so the existence guarantee that + * super_iter_inodes_unsafe() provides inodes passed to + * remove_dquot_ref_fn() is sufficient for this operation. + */ + super_iter_inodes_unsafe(sb, remove_dquot_ref_fn, &ref); #ifdef CONFIG_QUOTA_DEBUG - if (unlikely(inode_get_rsv_space(inode) > 0)) - reserved = 1; -#endif - rcu_assign_pointer(dquots[type], NULL); - if (dquot) - dqput(dquot); - } - spin_unlock(&dq_data_lock); - } - spin_unlock(&sb->s_inode_list_lock); -#ifdef CONFIG_QUOTA_DEBUG - if (reserved) { + if (ref.reserved) { printk(KERN_WARNING "VFS (%s): Writes happened after quota" " was disabled thus quota information is probably " "inconsistent. 
Please run quotacheck(8).\n", sb->s_id); From patchwork Wed Oct 2 01:33:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 13819227 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30845B66E for ; Wed, 2 Oct 2024 01:40:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727833228; cv=none; b=nrmMaGj7bVXRprR6Ywssr0uqKkAI16MHKYT/51x5dmPwbihBQw3d9PvYVXaOgYWvlR3NoqspjsH5Qzj/rIJ9DoqflpbwmHYY8mwxIi0PjRFm2rxdTRGiFNAHfwsHChb+MUSkLVVgrWB9hH97FGaIeJd1JGEGdEcpiVDBbSRYZHU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727833228; c=relaxed/simple; bh=fKLCm/Cgg+BtkT6PG8erpU4WyCSoo61yY/4mTCiiTfc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IttULuaZ9eSmpd2d1uAI35YKG+rFkgqizDPu9WobMlscO3uoFJl479WcTyqFXIQkxl3c3Zd+pIc5TeJEh4YvgYP72Yp4RKmM3AkmGTVRtqOjAIDR3yaj/qeoN2AwZ+aEC2gluB++KSAFRIf9zUp5u4yU0NXwSixItxlbNb07dB0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com; spf=pass smtp.mailfrom=fromorbit.com; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b=rlwKY6Af; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com 
header.b="rlwKY6Af" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-207115e3056so54551855ad.2 for ; Tue, 01 Oct 2024 18:40:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1727833225; x=1728438025; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vwfcVJq7o+XgwN+06xi8VPz55XdNhbrMI0UMCxMDdeo=; b=rlwKY6AfpFGLqmKqoVjPXRCko55NGwTtnSbW9XTXfXzLFUKCqOdlRGl4lI5GWT+mK5 7vDNDGEt/ncWUAbh08rv7h6frb8oFwrhOSKP1z7ZwDx5G1Kv8WzJZoVRm47hbjtfz9oA jmMYZrNYwLEswvnV3nZKaSqrHqvXqzqUU8fkLM5wbUEcG76pmn0cAlMsCIlXV1nBjD+e fSmlWrjVG4Cj5WA1/D/HswkaSSFV2qTkjrc5q1aK1U4EqxiQyhmdjYgHmz/MVzx/k0lT 40+B+ID5BFWQXQwCSzVNCvr21yLW2wD855ciMXctrWceWWsg558AG1jjFacBVQJUJX9q BHSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727833225; x=1728438025; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vwfcVJq7o+XgwN+06xi8VPz55XdNhbrMI0UMCxMDdeo=; b=v2Q/vaLjKWr3Dpe9tZRP2xSCOcgDvb6IK1BiWkgixEEdXCnzTM5mo/3CANLCXgdzTk MVJc6+pjwj9Cb1ePbBcIK/fETB8xBV34dq9WiAfc9g0s7o0mZF5UHFtQjjtcSVQBJ99Y Qcrm7JmF6N1ol9iXOPKBYc5/p+CDUYF0JsqJCP2lglsCcRm1wP2Lf006NVNg6Mk+216I eR4Af4j5X9QPECsM+P+NM/yKvndRfmEkeYGA4ZYtr7MTikpyKqFp4EbJ/xxkgL9GY/IJ k9ZIbn0f+Ps2aQLNKJAHLs2ctsqAv6MIgsOkRczetk3PPctETVu85rOK70bKcRNxZbQ/ g4aA== X-Gm-Message-State: AOJu0Yzw8XBkl9/1oUtlNDERVjwfqA0h809dfKkZJ6JUi5t2qFnK5Ipa ASlAo8d4JAAVCHOql488zg3CBRvsV0pyLAcLiADMpWBaBAcpiP9fQ9JyCWKIkjkertz/S+PPVfY w X-Google-Smtp-Source: AGHT+IEm41b4rrxSuTtQrNmXz+VS03pIIfUgDjrgzxeEWMyE94AIgjNykG0g2L7Q9ZQ61XJ4d+evEQ== X-Received: by 2002:a17:903:41ce:b0:20b:b39d:9735 with SMTP id d9443c01a7336-20bc5a7fb62mr20130495ad.54.1727833225054; Tue, 01 Oct 2024 18:40:25 -0700 (PDT) Received: from dread.disaster.area 
(pa49-179-78-197.pa.nsw.optusnet.com.au. [49.179.78.197]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20b37d67195sm75669825ad.37.2024.10.01.18.40.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2024 18:40:22 -0700 (PDT) Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1svoLj-00Ck8b-0V; Wed, 02 Oct 2024 11:40:19 +1000 Received: from dave by devoid.disaster.area with local (Exim 4.98) (envelope-from ) id 1svoLj-0000000FxGO-2SC8; Wed, 02 Oct 2024 11:40:19 +1000 From: Dave Chinner To: linux-fsdevel@vger.kernel.org Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org Subject: [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes() Date: Wed, 2 Oct 2024 11:33:21 +1000 Message-ID: <20241002014017.3801899-5-david@fromorbit.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com> References: <20241002014017.3801899-1-david@fromorbit.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner Convert all the remaining superblock inode iterators to use super_iter_inodes(). These are mostly straightforward conversions for the iterations that use references, and the bdev use cases that didn't even validate the inode before dereferencing it are now inherently safe.
Signed-off-by: Dave Chinner --- block/bdev.c | 76 ++++++++-------------- fs/drop_caches.c | 38 ++++------- fs/gfs2/ops_fstype.c | 67 ++++++------------- fs/notify/fsnotify.c | 75 ++++++--------------- fs/quota/dquot.c | 79 +++++++++-------------- security/landlock/fs.c | 143 ++++++++++++++--------------------------- 6 files changed, 154 insertions(+), 324 deletions(-) diff --git a/block/bdev.c b/block/bdev.c index b5a362156ca1..5f720e12f731 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -1226,56 +1226,36 @@ void bdev_mark_dead(struct block_device *bdev, bool surprise) */ EXPORT_SYMBOL_GPL(bdev_mark_dead); +static int sync_bdev_fn(struct inode *inode, void *data) +{ + struct block_device *bdev; + bool wait = *(bool *)data; + + if (inode->i_mapping->nrpages == 0) + return INO_ITER_DONE; + + bdev = I_BDEV(inode); + mutex_lock(&bdev->bd_disk->open_mutex); + if (!atomic_read(&bdev->bd_openers)) { + ; /* skip */ + } else if (wait) { + /* + * We keep the error status of individual mapping so + * that applications can catch the writeback error using + * fsync(2). See filemap_fdatawait_keep_errors() for + * details. 
+ */ + filemap_fdatawait_keep_errors(inode->i_mapping); + } else { + filemap_fdatawrite(inode->i_mapping); + } + mutex_unlock(&bdev->bd_disk->open_mutex); + return INO_ITER_DONE; +} + void sync_bdevs(bool wait) { - struct inode *inode, *old_inode = NULL; - - spin_lock(&blockdev_superblock->s_inode_list_lock); - list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) { - struct address_space *mapping = inode->i_mapping; - struct block_device *bdev; - - spin_lock(&inode->i_lock); - if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) || - mapping->nrpages == 0) { - spin_unlock(&inode->i_lock); - continue; - } - __iget(inode); - spin_unlock(&inode->i_lock); - spin_unlock(&blockdev_superblock->s_inode_list_lock); - /* - * We hold a reference to 'inode' so it couldn't have been - * removed from s_inodes list while we dropped the - * s_inode_list_lock We cannot iput the inode now as we can - * be holding the last reference and we cannot iput it under - * s_inode_list_lock. So we keep the reference and iput it - * later. - */ - iput(old_inode); - old_inode = inode; - bdev = I_BDEV(inode); - - mutex_lock(&bdev->bd_disk->open_mutex); - if (!atomic_read(&bdev->bd_openers)) { - ; /* skip */ - } else if (wait) { - /* - * We keep the error status of individual mapping so - * that applications can catch the writeback error using - * fsync(2). See filemap_fdatawait_keep_errors() for - * details. 
- */ - filemap_fdatawait_keep_errors(inode->i_mapping); - } else { - filemap_fdatawrite(inode->i_mapping); - } - mutex_unlock(&bdev->bd_disk->open_mutex); - - spin_lock(&blockdev_superblock->s_inode_list_lock); - } - spin_unlock(&blockdev_superblock->s_inode_list_lock); - iput(old_inode); + super_iter_inodes(blockdev_superblock, sync_bdev_fn, &wait, 0); } /* diff --git a/fs/drop_caches.c b/fs/drop_caches.c index d45ef541d848..901cda15537f 100644 --- a/fs/drop_caches.c +++ b/fs/drop_caches.c @@ -16,36 +16,20 @@ /* A global variable is a bit ugly, but it keeps the code simple */ int sysctl_drop_caches; -static void drop_pagecache_sb(struct super_block *sb, void *unused) +static int invalidate_inode_fn(struct inode *inode, void *data) { - struct inode *inode, *toput_inode = NULL; - - spin_lock(&sb->s_inode_list_lock); - list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - spin_lock(&inode->i_lock); - /* - * We must skip inodes in unusual state. We may also skip - * inodes without pages but we deliberately won't in case - * we need to reschedule to avoid softlockups. - */ - if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) || - (mapping_empty(inode->i_mapping) && !need_resched())) { - spin_unlock(&inode->i_lock); - continue; - } - __iget(inode); - spin_unlock(&inode->i_lock); - spin_unlock(&sb->s_inode_list_lock); - + if (!mapping_empty(inode->i_mapping)) invalidate_mapping_pages(inode->i_mapping, 0, -1); - iput(toput_inode); - toput_inode = inode; + return INO_ITER_DONE; +} - cond_resched(); - spin_lock(&sb->s_inode_list_lock); - } - spin_unlock(&sb->s_inode_list_lock); - iput(toput_inode); +/* + * Note: it would be nice to check mapping_empty() before we get a reference on + * the inode in super_iter_inodes(), but that's a future optimisation. 
+ */ +static void drop_pagecache_sb(struct super_block *sb, void *unused) +{ + super_iter_inodes(sb, invalidate_inode_fn, NULL, 0); } int drop_caches_sysctl_handler(const struct ctl_table *table, int write, diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index e83d293c3614..f20862614ad6 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1714,53 +1714,10 @@ static int gfs2_meta_init_fs_context(struct fs_context *fc) return 0; } -/** - * gfs2_evict_inodes - evict inodes cooperatively - * @sb: the superblock - * - * When evicting an inode with a zero link count, we are trying to upgrade the - * inode's iopen glock from SH to EX mode in order to determine if we can - * delete the inode. The other nodes are supposed to evict the inode from - * their caches if they can, and to poke the inode's inode glock if they cannot - * do so. Either behavior allows gfs2_upgrade_iopen_glock() to proceed - * quickly, but if the other nodes are not cooperating, the lock upgrading - * attempt will time out. Since inodes are evicted sequentially, this can add - * up quickly. - * - * Function evict_inodes() tries to keep the s_inode_list_lock list locked over - * a long time, which prevents other inodes from being evicted concurrently. - * This precludes the cooperative behavior we are looking for. This special - * version of evict_inodes() avoids that. - * - * Modeled after drop_pagecache_sb(). 
- */ -static void gfs2_evict_inodes(struct super_block *sb) +/* Nothing to do because we just want to bounce the inode through iput() */ +static int gfs2_evict_inode_fn(struct inode *inode, void *data) { - struct inode *inode, *toput_inode = NULL; - struct gfs2_sbd *sdp = sb->s_fs_info; - - set_bit(SDF_EVICTING, &sdp->sd_flags); - - spin_lock(&sb->s_inode_list_lock); - list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - spin_lock(&inode->i_lock); - if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) && - !need_resched()) { - spin_unlock(&inode->i_lock); - continue; - } - atomic_inc(&inode->i_count); - spin_unlock(&inode->i_lock); - spin_unlock(&sb->s_inode_list_lock); - - iput(toput_inode); - toput_inode = inode; - - cond_resched(); - spin_lock(&sb->s_inode_list_lock); - } - spin_unlock(&sb->s_inode_list_lock); - iput(toput_inode); + return INO_ITER_DONE; } static void gfs2_kill_sb(struct super_block *sb) @@ -1779,7 +1736,23 @@ static void gfs2_kill_sb(struct super_block *sb) sdp->sd_master_dir = NULL; shrink_dcache_sb(sb); - gfs2_evict_inodes(sb); + /* + * When evicting an inode with a zero link count, we are trying to + * upgrade the inode's iopen glock from SH to EX mode in order to + * determine if we can delete the inode. The other nodes are supposed + * to evict the inode from their caches if they can, and to poke the + * inode's inode glock if they cannot do so. Either behavior allows + * gfs2_upgrade_iopen_glock() to proceed quickly, but if the other nodes + * are not cooperating, the lock upgrading attempt will time out. Since + * inodes are evicted sequentially, this can add up quickly. + * + * evict_inodes() tries to keep the s_inode_list_lock list locked over a + * long time, which prevents other inodes from being evicted + * concurrently. This precludes the cooperative behavior we are looking + * for. 
+ */ + set_bit(SDF_EVICTING, &sdp->sd_flags); + super_iter_inodes(sb, gfs2_evict_inode_fn, NULL, 0); /* * Flush and then drain the delete workqueue here (via diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 272c8a1dab3c..68c34ed94271 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -28,63 +28,14 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt) fsnotify_clear_marks_by_mount(mnt); } -/** - * fsnotify_unmount_inodes - an sb is unmounting. handle any watched inodes. - * @sb: superblock being unmounted. - * - * Called during unmount with no locks held, so needs to be safe against - * concurrent modifiers. We temporarily drop sb->s_inode_list_lock and CAN block. - */ -static void fsnotify_unmount_inodes(struct super_block *sb) +static int fsnotify_unmount_inode_fn(struct inode *inode, void *data) { - struct inode *inode, *iput_inode = NULL; + /* The iterator holds a reference; inode->i_lock is not held here. */ - spin_lock(&sb->s_inode_list_lock); - list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - /* - * We cannot __iget() an inode in state I_FREEING, - * I_WILL_FREE, or I_NEW which is fine because by that point - * the inode cannot have any associated watches. - */ - spin_lock(&inode->i_lock); - if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) { - spin_unlock(&inode->i_lock); - continue; - } - - /* - * If i_count is zero, the inode cannot have any watches and - * doing an __iget/iput with SB_ACTIVE clear would actually - * evict all inodes with zero i_count from icache which is - * unnecessarily violent and may in fact be illegal to do. - * However, we should have been called /after/ evict_inodes - * removed all zero refcount inodes, in any case. Test to - * be sure.
- */ - if (!atomic_read(&inode->i_count)) { - spin_unlock(&inode->i_lock); - continue; - } - - __iget(inode); - spin_unlock(&inode->i_lock); - spin_unlock(&sb->s_inode_list_lock); - - iput(iput_inode); - - /* for each watch, send FS_UNMOUNT and then remove it */ - fsnotify_inode(inode, FS_UNMOUNT); - - fsnotify_inode_delete(inode); - - iput_inode = inode; - - cond_resched(); - spin_lock(&sb->s_inode_list_lock); - } - spin_unlock(&sb->s_inode_list_lock); - - iput(iput_inode); + /* for each watch, send FS_UNMOUNT and then remove it */ + fsnotify_inode(inode, FS_UNMOUNT); + fsnotify_inode_delete(inode); + return INO_ITER_DONE; } void fsnotify_sb_delete(struct super_block *sb) @@ -95,7 +46,19 @@ void fsnotify_sb_delete(struct super_block *sb) if (!sbinfo) return; - fsnotify_unmount_inodes(sb); + /* + * If i_count is zero, the inode cannot have any watches and + * doing an __iget/iput with SB_ACTIVE clear would actually + * evict all inodes with zero i_count from icache which is + * unnecessarily violent and may in fact be illegal to do. + * However, we should have been called /after/ evict_inodes + * removed all zero refcount inodes, in any case. Hence we use + * INO_ITER_REFERENCED to ensure zero refcount inodes are filtered + * properly. 
+ */ + super_iter_inodes(sb, fsnotify_unmount_inode_fn, NULL, + INO_ITER_REFERENCED); + fsnotify_clear_marks_by_sb(sb); /* Wait for outstanding object references from connectors */ wait_var_event(fsnotify_sb_watched_objects(sb), diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index ea0bd807fed7..ea9fce7acd1b 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -1017,56 +1017,40 @@ static int dqinit_needed(struct inode *inode, int type) return 0; } +struct dquot_ref_data { + int type; + int reserved; +}; + +static int add_dquot_ref_fn(struct inode *inode, void *data) +{ + struct dquot_ref_data *ref = data; + int ret; + + if (!dqinit_needed(inode, ref->type)) + return INO_ITER_DONE; + +#ifdef CONFIG_QUOTA_DEBUG + if (unlikely(inode_get_rsv_space(inode) > 0)) + ref->reserved++; +#endif + ret = __dquot_initialize(inode, ref->type); + if (ret < 0) + return ret; + return INO_ITER_DONE; +} + /* This routine is guarded by s_umount semaphore */ static int add_dquot_ref(struct super_block *sb, int type) { - struct inode *inode, *old_inode = NULL; -#ifdef CONFIG_QUOTA_DEBUG - int reserved = 0; -#endif - int err = 0; - - spin_lock(&sb->s_inode_list_lock); - list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - spin_lock(&inode->i_lock); - if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) || - !atomic_read(&inode->i_writecount) || - !dqinit_needed(inode, type)) { - spin_unlock(&inode->i_lock); - continue; - } - __iget(inode); - spin_unlock(&inode->i_lock); - spin_unlock(&sb->s_inode_list_lock); - -#ifdef CONFIG_QUOTA_DEBUG - if (unlikely(inode_get_rsv_space(inode) > 0)) - reserved = 1; -#endif - iput(old_inode); - err = __dquot_initialize(inode, type); - if (err) { - iput(inode); - goto out; - } + struct dquot_ref_data ref = { + .type = type, + }; + int err; - /* - * We hold a reference to 'inode' so it couldn't have been - * removed from s_inodes list while we dropped the - * s_inode_list_lock. 
We cannot iput the inode now as we can be - * holding the last reference and we cannot iput it under - * s_inode_list_lock. So we keep the reference and iput it - * later. - */ - old_inode = inode; - cond_resched(); - spin_lock(&sb->s_inode_list_lock); - } - spin_unlock(&sb->s_inode_list_lock); - iput(old_inode); -out: + err = super_iter_inodes(sb, add_dquot_ref_fn, &ref, 0); #ifdef CONFIG_QUOTA_DEBUG - if (reserved) { + if (ref.reserved) { quota_error(sb, "Writes happened before quota was turned on " "thus quota information is probably inconsistent. " "Please run quotacheck(8)"); @@ -1075,11 +1059,6 @@ static int add_dquot_ref(struct super_block *sb, int type) return err; } -struct dquot_ref_data { - int type; - int reserved; -}; - static int remove_dquot_ref_fn(struct inode *inode, void *data) { struct dquot_ref_data *ref = data; diff --git a/security/landlock/fs.c b/security/landlock/fs.c index 7d79fc8abe21..013ec4017ddd 100644 --- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -1223,109 +1223,60 @@ static void hook_inode_free_security_rcu(void *inode_security) /* * Release the inodes used in a security policy. - * - * Cf. fsnotify_unmount_inodes() and invalidate_inodes() */ +static int release_inode_fn(struct inode *inode, void *data) +{ + + rcu_read_lock(); + object = rcu_dereference(landlock_inode(inode)->object); + if (!object) { + rcu_read_unlock(); + return INO_ITER_DONE; + } + + /* + * If there is no concurrent release_inode() ongoing, then we + * are in charge of calling iput() on this inode, otherwise we + * will just wait for it to finish. + */ + spin_lock(&object->lock); + if (object->underobj != inode) { + spin_unlock(&object->lock); + rcu_read_unlock(); + return INO_ITER_DONE; + } + + object->underobj = NULL; + spin_unlock(&object->lock); + rcu_read_unlock(); + + /* + * Because object->underobj was not NULL, release_inode() and + * get_inode_object() guarantee that it is safe to reset + * landlock_inode(inode)->object while it is not NULL. 
It is therefore + * not necessary to lock inode->i_lock. + */ + rcu_assign_pointer(landlock_inode(inode)->object, NULL); + + /* + * At this point, we own the ihold() reference that was originally set + * up by get_inode_object() as well as the reference the inode iterator + * obtained before calling us. Therefore the following call to iput() + * will not sleep nor drop the inode because there is now at least two + * references to it. + */ + iput(inode); + return INO_ITER_DONE; +} + static void hook_sb_delete(struct super_block *const sb) { - struct inode *inode, *prev_inode = NULL; - if (!landlock_initialized) return; - spin_lock(&sb->s_inode_list_lock); - list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - struct landlock_object *object; + super_iter_inodes(sb, release_inode_fn, NULL, 0); - /* Only handles referenced inodes. */ - if (!atomic_read(&inode->i_count)) - continue; - - /* - * Protects against concurrent modification of inode (e.g. - * from get_inode_object()). - */ - spin_lock(&inode->i_lock); - /* - * Checks I_FREEING and I_WILL_FREE to protect against a race - * condition when release_inode() just called iput(), which - * could lead to a NULL dereference of inode->security or a - * second call to iput() for the same Landlock object. Also - * checks I_NEW because such inode cannot be tied to an object. - */ - if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) { - spin_unlock(&inode->i_lock); - continue; - } - - rcu_read_lock(); - object = rcu_dereference(landlock_inode(inode)->object); - if (!object) { - rcu_read_unlock(); - spin_unlock(&inode->i_lock); - continue; - } - /* Keeps a reference to this inode until the next loop walk. */ - __iget(inode); - spin_unlock(&inode->i_lock); - - /* - * If there is no concurrent release_inode() ongoing, then we - * are in charge of calling iput() on this inode, otherwise we - * will just wait for it to finish. 
- */ - spin_lock(&object->lock); - if (object->underobj == inode) { - object->underobj = NULL; - spin_unlock(&object->lock); - rcu_read_unlock(); - - /* - * Because object->underobj was not NULL, - * release_inode() and get_inode_object() guarantee - * that it is safe to reset - * landlock_inode(inode)->object while it is not NULL. - * It is therefore not necessary to lock inode->i_lock. - */ - rcu_assign_pointer(landlock_inode(inode)->object, NULL); - /* - * At this point, we own the ihold() reference that was - * originally set up by get_inode_object() and the - * __iget() reference that we just set in this loop - * walk. Therefore the following call to iput() will - * not sleep nor drop the inode because there is now at - * least two references to it. - */ - iput(inode); - } else { - spin_unlock(&object->lock); - rcu_read_unlock(); - } - - if (prev_inode) { - /* - * At this point, we still own the __iget() reference - * that we just set in this loop walk. Therefore we - * can drop the list lock and know that the inode won't - * disappear from under us until the next loop walk. - */ - spin_unlock(&sb->s_inode_list_lock); - /* - * We can now actually put the inode reference from the - * previous loop walk, which is not needed anymore. - */ - iput(prev_inode); - cond_resched(); - spin_lock(&sb->s_inode_list_lock); - } - prev_inode = inode; - } - spin_unlock(&sb->s_inode_list_lock); - - /* Puts the inode reference from the last loop walk, if any. */ - if (prev_inode) - iput(prev_inode); - /* Waits for pending iput() in release_inode(). */ + /* Waits for pending iput()s in release_inode(). 
*/ wait_var_event(&landlock_superblock(sb)->inode_refs, !atomic_long_read(&landlock_superblock(sb)->inode_refs)); } From patchwork Wed Oct 2 01:33:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 13819221 Received: from mail-pg1-f169.google.com (mail-pg1-f169.google.com [209.85.215.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C71BB23C9 for ; Wed, 2 Oct 2024 01:40:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727833225; cv=none; b=jRJiJNQXBVslQVIEc/+xzVaYh7vPVHN0IHrG+I410s3dFQ3XoCWK19wdXga6OsDbnvq/bDQSxLwEKd1ZSfJoN4UOFH1KV0zVl3u2DHaksyAPhF20O2U37czbDFGIde4ITwGjhxbkem21BDSOmESEIEkcRowMFHeVQKQuGMP51B8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727833225; c=relaxed/simple; bh=wERFCZi4NhTiU1dRVVGYgy3VJ2JsE6+E60RIrOoOa6c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TGqHI20Aw/pfHUB2f30yxU9Oquus8n0pIxf/ycIqeUICAdUkG/PGbIUNqpoWrk44TwNCoj+q7aO+8M0qYHfC50376fyg3o7z0PQrHjMT6vMiRgs5FOxtL2sr1NOk3QOuYAnGpXr4cGpHcLG1SFAGa6YxSBB241rgJCmiPQBngrg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com; spf=pass smtp.mailfrom=fromorbit.com; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b=1VgdHl3S; arc=none smtp.client-ip=209.85.215.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b="1VgdHl3S" Received: by mail-pg1-f169.google.com with SMTP id 41be03b00d2f7-7163489149eso5113145a12.1 for ; Tue, 01 Oct 2024 18:40:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1727833223; x=1728438023; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MqUz/n8MwDanhwIPscfHBkgek04h3sMszpax/ZNM448=; b=1VgdHl3SVjFiorXNao6kQKzzafwDHifkqX3eD8ytJjN+PbrW1t0ibxiFBoJ48TuV0A JyrPqhAe0GlVf1BCpAL7oVBCXizAAAG/PghQhTabKzJm3MCznkBKCy4YXFJijEdBNbYI MG7pV82d/oNXX0eEl+L/+2Hlr7cBsrKqwbKC9AYlbkNqAqectmvYiqMraOSShNofUf0w iAv6pvWTLOJQ0+N+DV/5R69tGNbTfdAYYeEyi9/q1uDZPX0/SvX/3eYNl8XfoZLy/1pR RtnuG1RjyFLdOIUxwbjGLA4j/KkZr5ubf3nlLdy4SMHNgdZnYXup1hTJHFqCMN3s5QWC omTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727833223; x=1728438023; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MqUz/n8MwDanhwIPscfHBkgek04h3sMszpax/ZNM448=; b=cx8LFFDZH6oBCS+dhIia3dOG5VT1tuD9gdrOISqpOIxK0rFL1gYsMHUYFd9hLmXfsx /BTKDJyyPcdHSJDzYIocmG2EbbVnhCgmvWJ8sUEh8L2kv05DP7S0eIA0xLk0ZGU+Bisg ACy2bUQ1KcttfJ1IH+DuApvwwYfn6scNBws0kUtgfpbnEJfyuM8vg31/9cfBQX8IKerG UrvtR+GUqR3fMw2wmL6UN4qNAINDbhSS4CggYpqq8yHhCrmoG4Kzu6Xw60U4/vY6UgvM Yt3m53LPXUefpqdD2KP7Tog6xZZlz3fiJW4NvB8zsvILyuH7LGlmIUEehTgO6je+uF7G GDaQ== X-Gm-Message-State: AOJu0Yy30nY2Mbaha8KE02sirMXTiYaP1asRLThTlew/ECR5/kTsT5KP k87DK0+QzCbSKb+W4ylAf+qdpGyhuurePotbCI5nkKBee4tkvGkk3BI+6eKpKkeMNnpJ82iqRpw F X-Google-Smtp-Source: AGHT+IFI/J5a2+oGSxaIyhJos9JM1DKr2gAaXKr5CeqDLkHEJBj3amuR8ocZQfdzLPPtz0sAZagbSA== X-Received: by 2002:a05:6a20:c6cd:b0:1d5:1604:65e5 with SMTP id 
adf61e73a8af0-1d5e2d2f303mr2078892637.40.1727833222985; Tue, 01 Oct 2024 18:40:22 -0700 (PDT) Received: from dread.disaster.area (pa49-179-78-197.pa.nsw.optusnet.com.au. [49.179.78.197]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-71b2653717esm8730866b3a.204.2024.10.01.18.40.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2024 18:40:22 -0700 (PDT) Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1svoLj-00Ck8e-0h; Wed, 02 Oct 2024 11:40:19 +1000 Received: from dave by devoid.disaster.area with local (Exim 4.98) (envelope-from ) id 1svoLj-0000000FxGV-2eHR; Wed, 02 Oct 2024 11:40:19 +1000 From: Dave Chinner To: linux-fsdevel@vger.kernel.org Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org Subject: [PATCH 5/7] vfs: add inode iteration superblock method Date: Wed, 2 Oct 2024 11:33:22 +1000 Message-ID: <20241002014017.3801899-6-david@fromorbit.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com> References: <20241002014017.3801899-1-david@fromorbit.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner For filesystems that provide their own inode cache that can be traversed, add a superblock method that can be used instead of iterating the sb->s_inodes list. This allows these filesystems to avoid having to populate the sb->s_inodes list and hence avoid the scalability limitations that this list imposes.
Signed-off-by: Dave Chinner --- fs/super.c | 54 +++++++++++++++++++++++++++++++--------------- include/linux/fs.h | 4 ++++ 2 files changed, 41 insertions(+), 17 deletions(-) diff --git a/fs/super.c b/fs/super.c index 20a9446d943a..971ad4e996e0 100644 --- a/fs/super.c +++ b/fs/super.c @@ -167,6 +167,31 @@ static void super_wake(struct super_block *sb, unsigned int flag) wake_up_var(&sb->s_flags); } +bool super_iter_iget(struct inode *inode, int flags) +{ + bool ret = false; + + spin_lock(&inode->i_lock); + if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) + goto out_unlock; + + /* + * Skip over zero refcount inode if the caller only wants + * referenced inodes to be iterated. + */ + if ((flags & INO_ITER_REFERENCED) && + !atomic_read(&inode->i_count)) + goto out_unlock; + + __iget(inode); + ret = true; +out_unlock: + spin_unlock(&inode->i_lock); + return ret; + +} +EXPORT_SYMBOL_GPL(super_iter_iget); + /** * super_iter_inodes - iterate all the cached inodes on a superblock * @sb: superblock to iterate @@ -184,26 +209,15 @@ int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn, struct inode *inode, *old_inode = NULL; int ret = 0; + if (sb->s_op->iter_vfs_inodes) { + return sb->s_op->iter_vfs_inodes(sb, iter_fn, + private_data, flags); + } + spin_lock(&sb->s_inode_list_lock); list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - spin_lock(&inode->i_lock); - if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) { - spin_unlock(&inode->i_lock); + if (!super_iter_iget(inode, flags)) continue; - } - - /* - * Skip over zero refcount inode if the caller only wants - * referenced inodes to be iterated. 
- */ - if ((flags & INO_ITER_REFERENCED) && - !atomic_read(&inode->i_count)) { - spin_unlock(&inode->i_lock); - continue; - } - - __iget(inode); - spin_unlock(&inode->i_lock); spin_unlock(&sb->s_inode_list_lock); iput(old_inode); @@ -261,6 +275,12 @@ void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn, struct inode *inode; int ret; + if (sb->s_op->iter_vfs_inodes) { + sb->s_op->iter_vfs_inodes(sb, iter_fn, + private_data, INO_ITER_UNSAFE); + return; + } + rcu_read_lock(); spin_lock(&sb->s_inode_list_lock); list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 0a6a462c45ab..8e82e3dc0618 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2224,6 +2224,7 @@ enum freeze_holder { typedef int (*ino_iter_fn)(struct inode *inode, void *priv); int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn, void *private_data, int flags); +bool super_iter_iget(struct inode *inode, int flags); struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); @@ -2258,6 +2259,9 @@ struct super_operations { long (*free_cached_objects)(struct super_block *, struct shrink_control *); void (*shutdown)(struct super_block *sb); + + int (*iter_vfs_inodes)(struct super_block *sb, ino_iter_fn iter_fn, + void *private_data, int flags); }; /* From patchwork Wed Oct 2 01:33:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 13819225 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87D5D79D0 for ; Wed, 2 Oct 2024 01:40:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org
Subject: [PATCH 6/7] xfs: implement sb->iter_vfs_inodes
Date: Wed, 2 Oct 2024 11:33:23 +1000
Message-ID: <20241002014017.3801899-7-david@fromorbit.com>
In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com>
References: <20241002014017.3801899-1-david@fromorbit.com>

From: Dave Chinner

We can iterate all the in-memory VFS inodes via the xfs_icwalk()
interface, so implement the new superblock operation to walk inodes in
this way.

This removes the dependency XFS has on the sb->s_inodes list and allows
us to avoid the global lock that marshals this list and must be taken
on every VFS inode instantiation and eviction. This greatly improves the
rate at which we can stream inodes through the VFS inode cache.

Sharded, share-nothing cold cache workload with 100,000 files per thread
in per-thread directories.
Before:

Filesystem      Files  Threads  Create    Walk   Chmod  Unlink  Bulkstat
       xfs     400000        4   4.269   3.225   4.557   7.316     1.306
       xfs     800000        8   4.844   3.227   4.702   7.905     1.908
       xfs    1600000       16   6.286   3.296   5.592   8.838     4.392
       xfs    3200000       32   8.912   5.681   8.505  11.724     7.085
       xfs    6400000       64  15.344  11.144  14.162  18.604    15.494

After:

Filesystem      Files  Threads  Create    Walk   Chmod  Unlink  Bulkstat
       xfs     400000        4   4.140   3.502   4.154   7.242     1.164
       xfs     800000        8   4.637   2.836   4.444   7.896     1.093
       xfs    1600000       16   5.549   3.054   5.213   8.696     1.107
       xfs    3200000       32   8.387   3.218   6.867  10.668     1.125
       xfs    6400000       64  14.112   3.953  10.365  18.620     1.270

Bulkstat shows the real story here: before the patch, we start to see
scalability problems at 16 threads. The patched kernel shows almost
perfect scalability up to 64 threads streaming inodes through the VFS
cache using I_DONTCACHE semantics.

Note: this is an initial, unoptimised implementation. It could be
significantly improved and reduced in size by using a radix tree tag
filter for VFS inodes, and so using the generic tag-filtered xfs_icwalk()
implementation instead of special-casing it as this patch does.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 151 +++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_icache.h |   3 +
 fs/xfs/xfs_iops.c   |   1 -
 fs/xfs/xfs_super.c  |  11 ++++
 4 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index a680e5b82672..ee544556cee7 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1614,6 +1614,155 @@ xfs_blockgc_free_quota(
 			xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
 }
 
+/* VFS Inode Cache Walking Code */
+
+/* XFS inodes in these states are not visible to the VFS. */
+#define XFS_ITER_VFS_NOGRAB_IFLAGS	(XFS_INEW | \
+					 XFS_NEED_INACTIVE | \
+					 XFS_INACTIVATING | \
+					 XFS_IRECLAIMABLE | \
+					 XFS_IRECLAIM)
+/*
+ * If the inode we found is visible to the VFS inode cache, then return it to
+ * the caller.
+ *
+ * In the normal case, we need to validate the VFS inode state and take a
+ * reference to it here. We will drop that reference once the VFS inode has been
+ * processed by the ino_iter_fn.
+ *
+ * However, if the INO_ITER_UNSAFE flag is set, we do not take references to the
+ * inode - it is the ino_iter_fn's responsibility to validate the inode is still
+ * a VFS inode once we hand it to them. We do not drop references after
+ * processing these inodes; the processing function may have evicted the VFS
+ * inode from cache as part of its processing.
+ */
+static bool
+xfs_iter_vfs_igrab(
+	struct xfs_inode	*ip,
+	int			flags)
+{
+	struct inode		*inode = VFS_I(ip);
+	bool			ret = false;
+
+	ASSERT(rcu_read_lock_held());
+
+	/* Check for stale RCU freed inode */
+	spin_lock(&ip->i_flags_lock);
+	if (!ip->i_ino)
+		goto out_unlock_noent;
+
+	if (ip->i_flags & XFS_ITER_VFS_NOGRAB_IFLAGS)
+		goto out_unlock_noent;
+
+	if ((flags & INO_ITER_UNSAFE) ||
+	    super_iter_iget(inode, flags))
+		ret = true;
+
+out_unlock_noent:
+	spin_unlock(&ip->i_flags_lock);
+	return ret;
+}
+
+/*
+ * Initial implementation of vfs inode walker. This does not use batched lookups
+ * for initial simplicity and testing, though it could use them quite
+ * efficiently for both safe and unsafe iteration contexts.
+ */
+static int
+xfs_icwalk_vfs_inodes_ag(
+	struct xfs_perag	*pag,
+	ino_iter_fn		iter_fn,
+	void			*private_data,
+	int			flags)
+{
+	struct xfs_mount	*mp = pag->pag_mount;
+	uint32_t		first_index = 0;
+	int			ret = 0;
+	int			nr_found;
+	bool			done = false;
+
+	do {
+		struct xfs_inode *ip;
+
+		rcu_read_lock();
+		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+				(void **)&ip, first_index, 1);
+		if (!nr_found) {
+			rcu_read_unlock();
+			break;
+		}
+
+		/*
+		 * Update the index for the next lookup. Catch
+		 * overflows into the next AG range which can occur if
+		 * we have inodes in the last block of the AG and we
+		 * are currently pointing to the last inode.
+		 */
+		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+		if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino))
+			done = true;
+
+		if (!xfs_iter_vfs_igrab(ip, flags)) {
+			rcu_read_unlock();
+			continue;
+		}
+
+		/*
+		 * If we are doing an unsafe iteration, we must continue to hold
+		 * the RCU lock across the callback to guarantee the existence
+		 * of the inode. We can't hold the RCU lock for reference counted
+		 * inodes because the callback is allowed to block in that case.
+		 */
+		if (!(flags & INO_ITER_UNSAFE))
+			rcu_read_unlock();
+
+		ret = iter_fn(VFS_I(ip), private_data);
+
+		/*
+		 * We've run the callback, so we can drop the existence
+		 * guarantee we hold on the inode now.
+		 */
+		if (!(flags & INO_ITER_UNSAFE))
+			iput(VFS_I(ip));
+		else
+			rcu_read_unlock();
+
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+
+	} while (!done);
+
+	return ret;
+}
+
+int
+xfs_icwalk_vfs_inodes(
+	struct xfs_mount	*mp,
+	ino_iter_fn		iter_fn,
+	void			*private_data,
+	int			flags)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	int			ret;
+
+	for_each_perag(mp, agno, pag) {
+		ret = xfs_icwalk_vfs_inodes_ag(pag, iter_fn,
+				private_data, flags);
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+	}
+	return ret;
+}
+
 /* XFS Inode Cache Walking Code */
 
 /*
@@ -1624,7 +1773,6 @@ xfs_blockgc_free_quota(
  */
 #define XFS_LOOKUP_BATCH	32
 
-
 /*
  * Decide if we want to grab this inode in anticipation of doing work towards
  * the goal.
@@ -1700,7 +1848,6 @@ xfs_icwalk_ag(
 		int		i;
 
 		rcu_read_lock();
-
 		nr_found = radix_tree_gang_lookup_tag(&pag->pag_ici_root,
 				(void **) batch, first_index,
 				XFS_LOOKUP_BATCH, goal);
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 905944dafbe5..c2754ea28a88 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -18,6 +18,9 @@ struct xfs_icwalk {
 	long		icw_scan_limit;
 };
 
+int xfs_icwalk_vfs_inodes(struct xfs_mount *mp, ino_iter_fn iter_fn,
+		void *private_data, int flags);
+
 /* Flags that reflect xfs_fs_eofblocks functionality. */
 #define XFS_ICWALK_FLAG_SYNC		(1U << 0) /* sync/wait mode scan */
 #define XFS_ICWALK_FLAG_UID		(1U << 1) /* filter by uid */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ee79cf161312..5375c17ed69c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1293,7 +1293,6 @@ xfs_setup_inode(
 	inode->i_ino = ip->i_ino;
 	inode->i_state |= I_NEW;
 
-	inode_sb_list_add(inode);
 	/* make the inode look hashed for the writeback code */
 	inode_fake_hash(inode);
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index fbb3a1594c0d..a2ef1b582066 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1179,6 +1179,16 @@ xfs_fs_shutdown(
 	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
 }
 
+static int
+xfs_fs_iter_vfs_inodes(
+	struct super_block	*sb,
+	ino_iter_fn		iter_fn,
+	void			*private_data,
+	int			flags)
+{
+	return xfs_icwalk_vfs_inodes(XFS_M(sb), iter_fn, private_data, flags);
+}
+
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
@@ -1193,6 +1203,7 @@ static const struct super_operations xfs_super_operations = {
 	.nr_cached_objects	= xfs_fs_nr_cached_objects,
 	.free_cached_objects	= xfs_fs_free_cached_objects,
 	.shutdown		= xfs_fs_shutdown,
+	.iter_vfs_inodes	= xfs_fs_iter_vfs_inodes,
 };
 
 static int
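The per-AG loop in xfs_icwalk_vfs_inodes_ag() looks up one cached inode at a time with a cursor. It skips inodes in states invisible to the VFS and stops when advancing the cursor wraps past the end of the AG's inode-number range. The userspace sketch below models only that control flow, under stated assumptions. A sorted array stands in for the pag_ici_root radix tree. An 8-bit AG-relative inode number stands in for XFS_INO_TO_AGINO(). One flag bit stands in for XFS_ITER_VFS_NOGRAB_IFLAGS. RCU and reference counting are omitted. Every `sketch_*`/`SKETCH_*` name is hypothetical.

One hedged design note: in the patch as posted, the per-AG helper already maps INO_ITER_ABORT to 0 before returning. As a result, the ABORT check in xfs_icwalk_vfs_inodes() appears never to fire, and the walk would continue into the next AG. The sketch instead carries the sentinel up to the outermost loop, so an abort ends the whole walk.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AGINO_BITS	8u	/* assumed tiny AG, for illustration */
#define SKETCH_AGINO_MASK	((1u << SKETCH_AGINO_BITS) - 1)
#define SKETCH_NOGRAB		0x1u	/* models XFS_ITER_VFS_NOGRAB_IFLAGS */
#define SKETCH_ITER_ABORT	1	/* hypothetical "stop walking" sentinel */

struct sketch_ino {
	uint32_t agino;		/* AG-relative key, sorted ascending */
	unsigned int flags;
};

struct sketch_ag {
	const struct sketch_ino *inodes;
	size_t nr;
};

typedef int (*sketch_iter_fn)(const struct sketch_ino *ip, void *priv);

/* Models radix_tree_gang_lookup(..., first_index, 1): first key >= index. */
static const struct sketch_ino *lookup_next(const struct sketch_ag *ag,
					    uint32_t index)
{
	for (size_t i = 0; i < ag->nr; i++)
		if (ag->inodes[i].agino >= index)
			return &ag->inodes[i];
	return NULL;
}

static int walk_ag(const struct sketch_ag *ag, sketch_iter_fn fn, void *priv)
{
	uint32_t first_index = 0;
	int done = 0;

	do {
		const struct sketch_ino *ip = lookup_next(ag, first_index);
		int ret;

		if (!ip)
			break;
		/* A cursor that wraps to a smaller value means we just
		 * reached the last slot in this AG: finish after this one. */
		first_index = (ip->agino + 1) & SKETCH_AGINO_MASK;
		if (first_index < ip->agino)
			done = 1;

		if (ip->flags & SKETCH_NOGRAB)
			continue;	/* not visible to the VFS: skip it */

		ret = fn(ip, priv);
		if (ret == SKETCH_ITER_ABORT || ret < 0)
			return ret;	/* keep the sentinel for the caller */
	} while (!done);
	return 0;
}

static int walk_all(const struct sketch_ag *ags, size_t nr_ags,
		    sketch_iter_fn fn, void *priv)
{
	for (size_t i = 0; i < nr_ags; i++) {
		int ret = walk_ag(&ags[i], fn, priv);

		if (ret == SKETCH_ITER_ABORT)
			return 0;	/* an abort is not an error */
		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Without the wrap check, an inode in the AG's last slot would reset the cursor to 0, and the loop would revisit the AG indefinitely.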
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev, torvalds@linux-foundation.org
Subject: [PATCH 7/7] bcachefs: implement sb->iter_vfs_inodes
Date: Wed, 2 Oct 2024 11:33:24 +1000
Message-ID: <20241002014017.3801899-8-david@fromorbit.com>
In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com>
References: <20241002014017.3801899-1-david@fromorbit.com>

From: Dave Chinner

Untested, probably doesn't work, just a quick hack to indicate how this
could be done with the new bcachefs inode cache.
Signed-off-by: Dave Chinner
---
 fs/bcachefs/fs.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 4a1bb07a2574..7708ec2b68c1 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -1814,6 +1814,46 @@ void bch2_evict_subvolume_inodes(struct bch_fs *c, snapshot_id_list *s)
 	darray_exit(&grabbed);
 }
 
+static int
+bch2_iter_vfs_inodes(
+	struct super_block	*sb,
+	ino_iter_fn		iter_fn,
+	void			*private_data,
+	int			flags)
+{
+	struct bch_inode_info *inode, *old_inode = NULL;
+	int ret = 0;
+
+	mutex_lock(&c->vfs_inodes_lock);
+	list_for_each_entry(inode, &c->vfs_inodes_list, ei_vfs_inode_list) {
+		if (!super_iter_iget(&inode->v, flags))
+			continue;
+
+		if (!(flags & INO_ITER_UNSAFE))
+			mutex_unlock(&c->vfs_inodes_lock);
+
+		ret = iter_fn(VFS_I(ip), private_data);
+		cond_resched();
+
+		if (!(flags & INO_ITER_UNSAFE)) {
+			if (old_inode)
+				iput(&old_inode->v);
+			old_inode = inode;
+			mutex_lock(&c->vfs_inodes_lock);
+		}
+
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+	}
+	if (old_inode)
+		iput(&old_inode->v);
+	return ret;
+}
+
 static int bch2_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -1995,6 +2035,7 @@ static const struct super_operations bch_super_operations = {
 	.put_super	= bch2_put_super,
 	.freeze_fs	= bch2_freeze,
 	.unfreeze_fs	= bch2_unfreeze,
+	.iter_vfs_inodes = bch2_iter_vfs_inodes
 };
 
 static int bch2_set_super(struct super_block *s, void *data)
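bch2_iter_vfs_inodes() uses a common pattern for safe list iteration under a sleeping lock:

1. Pin the current inode with a reference.
2. Drop the list lock and run the callback.
3. Drop the reference on the *previous* inode.
4. Retake the lock.

Because the current inode stays pinned, the list cursor remains valid while the lock is released. As posted (and as its commit message warns), the function references `c` and `ip`, which it does not declare. Presumably `struct bch_fs *c = sb->s_fs_info;` and `&inode->v` were intended. It also never releases vfs_inodes_lock after the loop.

The userspace sketch below models the safe (referenced) path only, with those gaps closed. A pthread mutex stands in for vfs_inodes_lock, a C11 atomic counter stands in for i_count, and a `dying` flag stands in for an inode that super_iter_iget() refuses. All `sk_*` names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SK_ITER_ABORT 1			/* hypothetical "stop walking" sentinel */

struct sk_inode {
	atomic_int refcount;		/* models the VFS i_count */
	int dying;			/* models an inode that cannot be grabbed */
	struct sk_inode *next;		/* models ei_vfs_inode_list */
};

struct sk_fs {
	pthread_mutex_t lock;		/* models c->vfs_inodes_lock */
	struct sk_inode *head;
};

typedef int (*sk_iter_fn)(struct sk_inode *inode, void *priv);

/* Models super_iter_iget(): take a reference unless the inode is dying. */
static int sk_grab(struct sk_inode *inode)
{
	if (inode->dying)
		return 0;
	atomic_fetch_add(&inode->refcount, 1);
	return 1;
}

/* Models iput(); here it only drops the count and never takes fs->lock. */
static void sk_put(struct sk_inode *inode)
{
	atomic_fetch_sub(&inode->refcount, 1);
}

static int sk_iter_inodes(struct sk_fs *fs, sk_iter_fn fn, void *priv)
{
	struct sk_inode *inode, *old = NULL;
	int ret = 0;

	pthread_mutex_lock(&fs->lock);
	for (inode = fs->head; inode; inode = inode->next) {
		if (!sk_grab(inode))
			continue;
		/* Our reference pins @inode, so inode->next is still valid
		 * after the lock is dropped for the callback. */
		pthread_mutex_unlock(&fs->lock);
		ret = fn(inode, priv);
		/* Release the previous inode while unlocked: in the kernel, a
		 * final iput() may need the list lock to unlink the inode. */
		if (old)
			sk_put(old);
		old = inode;
		pthread_mutex_lock(&fs->lock);
		if (ret == SK_ITER_ABORT) {
			ret = 0;
			break;
		}
		if (ret < 0)
			break;
	}
	pthread_mutex_unlock(&fs->lock);	/* unlock before the last put */
	if (old)
		sk_put(old);
	return ret;
}
```

Deferring each put by one iteration, rather than dropping the reference right after the callback, is what keeps the cursor safe. The list lock is always released before the last put, on both the normal and the abort/error exit paths.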