From patchwork Tue Jun 27 20:53:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 13295026
Subject: [PATCH v5 1/3] libfs: Add directory operations for stable offsets
From: Chuck Lever
To: viro@zeniv.linux.org.uk, brauner@kernel.org, hughd@google.com,
 akpm@linux-foundation.org
Cc: Chuck Lever, jlayton@redhat.com, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org
Date: Tue, 27 Jun 2023 16:53:09 -0400
Message-ID: <168789918896.157531.14644838088821804546.stgit@manet.1015granger.net>
In-Reply-To: <168789864000.157531.11122232592994999253.stgit@manet.1015granger.net>
References: <168789864000.157531.11122232592994999253.stgit@manet.1015granger.net>
User-Agent: StGit/1.5

From: Chuck Lever

Create a vector of directory operations in fs/libfs.c that handles
directory seeks and readdir via stable offsets instead of the current
cursor-based mechanism.

For the moment these are unused.

Signed-off-by: Chuck Lever
---
 fs/libfs.c         |  252 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |   19 ++++
 2 files changed, 271 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a3271..9940dce049e6 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -239,6 +239,258 @@ const struct inode_operations simple_dir_inode_operations = {
 };
 EXPORT_SYMBOL(simple_dir_inode_operations);
 
+static struct stable_offset_ctx *stable_ctx_get(struct inode *inode)
+{
+	return inode->i_op->get_so_ctx(inode);
+}
+
+static void stable_offset_set(struct dentry *dentry, unsigned long offset)
+{
+	dentry->d_fsdata = (void *)offset;
+}
+
+static unsigned long stable_offset_get(struct dentry *dentry)
+{
+	return (unsigned long)dentry->d_fsdata;
+}
+
+/**
+ * stable_offset_init - initialize a parent directory
+ * @so_ctx: directory offset map to be initialized
+ *
+ */
+void stable_offset_init(struct stable_offset_ctx *so_ctx)
+{
+	xa_init_flags(&so_ctx->xa, XA_FLAGS_ALLOC1);
+
+	/* 0 is '.', 1 is '..', so always start with offset 2 */
+	so_ctx->next_offset = 2;
+}
+
+/**
+ * stable_offset_add - Add an entry to a directory's stable offset map
+ * @so_ctx: directory offset ctx to be updated
+ * @dentry: new dentry being added
+ *
+ * Returns zero on success. @so_ctx and the dentry offset are updated.
+ * Otherwise, a negative errno value is returned.
+ */
+int stable_offset_add(struct stable_offset_ctx *so_ctx, struct dentry *dentry)
+{
+	static const struct xa_limit limit = XA_LIMIT(2, U32_MAX);
+	u32 offset;
+	int ret;
+
+	if (stable_offset_get(dentry) != 0)
+		return -EBUSY;
+
+	ret = xa_alloc_cyclic(&so_ctx->xa, &offset, dentry, limit,
+			      &so_ctx->next_offset, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+
+	stable_offset_set(dentry, offset);
+	return 0;
+}
+
+/**
+ * stable_offset_remove - Remove an entry from a directory's stable offset map
+ * @so_ctx: directory offset ctx to be updated
+ * @dentry: dentry being removed
+ *
+ */
+void stable_offset_remove(struct stable_offset_ctx *so_ctx,
+			  struct dentry *dentry)
+{
+	unsigned long index = stable_offset_get(dentry);
+
+	if (index == 0)
+		return;
+
+	xa_erase(&so_ctx->xa, index);
+	stable_offset_set(dentry, 0);
+}
+
+/**
+ * stable_offset_rename_exchange - exchange rename with stable directory offsets
+ * @old_dir: parent of dentry being moved
+ * @old_dentry: dentry being moved
+ * @new_dir: destination parent
+ * @new_dentry: destination dentry
+ *
+ * Returns zero on success. Otherwise a negative errno is returned and the
+ * rename is rolled back.
+ */
+int stable_offset_rename_exchange(struct inode *old_dir,
+				  struct dentry *old_dentry,
+				  struct inode *new_dir,
+				  struct dentry *new_dentry)
+{
+	struct stable_offset_ctx *old_ctx = stable_ctx_get(old_dir);
+	struct stable_offset_ctx *new_ctx = stable_ctx_get(new_dir);
+	unsigned long old_index = stable_offset_get(old_dentry);
+	unsigned long new_index = stable_offset_get(new_dentry);
+	int ret;
+
+	stable_offset_remove(old_ctx, old_dentry);
+	stable_offset_remove(new_ctx, new_dentry);
+
+	ret = stable_offset_add(new_ctx, old_dentry);
+	if (ret)
+		goto out_restore;
+
+	ret = stable_offset_add(old_ctx, new_dentry);
+	if (ret) {
+		stable_offset_remove(new_ctx, old_dentry);
+		goto out_restore;
+	}
+
+	ret = simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
+	if (ret) {
+		stable_offset_remove(new_ctx, old_dentry);
+		stable_offset_remove(old_ctx, new_dentry);
+		goto out_restore;
+	}
+	return 0;
+
+out_restore:
+	stable_offset_set(old_dentry, old_index);
+	xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
+	stable_offset_set(new_dentry, new_index);
+	xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
+	return ret;
+}
+
+/**
+ * stable_offset_destroy - Release offset map
+ * @so_ctx: directory offset ctx that is about to be destroyed
+ *
+ * During fs teardown (eg. umount), a directory's offset map might still
+ * contain entries. xa_destroy() cleans out anything that remains.
+ */
+void stable_offset_destroy(struct stable_offset_ctx *so_ctx)
+{
+	xa_destroy(&so_ctx->xa);
+}
+
+/**
+ * stable_dir_llseek - Advance the read position of a directory descriptor
+ * @file: an open directory whose position is to be updated
+ * @offset: a byte offset
+ * @whence: enumerator describing the starting position for this update
+ *
+ * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
+ *
+ * Returns the updated read position if successful; otherwise a
+ * negative errno is returned and the read position remains unchanged.
+ */
+static loff_t stable_dir_llseek(struct file *file, loff_t offset, int whence)
+{
+	switch (whence) {
+	case SEEK_CUR:
+		offset += file->f_pos;
+		fallthrough;
+	case SEEK_SET:
+		if (offset >= 0)
+			break;
+		fallthrough;
+	default:
+		return -EINVAL;
+	}
+
+	return vfs_setpos(file, offset, U32_MAX);
+}
+
+static struct dentry *stable_find_next(struct xa_state *xas)
+{
+	struct dentry *child, *found = NULL;
+
+	rcu_read_lock();
+	child = xas_next_entry(xas, U32_MAX);
+	if (!child)
+		goto out;
+	spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+	if (simple_positive(child))
+		found = dget_dlock(child);
+	spin_unlock(&child->d_lock);
+out:
+	rcu_read_unlock();
+	return found;
+}
+
+static bool stable_dir_emit(struct dir_context *ctx, struct dentry *dentry)
+{
+	loff_t offset = stable_offset_get(dentry);
+	struct inode *inode = d_inode(dentry);
+
+	return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
+			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
+}
+
+static void stable_iterate_dir(struct dentry *dir, struct dir_context *ctx)
+{
+	struct stable_offset_ctx *so_ctx = stable_ctx_get(d_inode(dir));
+	XA_STATE(xas, &so_ctx->xa, ctx->pos);
+	struct dentry *dentry;
+
+	while (true) {
+		spin_lock(&dir->d_lock);
+		dentry = stable_find_next(&xas);
+		spin_unlock(&dir->d_lock);
+		if (!dentry)
+			break;
+
+		if (!stable_dir_emit(ctx, dentry)) {
+			dput(dentry);
+			break;
+		}
+
+		dput(dentry);
+		ctx->pos = xas.xa_index + 1;
+	}
+}
+
+/**
+ * stable_readdir - Emit entries starting at offset @ctx->pos
+ * @file: an open directory to iterate over
+ * @ctx: directory iteration context
+ *
+ * Caller must hold @file's i_rwsem to prevent insertion or removal of
+ * entries during this call.
+ *
+ * On entry, @ctx->pos contains an offset that represents the first entry
+ * to be read from the directory.
+ *
+ * The operation continues until there are no more entries to read, or
+ * until the ctx->actor indicates there is no more space in the caller's
+ * output buffer.
+ *
+ * On return, @ctx->pos contains an offset that will read the next entry
+ * in this directory when stable_readdir() is called again with @ctx.
+ *
+ * Return values:
+ *   %0 - Complete
+ */
+static int stable_readdir(struct file *file, struct dir_context *ctx)
+{
+	struct dentry *dir = file->f_path.dentry;
+
+	lockdep_assert_held(&d_inode(dir)->i_rwsem);
+
+	if (!dir_emit_dots(file, ctx))
+		return 0;
+
+	stable_iterate_dir(dir, ctx);
+	return 0;
+}
+
+const struct file_operations stable_dir_operations = {
+	.llseek		= stable_dir_llseek,
+	.iterate_shared	= stable_readdir,
+	.read		= generic_read_dir,
+	.fsync		= noop_fsync,
+};
+
 static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
 {
 	struct dentry *child = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb24..16be31bd81f7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1767,6 +1767,7 @@ struct dir_context {
 
 struct iov_iter;
 struct io_uring_cmd;
+struct stable_offset_ctx;
 
 struct file_operations {
 	struct module *owner;
@@ -1854,6 +1855,7 @@ struct inode_operations {
 	int (*fileattr_set)(struct mnt_idmap *idmap,
 			    struct dentry *dentry, struct fileattr *fa);
 	int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
+	struct stable_offset_ctx *(*get_so_ctx)(struct inode *inode);
 } ____cacheline_aligned;
 
 static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
@@ -2954,6 +2956,23 @@ extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
 extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
 		const void __user *from, size_t count);
 
+struct stable_offset_ctx {
+	struct xarray xa;
+	u32 next_offset;
+};
+
+void stable_offset_init(struct stable_offset_ctx *so_ctx);
+int stable_offset_add(struct stable_offset_ctx *so_ctx, struct dentry *dentry);
+void stable_offset_remove(struct stable_offset_ctx *so_ctx,
+			  struct dentry *dentry);
+int stable_offset_rename_exchange(struct inode *old_dir,
+				  struct dentry *old_dentry,
+				  struct inode *new_dir,
+				  struct dentry *new_dentry);
+void stable_offset_destroy(struct stable_offset_ctx *so_ctx);
+
+extern const struct file_operations stable_dir_operations;
+
 extern int __generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
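
Note for readers: the following is a rough userspace model of the offset
scheme this patch introduces, not kernel code and not part of the patch. It
assumes a small fixed array in place of the xarray, and every name in it
(so_model, so_model_add, SO_MAX, and so on) is hypothetical. It is only
meant to illustrate two properties described above: offsets are handed out
cyclically starting at 2, and a reader that resumes from a saved position
finds the first live entry at or after that position, even if entries were
removed in between.

```c
#include <assert.h>
#include <stddef.h>

#define SO_MIN 2u	/* offsets 0 and 1 stand for "." and ".." */
#define SO_MAX 64u	/* hypothetical small limit; the patch uses U32_MAX */

/* Hypothetical stand-in for struct stable_offset_ctx. */
struct so_model {
	const char *slot[SO_MAX];	/* NULL means the offset is unused */
	unsigned int next_offset;	/* where the next cyclic search starts */
};

static void so_model_init(struct so_model *m)
{
	for (size_t i = 0; i < SO_MAX; i++)
		m->slot[i] = NULL;
	m->next_offset = SO_MIN;
}

/*
 * Cyclic allocation, loosely modelled on what xa_alloc_cyclic() provides:
 * search upward from next_offset, wrap around to SO_MIN, and take the
 * first free offset. Returns the new offset, or 0 if the map is full.
 */
static unsigned int so_model_add(struct so_model *m, const char *name)
{
	const unsigned int span = SO_MAX - SO_MIN;

	assert(name != NULL);
	for (unsigned int i = 0; i < span; i++) {
		unsigned int off = SO_MIN + (m->next_offset - SO_MIN + i) % span;

		if (m->slot[off] == NULL) {
			m->slot[off] = name;
			m->next_offset = (off + 1 < SO_MAX) ? off + 1 : SO_MIN;
			return off;
		}
	}
	return 0;
}

/* Removing an entry frees its offset; other entries keep theirs. */
static void so_model_remove(struct so_model *m, unsigned int off)
{
	if (off >= SO_MIN && off < SO_MAX)
		m->slot[off] = NULL;
}

/*
 * Readdir-style lookup: return the first used offset >= pos, or 0 when
 * no entry remains. A reader would then continue from that offset + 1.
 */
static unsigned int so_model_next(const struct so_model *m, unsigned int pos)
{
	for (unsigned int off = pos < SO_MIN ? SO_MIN : pos; off < SO_MAX; off++)
		if (m->slot[off] != NULL)
			return off;
	return 0;
}
```

In this model, adding "a", "b", "c" yields offsets 2, 3, 4. A reader that
has consumed offset 2 and saved position 3 still lands on offset 4 after
"b" is removed, and a later add receives offset 5 rather than reusing 3
straight away.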