From patchwork Wed Nov 1 17:25:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442944 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DCA9711CAA for ; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id 63BDEC433C8; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzR-00EZs7-0m; Wed, 01 Nov 2023 13:26:49 -0400 Message-ID: <20231101172649.049758712@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:42 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , stable@vger.kernel.org, Ajay Kaher Subject: [PATCH v6 1/8] eventfs: Remove "is_freed" union with rcu head References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" The eventfs_inode->is_freed was a union with the rcu_head with the assumption that when it was on the srcu list the head would contain a pointer which would make "is_freed" true. But that was a wrong assumption as the rcu head is a single link list where the last element is NULL. Instead, split the nr_entries integer so that "is_freed" is one bit and the nr_entries is the next 31 bits. As there shouldn't be more than 10 (currently there's at most 5 to 7 depending on the config), this should not be a problem. Cc: stable@vger.kernel.org Cc: Ajay Kaher Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223419.935276916@goodmis.org - Resynced for this patch series fs/tracefs/event_inode.c | 2 ++ fs/tracefs/internal.h | 6 +++--- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 9f612a8f009d..1ce73acf3df0 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -824,6 +824,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, eventfs_remove_rec(ei_child, head, level + 1); } + ei->is_freed = 1; + list_del_rcu(&ei->list); list_add_tail(&ei->del_list, head); } diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 64fde9490f52..c7d88aaa949f 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -23,6 +23,7 @@ struct tracefs_inode { * @d_parent: pointer to the parent's dentry * @d_children: The array of dentries to represent the files when created * @data: The private data to pass to the callbacks + * @is_freed: Flag set if the eventfs is on its way to be freed * @nr_entries: The number of items in @entries */ struct eventfs_inode { @@ -38,14 +39,13 @@ struct eventfs_inode { * Union - used for deletion * @del_list: list of eventfs_inode to delete * @rcu: eventfs_inode to delete in RCU - * @is_freed: node is freed if one of the above is set */ union { struct list_head del_list; struct rcu_head rcu; - unsigned long is_freed; }; - int nr_entries; + unsigned int is_freed:1; + unsigned int nr_entries:31; }; static inline struct tracefs_inode *get_tracefs(const struct inode *inode) From patchwork Wed Nov 1 17:25:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442945 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E384B1642F for ; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7B283C433CB; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzR-00EZsf-1T; Wed, 01 Nov 2023 13:26:49 -0400 Message-ID: <20231101172649.265214087@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:43 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher Subject: [PATCH v6 2/8] eventfs: Have a free_ei() that just frees the eventfs_inode References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" As the eventfs_inode is freed in two different locations, make a helper function free_ei() to make sure all the allocated fields of the eventfs_inode is freed. This requires renaming the existing free_ei() which is called by the srcu handler to free_rcu_ei() and have free_ei() just do the freeing, where free_rcu_ei() will call it. Cc: Ajay Kaher Reviewed-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223420.141626438@goodmis.org - Resynced to this patch series fs/tracefs/event_inode.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 1ce73acf3df0..dd5971855732 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -129,6 +129,13 @@ static struct dentry *create_dir(const char *name, struct dentry *parent) return eventfs_end_creating(dentry); } +static void free_ei(struct eventfs_inode *ei) +{ + kfree_const(ei->name); + kfree(ei->d_children); + kfree(ei); +} + /** * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode * @ti: the tracefs_inode of the dentry @@ -168,9 +175,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) eventfs_remove_dir(ei_child); } - kfree_const(ei->name); - kfree(ei->d_children); - kfree(ei); + free_ei(ei); return; } @@ -784,13 +789,11 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry return ERR_PTR(-ENOMEM); } -static void free_ei(struct rcu_head *head) +static void free_rcu_ei(struct rcu_head *head) { struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu); - kfree_const(ei->name); - kfree(ei->d_children); - kfree(ei); + free_ei(ei); } /** @@ -881,7 +884,7 @@ void eventfs_remove_dir(struct eventfs_inode *ei) for (i = 0; i < ei->nr_entries; i++) unhook_dentry(&ei->d_children[i], &dentry_list); unhook_dentry(&ei->dentry, &dentry_list); - call_srcu(&eventfs_srcu, &ei->rcu, free_ei); + call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei); } mutex_unlock(&eventfs_mutex); From patchwork Wed Nov 1 17:25:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442948 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C057171DA for ; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id CE4B7C433CC; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzR-00EZtD-2A; Wed, 01 Nov 2023 13:26:49 -0400 Message-ID: <20231101172649.477608228@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:44 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher , Linux Kernel Functional Testing , Naresh Kamboju , Beau Belgrave Subject: [PATCH v6 3/8] eventfs: Test for ei->is_freed when accessing ei->dentry References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" The eventfs_inode (ei) is protected by SRCU, but the ei->dentry is not. It is protected by the eventfs_mutex. Anytime the eventfs_mutex is released, and access to the ei->dentry needs to be done, it should first check if ei->is_freed is set under the eventfs_mutex. If it is, then the ei->dentry is invalid and must not be used. The ei->dentry must only be accessed under the eventfs_mutex and after checking if ei->is_freed is set. When the ei is being freed, it will (under the eventfs_mutex) set is_freed and at the same time move the dentry to a free list to be cleared after the eventfs_mutex is released. This means that any access to the ei->dentry must check first if ei->is_freed is set, because if it is, then the dentry is on its way to be freed. Also add comments to describe this better. Link: https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=SoEywQOe19fGP3uR15SGowkdK+_X85Cg@mail.gmail.com/ Link: https://lore.kernel.org/all/CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6aA@mail.gmail.com/ Cc: Ajay Kaher Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Linux Kernel Functional Testing Reported-by: Naresh Kamboju Reported-by: Beau Belgrave Reviewed-by: Masami Hiramatsu (Google) Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Tested-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223420.347714541@goodmis.org - Resynced to this patch series fs/tracefs/event_inode.c | 45 ++++++++++++++++++++++++++++++++++------ fs/tracefs/internal.h | 3 ++- 2 files changed, 41 insertions(+), 7 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index dd5971855732..e9625732c52d 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -24,7 +24,20 @@ #include #include "internal.h" +/* + * eventfs_mutex protects the eventfs_inode (ei) dentry. Any access + * to the ei->dentry must be done under this mutex and after checking + * if ei->is_freed is not set. The ei->dentry is released under the + * mutex at the same time ei->is_freed is set. If ei->is_freed is set + * then the ei->dentry is invalid. + */ static DEFINE_MUTEX(eventfs_mutex); + +/* + * The eventfs_inode (ei) itself is protected by SRCU. It is released from + * its parent's list and will have is_freed set (under eventfs_mutex). + * After the SRCU grace period is over, the ei may be freed. + */ DEFINE_STATIC_SRCU(eventfs_srcu); static struct dentry *eventfs_root_lookup(struct inode *dir, @@ -239,6 +252,10 @@ create_file_dentry(struct eventfs_inode *ei, struct dentry **e_dentry, bool invalidate = false; mutex_lock(&eventfs_mutex); + if (ei->is_freed) { + mutex_unlock(&eventfs_mutex); + return NULL; + } /* If the e_dentry already has a dentry, use it */ if (*e_dentry) { /* lookup does not need to up the ref count */ @@ -312,6 +329,8 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei) struct eventfs_inode *ei_child; struct tracefs_inode *ti; + lockdep_assert_held(&eventfs_mutex); + /* srcu lock already held */ /* fill parent-child relation */ list_for_each_entry_srcu(ei_child, &ei->children, list, @@ -325,6 +344,7 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei) /** * create_dir_dentry - Create a directory dentry for the eventfs_inode + * @pei: The eventfs_inode parent of ei. * @ei: The eventfs_inode to create the directory for * @parent: The dentry of the parent of this directory * @lookup: True if this is called by the lookup code @@ -332,12 +352,17 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei) * This creates and attaches a directory dentry to the eventfs_inode @ei. */ static struct dentry * -create_dir_dentry(struct eventfs_inode *ei, struct dentry *parent, bool lookup) +create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, + struct dentry *parent, bool lookup) { bool invalidate = false; struct dentry *dentry = NULL; mutex_lock(&eventfs_mutex); + if (pei->is_freed || ei->is_freed) { + mutex_unlock(&eventfs_mutex); + return NULL; + } if (ei->dentry) { /* If the dentry already has a dentry, use it */ dentry = ei->dentry; @@ -440,7 +465,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, */ mutex_lock(&eventfs_mutex); ei = READ_ONCE(ti->private); - if (ei) + if (ei && !ei->is_freed) ei_dentry = READ_ONCE(ei->dentry); mutex_unlock(&eventfs_mutex); @@ -454,7 +479,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, if (strcmp(ei_child->name, name) != 0) continue; ret = simple_lookup(dir, dentry, flags); - create_dir_dentry(ei_child, ei_dentry, true); + create_dir_dentry(ei, ei_child, ei_dentry, true); created = true; break; } @@ -588,7 +613,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) { - d = create_dir_dentry(ei_child, parent, false); + d = create_dir_dentry(ei, ei_child, parent, false); if (d) { ret = add_dentries(&dentries, d, cnt); if (ret < 0) @@ -705,12 +730,20 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode ei->nr_entries = size; ei->data = data; INIT_LIST_HEAD(&ei->children); + INIT_LIST_HEAD(&ei->list); mutex_lock(&eventfs_mutex); - list_add_tail(&ei->list, &parent->children); - ei->d_parent = parent->dentry; + if (!parent->is_freed) { + list_add_tail(&ei->list, &parent->children); + ei->d_parent = parent->dentry; + } mutex_unlock(&eventfs_mutex); + /* Was the parent freed? */ + if (list_empty(&ei->list)) { + free_ei(ei); + ei = NULL; + } return ei; } diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index c7d88aaa949f..5a98e87dd3d1 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -24,6 +24,7 @@ struct tracefs_inode { * @d_children: The array of dentries to represent the files when created * @data: The private data to pass to the callbacks * @is_freed: Flag set if the eventfs is on its way to be freed + * Note if is_freed is set, then dentry is corrupted. * @nr_entries: The number of items in @entries */ struct eventfs_inode { @@ -31,7 +32,7 @@ struct eventfs_inode { const struct eventfs_entry *entries; const char *name; struct list_head children; - struct dentry *dentry; + struct dentry *dentry; /* Check is_freed to access */ struct dentry *d_parent; struct dentry **d_children; void *data; From patchwork Wed Nov 1 17:25:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442946 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0EE35171C3 for ; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id C2744C43391; Wed, 1 Nov 2023 17:26:50 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzR-00EZtl-2r; Wed, 01 Nov 2023 13:26:49 -0400 Message-ID: <20231101172649.691841445@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:45 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher Subject: [PATCH v6 4/8] eventfs: Save ownership and mode References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Now that inodes and dentries are created on the fly, they are also reclaimed on memory pressure. Since the ownership and file mode are saved in the inode, if they are freed, any changes to the ownership and mode will be lost. To counter this, if the user changes the permissions or ownership, save them, and when creating the inodes again, restore those changes. Cc: Ajay Kaher Signed-off-by: Steven Rostedt (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223420.568912586@goodmis.org - Resynced to this patch series fs/tracefs/event_inode.c | 148 +++++++++++++++++++++++++++++++++++---- fs/tracefs/internal.h | 16 +++++ 2 files changed, 151 insertions(+), 13 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index e9625732c52d..93d08e552483 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -40,6 +40,15 @@ static DEFINE_MUTEX(eventfs_mutex); */ DEFINE_STATIC_SRCU(eventfs_srcu); +/* Mode is unsigned short, use the upper bits for flags */ +enum { + EVENTFS_SAVE_MODE = BIT(16), + EVENTFS_SAVE_UID = BIT(17), + EVENTFS_SAVE_GID = BIT(18), +}; + +#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1) + static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); @@ -47,8 +56,89 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx); static int eventfs_release(struct inode *inode, struct file *file); +static void update_attr(struct eventfs_attr *attr, struct iattr *iattr) +{ + unsigned int ia_valid = iattr->ia_valid; + + if (ia_valid & ATTR_MODE) { + attr->mode = (attr->mode & ~EVENTFS_MODE_MASK) | + (iattr->ia_mode & EVENTFS_MODE_MASK) | + EVENTFS_SAVE_MODE; + } + if (ia_valid & ATTR_UID) { + attr->mode |= EVENTFS_SAVE_UID; + attr->uid = iattr->ia_uid; + } + if (ia_valid & ATTR_GID) { + attr->mode |= EVENTFS_SAVE_GID; + attr->gid = iattr->ia_gid; + } +} + +static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, + struct iattr *iattr) +{ + const struct eventfs_entry *entry; + struct eventfs_inode *ei; + const char *name; + int ret; + + mutex_lock(&eventfs_mutex); + ei = dentry->d_fsdata; + /* The LSB is set when the eventfs_inode is being freed */ + if (((unsigned long)ei & 1UL) || ei->is_freed) { + /* Do not allow changes if the event is about to be removed. */ + mutex_unlock(&eventfs_mutex); + return -ENODEV; + } + + /* Preallocate the children mode array if necessary */ + if (!(dentry->d_inode->i_mode & S_IFDIR)) { + if (!ei->entry_attrs) { + ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries, + GFP_KERNEL); + if (!ei->entry_attrs) { + ret = -ENOMEM; + goto out; + } + } + } + + ret = simple_setattr(idmap, dentry, iattr); + if (ret < 0) + goto out; + + /* + * If this is a dir, then update the ei cache, only the file + * mode is saved in the ei->m_children, and the ownership is + * determined by the parent directory. + */ + if (dentry->d_inode->i_mode & S_IFDIR) { + update_attr(&ei->attr, iattr); + + } else { + name = dentry->d_name.name; + + for (int i = 0; i < ei->nr_entries; i++) { + entry = &ei->entries[i]; + if (strcmp(name, entry->name) == 0) { + update_attr(&ei->entry_attrs[i], iattr); + break; + } + } + } + out: + mutex_unlock(&eventfs_mutex); + return ret; +} + static const struct inode_operations eventfs_root_dir_inode_operations = { .lookup = eventfs_root_lookup, + .setattr = eventfs_set_attr, +}; + +static const struct inode_operations eventfs_file_inode_operations = { + .setattr = eventfs_set_attr, }; static const struct file_operations eventfs_file_operations = { @@ -59,10 +149,30 @@ static const struct file_operations eventfs_file_operations = { .release = eventfs_release, }; +static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, umode_t mode) +{ + if (!attr) { + inode->i_mode = mode; + return; + } + + if (attr->mode & EVENTFS_SAVE_MODE) + inode->i_mode = attr->mode & EVENTFS_MODE_MASK; + else + inode->i_mode = mode; + + if (attr->mode & EVENTFS_SAVE_UID) + inode->i_uid = attr->uid; + + if (attr->mode & EVENTFS_SAVE_GID) + inode->i_gid = attr->gid; +} + /** * create_file - create a file in the tracefs filesystem * @name: the name of the file to create. * @mode: the permission that the file should have. + * @attr: saved attributes changed by user * @parent: parent dentry for this file. * @data: something that the caller will want to get to later on. * @fop: struct file_operations that should be used for this file. @@ -72,6 +182,7 @@ static const struct file_operations eventfs_file_operations = { * call. */ static struct dentry *create_file(const char *name, umode_t mode, + struct eventfs_attr *attr, struct dentry *parent, void *data, const struct file_operations *fop) { @@ -95,7 +206,10 @@ static struct dentry *create_file(const char *name, umode_t mode, if (unlikely(!inode)) return eventfs_failed_creating(dentry); - inode->i_mode = mode; + /* If the user updated the directory's attributes, use them */ + update_inode_attr(inode, attr, mode); + + inode->i_op = &eventfs_file_inode_operations; inode->i_fop = fop; inode->i_private = data; @@ -108,19 +222,19 @@ static struct dentry *create_file(const char *name, umode_t mode, /** * create_dir - create a dir in the tracefs filesystem - * @name: the name of the file to create. + * @ei: the eventfs_inode that represents the directory to create * @parent: parent dentry for this file. * * This function will create a dentry for a directory represented by * a eventfs_inode. */ -static struct dentry *create_dir(const char *name, struct dentry *parent) +static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent) { struct tracefs_inode *ti; struct dentry *dentry; struct inode *inode; - dentry = eventfs_start_creating(name, parent); + dentry = eventfs_start_creating(ei->name, parent); if (IS_ERR(dentry)) return dentry; @@ -128,7 +242,9 @@ static struct dentry *create_dir(const char *name, struct dentry *parent) if (unlikely(!inode)) return eventfs_failed_creating(dentry); - inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + /* If the user updated the directory's attributes, use them */ + update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO); + inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations; @@ -146,6 +262,7 @@ static void free_ei(struct eventfs_inode *ei) { kfree_const(ei->name); kfree(ei->d_children); + kfree(ei->entry_attrs); kfree(ei); } @@ -231,7 +348,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) /** * create_file_dentry - create a dentry for a file of an eventfs_inode * @ei: the eventfs_inode that the file will be created under - * @e_dentry: a pointer to the d_children[] of the @ei + * @idx: the index into the d_children[] of the @ei * @parent: The parent dentry of the created file. * @name: The name of the file to create * @mode: The mode of the file. @@ -244,10 +361,12 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) * just do a dget() on it and return. Otherwise create the dentry and attach it. */ static struct dentry * -create_file_dentry(struct eventfs_inode *ei, struct dentry **e_dentry, +create_file_dentry(struct eventfs_inode *ei, int idx, struct dentry *parent, const char *name, umode_t mode, void *data, const struct file_operations *fops, bool lookup) { + struct eventfs_attr *attr = NULL; + struct dentry **e_dentry = &ei->d_children[idx]; struct dentry *dentry; bool invalidate = false; @@ -264,13 +383,18 @@ create_file_dentry(struct eventfs_inode *ei, struct dentry **e_dentry, mutex_unlock(&eventfs_mutex); return *e_dentry; } + + /* ei->entry_attrs are protected by SRCU */ + if (ei->entry_attrs) + attr = &ei->entry_attrs[idx]; + mutex_unlock(&eventfs_mutex); /* The lookup already has the parent->d_inode locked */ if (!lookup) inode_lock(parent->d_inode); - dentry = create_file(name, mode, parent, data, fops); + dentry = create_file(name, mode, attr, parent, data, fops); if (!lookup) inode_unlock(parent->d_inode); @@ -378,7 +502,7 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, if (!lookup) inode_lock(parent->d_inode); - dentry = create_dir(ei->name, parent); + dentry = create_dir(ei, parent); if (!lookup) inode_unlock(parent->d_inode); @@ -495,8 +619,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, if (r <= 0) continue; ret = simple_lookup(dir, dentry, flags); - create_file_dentry(ei, &ei->d_children[i], - ei_dentry, name, mode, cdata, + create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops, true); break; } @@ -629,8 +752,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) r = entry->callback(name, &mode, &cdata, &fops); if (r <= 0) continue; - d = create_file_dentry(ei, &ei->d_children[i], - parent, name, mode, cdata, fops, false); + d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false); if (d) { ret = add_dentries(&dentries, d, cnt); if (ret < 0) diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 5a98e87dd3d1..5f60bcd69289 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -13,6 +13,18 @@ struct tracefs_inode { struct inode vfs_inode; }; +/* + * struct eventfs_attr - cache the mode and ownership of a eventfs entry + * @mode: saved mode plus flags of what is saved + * @uid: saved uid if changed + * @gid: saved gid if changed + */ +struct eventfs_attr { + int mode; + kuid_t uid; + kgid_t gid; +}; + /* * struct eventfs_inode - hold the properties of the eventfs directories. * @list: link list into the parent directory @@ -22,6 +34,8 @@ struct tracefs_inode { * @dentry: the dentry of the directory * @d_parent: pointer to the parent's dentry * @d_children: The array of dentries to represent the files when created + * @entry_attrs: Saved mode and ownership of the @d_children + * @attr: Saved mode and ownership of eventfs_inode itself * @data: The private data to pass to the callbacks * @is_freed: Flag set if the eventfs is on its way to be freed * Note if is_freed is set, then dentry is corrupted. @@ -35,6 +49,8 @@ struct eventfs_inode { struct dentry *dentry; /* Check is_freed to access */ struct dentry *d_parent; struct dentry **d_children; + struct eventfs_attr *entry_attrs; + struct eventfs_attr attr; void *data; /* * Union - used for deletion From patchwork Wed Nov 1 17:25:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442947 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A5F4171C3 for ; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id 32E82C433C9; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzS-00EZuJ-0K; Wed, 01 Nov 2023 13:26:50 -0400 Message-ID: <20231101172649.906696613@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:46 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher , Linux Kernel Functional Testing , Naresh Kamboju Subject: [PATCH v6 5/8] eventfs: Hold eventfs_mutex when calling callback functions References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" The callback function that is used to create inodes and dentries is not protected by anything and the data that is passed to it could become stale. After eventfs_remove_dir() is called by the tracing system, it is free to remove the events that are associated to that directory. Unfortunately, that means the callbacks must not be called after that. CPU0 CPU1 ---- ---- eventfs_root_lookup() { eventfs_remove_dir() { mutex_lock(&event_mutex); ei->is_freed = set; mutex_unlock(&event_mutex); } kfree(event_call); for (...) { entry = &ei->entries[i]; r = entry->callback() { call = data; // call == event_call above if (call->flags ...) [ USE AFTER FREE BUG ] The safest way to protect this is to wrap the callback with: mutex_lock(&eventfs_mutex); if (!ei->is_freed) r = entry->callback(); else r = -1; mutex_unlock(&eventfs_mutex); This will make sure that the callback will not be called after it is freed. But now it needs to be known that the callback is called while holding internal eventfs locks, and that it must not call back into the eventfs / tracefs system. There's no reason it should anyway, but document that as well. Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=Y50WpWAE7_Q@mail.gmail.com/ Cc: Ajay Kaher Reported-by: Linux Kernel Functional Testing Reported-by: Naresh Kamboju Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Steven Rostedt (Google) Reviewed-by: Masami Hiramatsu (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223420.778161254@goodmis.org - Resynced to this patch series fs/tracefs/event_inode.c | 22 ++++++++++++++++++-- include/linux/tracefs.h | 43 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 63 insertions(+), 2 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 93d08e552483..8ac9abf7a3d5 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -615,7 +615,13 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, entry = &ei->entries[i]; if (strcmp(name, entry->name) == 0) { void *cdata = data; - r = entry->callback(name, &mode, &cdata, &fops); + mutex_lock(&eventfs_mutex); + /* If ei->is_freed, then the event itself may be too */ + if (!ei->is_freed) + r = entry->callback(name, &mode, &cdata, &fops); + else + r = -1; + mutex_unlock(&eventfs_mutex); if (r <= 0) continue; ret = simple_lookup(dir, dentry, flags); @@ -749,7 +755,13 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) void *cdata = data; entry = &ei->entries[i]; name = entry->name; - r = entry->callback(name, &mode, &cdata, &fops); + mutex_lock(&eventfs_mutex); + /* If ei->is_freed, then the event itself may be too */ + if (!ei->is_freed) + r = entry->callback(name, &mode, &cdata, &fops); + else + r = -1; + mutex_unlock(&eventfs_mutex); if (r <= 0) continue; d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false); @@ -819,6 +831,10 @@ static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx) * data = A pointer to @data, and the callback may replace it, which will * cause the file created to pass the new data to the open() call. * fops = the fops to use for the created file. + * + * NB. @callback is called while holding internal locks of the eventfs + * system. The callback must not call any code that might also call into + * the tracefs or eventfs system or it will risk creating a deadlock. */ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent, const struct eventfs_entry *entries, @@ -878,6 +894,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode * @data: The default data to pass to the files (an entry may override it). * * This function creates the top of the trace event directory. + * + * See eventfs_create_dir() for use of @entries. */ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent, const struct eventfs_entry *entries, diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 13359b1a35d1..7a5fe17b6bf9 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -23,9 +23,52 @@ struct file_operations; struct eventfs_file; +/** + * eventfs_callback - A callback function to create dynamic files in eventfs + * @name: The name of the file that is to be created + * @mode: return the file mode for the file (RW access, etc) + * @data: data to pass to the created file ops + * @fops: the file operations of the created file + * + * The evetnfs files are dynamically created. The struct eventfs_entry array + * is passed to eventfs_create_dir() or eventfs_create_events_dir() that will + * be used to create the files within those directories. When a lookup + * or access to a file within the directory is made, the struct eventfs_entry + * array is used to find a callback() with the matching name that is being + * referenced (for lookups, the entire array is iterated and each callback + * will be called). + * + * The callback will be called with @name for the name of the file to create. + * The callback can return less than 1 to indicate that no file should be + * created. + * + * If a file is to be created, then @mode should be populated with the file + * mode (permissions) for which the file is created for. This would be + * used to set the created inode i_mode field. + * + * The @data should be set to the data passed to the other file operations + * (read, write, etc). Note, @data will also point to the data passed in + * to eventfs_create_dir() or eventfs_create_events_dir(), but the callback + * can replace the data if it chooses to. Otherwise, the original data + * will be used for the file operation functions. + * + * The @fops should be set to the file operations that will be used to create + * the inode. + * + * NB. This callback is called while holding internal locks of the eventfs + * system. The callback must not call any code that might also call into + * the tracefs or eventfs system or it will risk creating a deadlock. + */ typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data, const struct file_operations **fops); +/** + * struct eventfs_entry - dynamically created eventfs file call back handler + * @name: Then name of the dynamic file in an eventfs directory + * @callback: The callback to get the fops of the file when it is created + * + * See evenfs_callback() typedef for how to set up @callback. + */ struct eventfs_entry { const char *name; eventfs_callback callback; From patchwork Wed Nov 1 17:25:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442949 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AE7C17980 for ; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3DC7FC433CA; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzS-00EZur-11; Wed, 01 Nov 2023 13:26:50 -0400 Message-ID: <20231101172650.123479767@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:47 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , stable@vger.kernel.org, Ajay Kaher Subject: [PATCH v6 6/8] eventfs: Delete eventfs_inode when the last dentry is freed References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" There exists a race between holding a reference of an eventfs_inode dentry and the freeing of the eventfs_inode. If user space has a dentry held long enough, it may still be able to access the dentry's eventfs_inode after it has been freed. To prevent this, have he eventfs_inode freed via the last dput() (or via RCU if the eventfs_inode does not have a dentry). This means reintroducing the eventfs_inode del_list field at a temporary place to put the eventfs_inode. It needs to mark it as freed (via the list) but also must invalidate the dentry immediately as the return from eventfs_remove_dir() expects that they are. But the dentry invalidation must not be called under the eventfs_mutex, so it must be done after the eventfs_inode is marked as free (put on a deletion list). Cc: stable@vger.kernel.org Cc: Ajay Kaher Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223420.988874091@goodmis.org - Rebased for this patch series fs/tracefs/event_inode.c | 146 ++++++++++++++++++--------------------- fs/tracefs/internal.h | 2 + 2 files changed, 69 insertions(+), 79 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 8ac9abf7a3d5..0a04ae0ca8c8 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -85,8 +85,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, mutex_lock(&eventfs_mutex); ei = dentry->d_fsdata; - /* The LSB is set when the eventfs_inode is being freed */ - if (((unsigned long)ei & 1UL) || ei->is_freed) { + if (ei->is_freed) { /* Do not allow changes if the event is about to be removed. */ mutex_unlock(&eventfs_mutex); return -ENODEV; @@ -276,35 +275,17 @@ static void free_ei(struct eventfs_inode *ei) void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) { struct tracefs_inode *ti_parent; - struct eventfs_inode *ei_child, *tmp; struct eventfs_inode *ei; int i; /* The top level events directory may be freed by this */ if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) { - LIST_HEAD(ef_del_list); - mutex_lock(&eventfs_mutex); - ei = ti->private; - - /* Record all the top level files */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - lockdep_is_held(&eventfs_mutex)) { - list_add_tail(&ei_child->del_list, &ef_del_list); - } - /* Nothing should access this, but just in case! */ ti->private = NULL; - mutex_unlock(&eventfs_mutex); - /* Now safely free the top level files and their children */ - list_for_each_entry_safe(ei_child, tmp, &ef_del_list, del_list) { - list_del(&ei_child->del_list); - eventfs_remove_dir(ei_child); - } - free_ei(ei); return; } @@ -319,14 +300,6 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) if (!ei) goto out; - /* - * If ei was freed, then the LSB bit is set for d_fsdata. - * But this should not happen, as it should still have a - * ref count that prevents it. Warn in case it does. - */ - if (WARN_ON_ONCE((unsigned long)ei & 1)) - goto out; - /* This could belong to one of the files of the ei */ if (ei->dentry != dentry) { for (i = 0; i < ei->nr_entries; i++) { @@ -336,6 +309,8 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) if (WARN_ON_ONCE(i == ei->nr_entries)) goto out; ei->d_children[i] = NULL; + } else if (ei->is_freed) { + free_ei(ei); } else { ei->dentry = NULL; } @@ -962,13 +937,65 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry return ERR_PTR(-ENOMEM); } +static LLIST_HEAD(free_list); + +static void eventfs_workfn(struct work_struct *work) +{ + struct eventfs_inode *ei, *tmp; + struct llist_node *llnode; + + llnode = llist_del_all(&free_list); + llist_for_each_entry_safe(ei, tmp, llnode, llist) { + /* This dput() matches the dget() from unhook_dentry() */ + for (int i = 0; i < ei->nr_entries; i++) { + if (ei->d_children[i]) + dput(ei->d_children[i]); + } + /* This should only get here if it had a dentry */ + if (!WARN_ON_ONCE(!ei->dentry)) + dput(ei->dentry); + } +} + +static DECLARE_WORK(eventfs_work, eventfs_workfn); + static void free_rcu_ei(struct rcu_head *head) { struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu); + if (ei->dentry) { + /* Do not free the ei until all references of dentry are gone */ + if (llist_add(&ei->llist, &free_list)) + queue_work(system_unbound_wq, &eventfs_work); + return; + } + + /* If the ei doesn't have a dentry, neither should its children */ + for (int i = 0; i < ei->nr_entries; i++) { + WARN_ON_ONCE(ei->d_children[i]); + } + free_ei(ei); } +static void unhook_dentry(struct dentry *dentry) +{ + if (!dentry) + return; + + /* Keep the dentry from being freed yet (see eventfs_workfn()) */ + dget(dentry); + + dentry->d_fsdata = NULL; + d_invalidate(dentry); + mutex_lock(&eventfs_mutex); + /* dentry should now have at least a single reference */ + WARN_ONCE((int)d_count(dentry) < 1, + "dentry %px (%s) less than one reference (%d) after invalidate\n", + dentry, dentry->d_name.name, d_count(dentry)); + mutex_unlock(&eventfs_mutex); +} + /** * eventfs_remove_rec - remove eventfs dir or file from list * @ei: eventfs_inode to be removed. @@ -1006,33 +1033,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, list_add_tail(&ei->del_list, head); } -static void unhook_dentry(struct dentry **dentry, struct dentry **list) -{ - if (*dentry) { - unsigned long ptr = (unsigned long)*list; - - /* Keep the dentry from being freed yet */ - dget(*dentry); - - /* - * Paranoid: The dget() above should prevent the dentry - * from being freed and calling eventfs_set_ei_status_free(). - * But just in case, set the link list LSB pointer to 1 - * and have eventfs_set_ei_status_free() check that to - * make sure that if it does happen, it will not think - * the d_fsdata is an eventfs_inode. - * - * For this to work, no eventfs_inode should be allocated - * on a odd space, as the ef should always be allocated - * to be at least word aligned. Check for that too. - */ - WARN_ON_ONCE(ptr & 1); - - (*dentry)->d_fsdata = (void *)(ptr | 1); - *list = *dentry; - *dentry = NULL; - } -} /** * eventfs_remove_dir - remove eventfs dir or file from list * @ei: eventfs_inode to be removed. @@ -1043,40 +1043,28 @@ void eventfs_remove_dir(struct eventfs_inode *ei) { struct eventfs_inode *tmp; LIST_HEAD(ei_del_list); - struct dentry *dentry_list = NULL; - struct dentry *dentry; - int i; if (!ei) return; + /* + * Move the deleted eventfs_inodes onto the ei_del_list + * which will also set the is_freed value. Note, this has to be + * done under the eventfs_mutex, but the deletions of + * the dentries must be done outside the eventfs_mutex. + * Hence moving them to this temporary list. + */ mutex_lock(&eventfs_mutex); eventfs_remove_rec(ei, &ei_del_list, 0); + mutex_unlock(&eventfs_mutex); list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) { - for (i = 0; i < ei->nr_entries; i++) - unhook_dentry(&ei->d_children[i], &dentry_list); - unhook_dentry(&ei->dentry, &dentry_list); + for (int i = 0; i < ei->nr_entries; i++) + unhook_dentry(ei->d_children[i]); + unhook_dentry(ei->dentry); + list_del(&ei->del_list); call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei); } - mutex_unlock(&eventfs_mutex); - - while (dentry_list) { - unsigned long ptr; - - dentry = dentry_list; - ptr = (unsigned long)dentry->d_fsdata & ~1UL; - dentry_list = (struct dentry *)ptr; - dentry->d_fsdata = NULL; - d_invalidate(dentry); - mutex_lock(&eventfs_mutex); - /* dentry should now have at least a single reference */ - WARN_ONCE((int)d_count(dentry) < 1, - "dentry %px (%s) less than one reference (%d) after invalidate\n", - dentry, dentry->d_name.name, d_count(dentry)); - mutex_unlock(&eventfs_mutex); - dput(dentry); - } } /** diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 5f60bcd69289..06a1f220b901 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -54,10 +54,12 @@ struct eventfs_inode { void *data; /* * Union - used for deletion + * @llist: for calling dput() if needed after RCU * @del_list: list of eventfs_inode to delete * @rcu: eventfs_inode to delete in RCU */ union { + struct llist_node llist; struct list_head del_list; struct rcu_head rcu; }; From patchwork Wed Nov 1 17:25:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442950 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8475B179B5 for ; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id 71028C433AB; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzS-00EZvQ-1i; Wed, 01 Nov 2023 13:26:50 -0400 Message-ID: <20231101172650.340876747@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:48 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher Subject: [PATCH v6 7/8] eventfs: Remove special processing of dput() of events directory References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" The top level events directory is no longer special with regards to how it should be delete. Remove the extra processing for it in eventfs_set_ei_status_free(). Cc: Ajay Kaher Signed-off-by: Steven Rostedt (Google) Reviewed-by: Masami Hiramatsu (Google) --- Changes since v5: https://lkml.kernel.org/r/20231031223421.185413077@goodmis.org - Resynced to this patch series fs/tracefs/event_inode.c | 19 ++----------------- 1 file changed, 2 insertions(+), 17 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 0a04ae0ca8c8..0087a3f455f1 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -274,28 +274,11 @@ static void free_ei(struct eventfs_inode *ei) */ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) { - struct tracefs_inode *ti_parent; struct eventfs_inode *ei; int i; - /* The top level events directory may be freed by this */ - if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) { - mutex_lock(&eventfs_mutex); - ei = ti->private; - /* Nothing should access this, but just in case! */ - ti->private = NULL; - mutex_unlock(&eventfs_mutex); - - free_ei(ei); - return; - } - mutex_lock(&eventfs_mutex); - ti_parent = get_tracefs(dentry->d_parent->d_inode); - if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) - goto out; - ei = dentry->d_fsdata; if (!ei) goto out; @@ -920,6 +903,8 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations; + dentry->d_fsdata = ei; + /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); d_instantiate(dentry, inode); From patchwork Wed Nov 1 17:25:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13442951 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16AC8179B5 for ; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: by smtp.kernel.org (Postfix) with ESMTPSA id B449EC433B6; Wed, 1 Nov 2023 17:26:51 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.96) (envelope-from ) id 1qyEzS-00EZw0-2O; Wed, 01 Nov 2023 13:26:50 -0400 Message-ID: <20231101172650.552471568@goodmis.org> User-Agent: quilt/0.66 Date: Wed, 01 Nov 2023 13:25:49 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , stable@vger.kernel.org, Al Viro Subject: [PATCH v6 8/8] eventfs: Use simple_recursive_removal() to clean up dentries References: <20231101172541.971928390@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Looking at how dentry is removed via the tracefs system, I found that eventfs does not do everything that it did under tracefs. The tracefs removal of a dentry calls simple_recursive_removal() that does a lot more than a simple d_invalidate(). As it should be a requirement that any eventfs_inode that has a dentry, so does its parent. When removing a eventfs_inode, if it has a dentry, a call to simple_recursive_removal() on that dentry should clean up all the dentries underneath it. Add WARN_ON_ONCE() to check for the parent having a dentry if any children do. Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/ Cc: stable@vger.kernel.org Cc: Al Viro Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) Reviewed-by: Masami Hiramatsu (Google) --- Changes since the last patch: https://lore.kernel.org/linux-trace-kernel/20231031144703.71eef3a0@gandalf.local.home - Was originally called: eventfs: Process deletion of dentry more thoroughly - Al Viro pointed out that I could use simple_recursive_removal() instead. I had originally thought that I could not, but looking deeper into it, and realizing that if a dentry exists on any eventfs_inode, then all the parent eventfs_inode of that would als have a dentry. Hence, calling simple_recursive_removal() on the top dentry would clean up all the children dentries as well. Doing it his way cleans up the code quite a bit! fs/tracefs/event_inode.c | 77 +++++++++++++++++++++++----------------- fs/tracefs/internal.h | 2 -- 2 files changed, 44 insertions(+), 35 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 0087a3f455f1..f8a594a50ae6 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -967,30 +967,29 @@ static void unhook_dentry(struct dentry *dentry) { if (!dentry) return; - - /* Keep the dentry from being freed yet (see eventfs_workfn()) */ + /* + * Need to add a reference to the dentry that is expected by + * simple_recursive_removal(), which will include a dput(). + */ dget(dentry); - dentry->d_fsdata = NULL; - d_invalidate(dentry); - mutex_lock(&eventfs_mutex); - /* dentry should now have at least a single reference */ - WARN_ONCE((int)d_count(dentry) < 1, - "dentry %px (%s) less than one reference (%d) after invalidate\n", - dentry, dentry->d_name.name, d_count(dentry)); - mutex_unlock(&eventfs_mutex); + /* + * Also add a reference for the dput() in eventfs_workfn(). + * That is required as that dput() will free the ei after + * the SRCU grace period is over. + */ + dget(dentry); } /** * eventfs_remove_rec - remove eventfs dir or file from list * @ei: eventfs_inode to be removed. - * @head: the list head to place the deleted @ei and children * @level: prevent recursion from going more than 3 levels deep. * * This function recursively removes eventfs_inodes which * contains info of files and/or directories. */ -static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, int level) +static void eventfs_remove_rec(struct eventfs_inode *ei, int level) { struct eventfs_inode *ei_child; @@ -1009,13 +1008,26 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, /* search for nested folders or files */ list_for_each_entry_srcu(ei_child, &ei->children, list, lockdep_is_held(&eventfs_mutex)) { - eventfs_remove_rec(ei_child, head, level + 1); + /* Children only have dentry if parent does */ + WARN_ON_ONCE(ei_child->dentry && !ei->dentry); + eventfs_remove_rec(ei_child, level + 1); } + ei->is_freed = 1; + for (int i = 0; i < ei->nr_entries; i++) { + if (ei->d_children[i]) { + /* Children only have dentry if parent does */ + WARN_ON_ONCE(!ei->dentry); + unhook_dentry(ei->d_children[i]); + } + } + + unhook_dentry(ei->dentry); + list_del_rcu(&ei->list); - list_add_tail(&ei->del_list, head); + call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei); } /** @@ -1026,30 +1038,22 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, */ void eventfs_remove_dir(struct eventfs_inode *ei) { - struct eventfs_inode *tmp; - LIST_HEAD(ei_del_list); + struct dentry *dentry; if (!ei) return; - /* - * Move the deleted eventfs_inodes onto the ei_del_list - * which will also set the is_freed value. Note, this has to be - * done under the eventfs_mutex, but the deletions of - * the dentries must be done outside the eventfs_mutex. - * Hence moving them to this temporary list. - */ mutex_lock(&eventfs_mutex); - eventfs_remove_rec(ei, &ei_del_list, 0); + dentry = ei->dentry; + eventfs_remove_rec(ei, 0); mutex_unlock(&eventfs_mutex); - list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) { - for (int i = 0; i < ei->nr_entries; i++) - unhook_dentry(ei->d_children[i]); - unhook_dentry(ei->dentry); - list_del(&ei->del_list); - call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei); - } + /* + * If any of the ei children has a dentry, then the ei itself + * must have a dentry. + */ + if (dentry) + simple_recursive_removal(dentry, NULL); } /** @@ -1060,10 +1064,17 @@ void eventfs_remove_dir(struct eventfs_inode *ei) */ void eventfs_remove_events_dir(struct eventfs_inode *ei) { - struct dentry *dentry = ei->dentry; + struct dentry *dentry; + dentry = ei->dentry; eventfs_remove_dir(ei); - /* Matches the dget() from eventfs_create_events_dir() */ + /* + * Matches the dget() done by tracefs_start_creating() + * in eventfs_create_events_dir() when it the dentry was + * created. In other words, it's a normal dentry that + * sticks around while the other ei->dentry are created + * and destroyed dynamically. + */ dput(dentry); } diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 06a1f220b901..ccee18ca66c7 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -55,12 +55,10 @@ struct eventfs_inode { /* * Union - used for deletion * @llist: for calling dput() if needed after RCU - * @del_list: list of eventfs_inode to delete * @rcu: eventfs_inode to delete in RCU */ union { struct llist_node llist; - struct list_head del_list; struct rcu_head rcu; }; unsigned int is_freed:1;