From patchwork Wed Dec 9 18:04:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 11962009 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB1E7C4361B for ; Wed, 9 Dec 2020 18:06:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6972722ADC for ; Wed, 9 Dec 2020 18:06:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732676AbgLISGH (ORCPT ); Wed, 9 Dec 2020 13:06:07 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:34642 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730097AbgLISGA (ORCPT ); Wed, 9 Dec 2020 13:06:00 -0500 Received: from in01.mta.xmission.com ([166.70.13.51]) by out01.mta.xmission.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1kn3q1-0053Tt-Ky; Wed, 09 Dec 2020 11:05:17 -0700 Received: from ip68-227-160-95.om.om.cox.net ([68.227.160.95] helo=x220.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.87) (envelope-from ) id 1kn3q0-0002xR-LT; Wed, 09 Dec 2020 11:05:17 -0700 From: ebiederm@xmission.com (Eric W. Biederman) To: Al Viro Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Linus Torvalds , Christian Brauner , Oleg Nesterov , Jann Horn References: <87r1on1v62.fsf@x220.int.ebiederm.org> <20201120231441.29911-15-ebiederm@xmission.com> <20201207232900.GD4115853@ZenIV.linux.org.uk> <877dprvs8e.fsf@x220.int.ebiederm.org> <20201209040731.GK3579531@ZenIV.linux.org.uk> <877dprtxly.fsf@x220.int.ebiederm.org> <20201209142359.GN3579531@ZenIV.linux.org.uk> Date: Wed, 09 Dec 2020 12:04:38 -0600 In-Reply-To: <20201209142359.GN3579531@ZenIV.linux.org.uk> (Al Viro's message of "Wed, 9 Dec 2020 14:23:59 +0000") Message-ID: <87o8j2svnt.fsf_-_@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 X-XM-SPF: eid=1kn3q0-0002xR-LT;;;mid=<87o8j2svnt.fsf_-_@x220.int.ebiederm.org>;;;hst=in01.mta.xmission.com;;;ip=68.227.160.95;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/AVDZYCvBzPyVf4VIYpXLHcjRbyxvbzzo= X-SA-Exim-Connect-IP: 68.227.160.95 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: [PATCH] files: rcu free files_struct X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This makes it safe to deference task->files with just rcu_read_lock held, removing the need to take task_lock. Signed-off-by: Eric W. Biederman --- I have cleaned up my patch a little more and made this a little more explicitly rcu. I took a completley different approach to documenting the use of rcu_head than we described that seems a little more robust. fs/file.c | 53 ++++++++++++++++++++++++++--------------- include/linux/fdtable.h | 1 + kernel/fork.c | 4 ++-- 3 files changed, 37 insertions(+), 21 deletions(-) diff --git a/fs/file.c b/fs/file.c index 412033d8cfdf..88064f185560 100644 --- a/fs/file.c +++ b/fs/file.c @@ -379,7 +379,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int return NULL; } -static struct fdtable *close_files(struct files_struct * files) +static void close_files(struct files_struct * files) { /* * It is safe to dereference the fd table without RCU or @@ -397,8 +397,9 @@ static struct fdtable *close_files(struct files_struct * files) set = fdt->open_fds[j++]; while (set) { if (set & 1) { - struct file * file = xchg(&fdt->fd[i], NULL); + struct file * file = fdt->fd[i]; if (file) { + rcu_assign_pointer(fdt->fd[i], NULL); filp_close(file, files); cond_resched(); } @@ -407,19 +408,35 @@ static struct fdtable *close_files(struct files_struct * files) set >>= 1; } } +} - return fdt; +static void free_files_struct_rcu(struct rcu_head *rcu) +{ + struct files_struct *files = + container_of(rcu, struct files_struct, fdtab.rcu); + struct fdtable *fdt = rcu_dereference_raw(files->fdt); + + /* free the arrays if they are not embedded */ + if (fdt != &files->fdtab) + __free_fdtable(fdt); + kmem_cache_free(files_cachep, files); } void put_files_struct(struct files_struct *files) { if (atomic_dec_and_test(&files->count)) { - struct fdtable *fdt = close_files(files); - - /* free the arrays if they are not embedded */ - if (fdt != &files->fdtab) - __free_fdtable(fdt); - kmem_cache_free(files_cachep, files); + close_files(files); + /* + * The rules for using the rcu_head in fdtable: + * + * If the fdtable is being freed independently + * of the files_struct fdtable->rcu is used. + * + * When the fdtable and files_struct are freed + * together the rcu_head from the fdtable embedded in + * files_struct is used. + */ + call_rcu(&files->fdtab.rcu, free_files_struct_rcu); } } @@ -822,12 +839,14 @@ EXPORT_SYMBOL(fget_raw); struct file *fget_task(struct task_struct *task, unsigned int fd) { + struct files_struct *files; struct file *file = NULL; - task_lock(task); - if (task->files) - file = __fget_files(task->files, fd, 0, 1); - task_unlock(task); + rcu_read_lock(); + files = rcu_dereference(task->files); + if (files) + file = __fget_files(files, fd, 0, 1); + rcu_read_unlock(); return file; } @@ -838,11 +857,9 @@ struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd) struct files_struct *files; struct file *file = NULL; - task_lock(task); - files = task->files; + files = rcu_dereference(task->files); if (files) file = files_lookup_fd_rcu(files, fd); - task_unlock(task); return file; } @@ -854,8 +871,7 @@ struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *ret unsigned int fd = *ret_fd; struct file *file = NULL; - task_lock(task); - files = task->files; + files = rcu_dereference(task->files); if (files) { for (; fd < files_fdtable(files)->max_fds; fd++) { file = files_lookup_fd_rcu(files, fd); @@ -863,7 +879,6 @@ struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *ret break; } } - task_unlock(task); *ret_fd = fd; return file; } diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index d0e78174874a..37dcface020f 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -30,6 +30,7 @@ struct fdtable { unsigned long *close_on_exec; unsigned long *open_fds; unsigned long *full_fds_bits; + /* Used for freeing fdtable and any containing files_struct */ struct rcu_head rcu; }; diff --git a/kernel/fork.c b/kernel/fork.c index 4d0ae6f827df..eca474a1586a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2982,7 +2982,7 @@ int ksys_unshare(unsigned long unshare_flags) if (new_fd) { fd = current->files; - current->files = new_fd; + rcu_assign_pointer(current->files, new_fd); new_fd = fd; } @@ -3035,7 +3035,7 @@ int unshare_files(void) old = task->files; task_lock(task); - task->files = copy; + rcu_assign_pointer(task->files, copy); task_unlock(task); put_files_struct(old); return 0;