From patchwork Tue Apr 21 18:05:43 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 6252491 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 978149F1C4 for ; Tue, 21 Apr 2015 18:06:06 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9BB6420211 for ; Tue, 21 Apr 2015 18:06:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8899620218 for ; Tue, 21 Apr 2015 18:06:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753447AbbDUSFr (ORCPT ); Tue, 21 Apr 2015 14:05:47 -0400 Received: from mail-ie0-f177.google.com ([209.85.223.177]:35944 "EHLO mail-ie0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753015AbbDUSFr (ORCPT ); Tue, 21 Apr 2015 14:05:47 -0400 Received: by iebrs15 with SMTP id rs15so19949355ieb.3; Tue, 21 Apr 2015 11:05:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:subject:from:to:cc:date:in-reply-to:references :content-type:mime-version:content-transfer-encoding; bh=6SVZUzXUQpPAMGR4DKW4hnEioIUu8acEruINPx4bn3A=; b=Vh+f0uI6mG4UfKveO1xauQvbrefIQibcACgKYL7YEDxDnVK0ilUCZuzvarsstC5HwI WySr0V87Uwk1MVjyIwXpPtHhQPqK/lOzjqR5eIB+5z5ipTPsenJZebGj+6wU8a6s6nTk 4t4cGen2m2zT6M6s2Ymo9rzxjsfOsgjvrXeZAIUamMm0tuzeQ/SSH5YTczFNxmrvpTZ1 KEnRBOBNGU3ifsF188zqAsklBX9TXDktIfZeazdITDtLadvJ0F3NMwCumw7ongL55uR1 jbsNM/hZ+rHuENPnw2D9KPUzNQobzAzaz8rsGJw7rwjZ4yGSAD9HymLxVDuuqb71MD62 BnLw== X-Received: by 10.42.185.12 with SMTP id cm12mr5140662icb.0.1429639545992; Tue, 21 Apr 2015 11:05:45 -0700 (PDT) Received: from ?IPv6:2620:0:1000:3e02:ac4a:6774:8cec:a3fd? ([2620:0:1000:3e02:ac4a:6774:8cec:a3fd]) by mx.google.com with ESMTPSA id o2sm1679951igr.9.2015.04.21.11.05.44 (version=TLSv1.2 cipher=AES128-GCM-SHA256 bits=128/128); Tue, 21 Apr 2015 11:05:45 -0700 (PDT) Message-ID: <1429639543.7346.329.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install From: Eric Dumazet To: Mateusz Guzik , Al Viro Cc: Andrew Morton , "Paul E. McKenney" , Yann Droneaud , Konstantin Khlebnikov , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Tue, 21 Apr 2015 11:05:43 -0700 In-Reply-To: <1429562991.7346.290.camel@edumazet-glaptop2.roam.corp.google.com> References: <20150416121628.GA20615@mguzik> <1429307216.7346.255.camel@edumazet-glaptop2.roam.corp.google.com> <20150417221646.GA15589@mguzik> <20150417230252.GE889@ZenIV.linux.org.uk> <20150420130633.GA2513@mguzik> <20150420134326.GC2513@mguzik> <20150420151054.GD2513@mguzik> <1429550126.7346.268.camel@edumazet-glaptop2.roam.corp.google.com> <1429562991.7346.290.camel@edumazet-glaptop2.roam.corp.google.com> X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, T_DKIM_INVALID, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Mon, 2015-04-20 at 13:49 -0700, Eric Dumazet wrote: > On Mon, 2015-04-20 at 10:15 -0700, Eric Dumazet wrote: > > On Mon, 2015-04-20 at 17:10 +0200, Mateusz Guzik wrote: > > > > > Sorry for spam but I came up with another hack. :) > > > > > > The idea is that we can have a variable which would signify the that > > > given thread is playing with fd table in fd_install (kind of a lock > > > embedded into task_struct). We would also have a flag in files struct > > > indicating that a thread would like to resize it. > > > > > > expand_fdtable would set the flag and iterate over all threads waiting > > > for all of them to have the var set to 0. > > > > The opposite : you have to block them in some way and add a rcu_sched() > > or something. > > Here is the patch I cooked here but not yet tested. In following version : 1) I replaced the yield() hack by a proper wait queue. 2) I do not invoke synchronize_sched() for mono threaded programs. 3) I avoid multiple threads doing a resize and then only one wins the deal. (copying/clearing big amount of memory only to discover another guy did the same is a big waste of resources) This seems to run properly on my hosts. Your comments/tests are most welcomed, thanks ! fs/file.c | 41 +++++++++++++++++++++++++++++++++----- include/linux/fdtable.h | 3 ++ 2 files changed, 39 insertions(+), 5 deletions(-) --- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/file.c b/fs/file.c index 93c5f89c248b..e0e113a56444 100644 --- a/fs/file.c +++ b/fs/file.c @@ -147,6 +147,9 @@ static int expand_fdtable(struct files_struct *files, int nr) spin_unlock(&files->file_lock); new_fdt = alloc_fdtable(nr); + /* make sure no __fd_install() are still updating fdt */ + if (atomic_read(&files->count) > 1) + synchronize_sched(); spin_lock(&files->file_lock); if (!new_fdt) return -ENOMEM; @@ -170,9 +173,12 @@ static int expand_fdtable(struct files_struct *files, int nr) if (cur_fdt != &files->fdtab) call_rcu(&cur_fdt->rcu, free_fdtable_rcu); } else { + WARN_ON_ONCE(1); /* Somebody else expanded, so undo our attempt */ __free_fdtable(new_fdt); } + /* coupled with smp_rmb() in __fd_install() */ + smp_wmb(); return 1; } @@ -187,19 +193,33 @@ static int expand_fdtable(struct files_struct *files, int nr) static int expand_files(struct files_struct *files, int nr) { struct fdtable *fdt; + int expanded = 0; +begin: fdt = files_fdtable(files); /* Do we need to expand? */ if (nr < fdt->max_fds) - return 0; + return expanded; /* Can we expand? */ if (nr >= sysctl_nr_open) return -EMFILE; + while (unlikely(files->resize_in_progress)) { + spin_unlock(&files->file_lock); + expanded = 1; + wait_event(files->resize_wait, !files->resize_in_progress); + spin_lock(&files->file_lock); + goto begin; + } + /* All good, so we try */ - return expand_fdtable(files, nr); + files->resize_in_progress = true; + expanded = expand_fdtable(files, nr); + files->resize_in_progress = false; + wake_up_all(&files->resize_wait); + return expanded; } static inline void __set_close_on_exec(int fd, struct fdtable *fdt) @@ -256,6 +276,8 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp) atomic_set(&newf->count, 1); spin_lock_init(&newf->file_lock); + newf->resize_in_progress = 0; + init_waitqueue_head(&newf->resize_wait); newf->next_fd = 0; new_fdt = &newf->fdtab; new_fdt->max_fds = NR_OPEN_DEFAULT; @@ -553,11 +575,20 @@ void __fd_install(struct files_struct *files, unsigned int fd, struct file *file) { struct fdtable *fdt; - spin_lock(&files->file_lock); - fdt = files_fdtable(files); + + rcu_read_lock_sched(); + + while (unlikely(files->resize_in_progress)) { + rcu_read_unlock_sched(); + wait_event(files->resize_wait, !files->resize_in_progress); + rcu_read_lock_sched(); + } + /* coupled with smp_wmb() in expand_fdtable() */ + smp_rmb(); + fdt = READ_ONCE(files->fdt); BUG_ON(fdt->fd[fd] != NULL); rcu_assign_pointer(fdt->fd[fd], file); - spin_unlock(&files->file_lock); + rcu_read_unlock_sched(); } void fd_install(unsigned int fd, struct file *file) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 230f87bdf5ad..fbb88740634a 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -47,6 +47,9 @@ struct files_struct { * read mostly part */ atomic_t count; + bool resize_in_progress; + wait_queue_head_t resize_wait; + struct fdtable __rcu *fdt; struct fdtable fdtab; /*