From patchwork Fri Apr 27 05:29:03 2018
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 10367431
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, "Paul E. McKenney"
Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Fri, 27 Apr 2018 15:29:03 +1000
Subject: [PATCH] NFS: Avoid quadratic search when freeing delegations.
Message-ID: <87h8nxkyjk.fsf@notabene.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

If an NFS client has 10,000 delegations which are between 90 and 180
seconds old, and 10,000 which are between 180 and 270 seconds old,
with none of them still in use, it is likely that the old ones are at
the end of the list.  The first 10,000 will not be marked to be
returned, the last 10,000 will.

To return these, the code starts at the front of the list and finds
the first delegation needing to be returned (testing 10,000 on the
way).  Then it drops rcu_read_lock(), returns the delegation, and
starts again, doing this 10,000 times.  As delegation return is
async, it may never block, so these 10,000 delegations will be
returned without stopping for a breath.  The soft-lockup detector
will notice and complain.

This patch makes 3 changes.

1/ cond_resched() is added so that the soft-lockup detector doesn't
   notice.

2/ A place-holder (an inode) is kept when locks are dropped, so that
   the place can usually be found again after taking rcu_read_lock().
   This means we don't need to skip over 10,000 entries 10,000 times,
   100 million pointless operations - which could easily be a larger
   number.

3/ If nfs_sb_active() fails, break out of the loop - there is no
   point in continuing.

The patch also adds list_for_each_entry_from_rcu() to rculist.h in
order to achieve 2/.
Signed-off-by: NeilBrown
---
Hi,
 I'm hoping one of the RCU reviewers will provide an Acked-by for the
 rculist.h change.
 If you'd like 3 patches instead of just one, please let me know, but
 they all seem to fit well together.

thanks,
NeilBrown

 fs/nfs/delegation.c     | 57 ++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/rculist.h | 10 +++++++++
 2 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 1819d0d0ba4b..c3d9e21ab440 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -483,19 +483,56 @@ static bool nfs_delegation_need_return(struct nfs_delegation *delegation)
 int nfs_client_return_marked_delegations(struct nfs_client *clp)
 {
 	struct nfs_delegation *delegation;
+	struct nfs_delegation *prev;
 	struct nfs_server *server;
 	struct inode *inode;
+	struct inode *place_holder = NULL;
 	int err = 0;
 
 restart:
+	/*
+	 * To avoid quadratic looping we hold a reference
+	 * to an inode place_holder.  Each time we restart, we
+	 * list nfs_servers from the server of that inode, and
+	 * delegations in that server from the delegation of that
+	 * inode.
+	 * prev is an RCU-protected pointer to a delegation which
+	 * wasn't marked for return and might be a good choice for
+	 * the next place_holder.
+	 */
	rcu_read_lock();
-	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
-		list_for_each_entry_rcu(delegation, &server->delegations,
-								super_list) {
-			if (!nfs_delegation_need_return(delegation))
+	prev = NULL;
+	if (place_holder)
+		server = NFS_SERVER(place_holder);
+	else
+		server = list_entry_rcu(clp->cl_superblocks.next,
+					struct nfs_server, client_link);
+	list_for_each_entry_from_rcu(server, &clp->cl_superblocks, client_link) {
+		delegation = NULL;
+		if (place_holder && server == NFS_SERVER(place_holder))
+			delegation = rcu_dereference(NFS_I(place_holder)->delegation);
+		if (!delegation)
+			delegation = list_entry_rcu(server->delegations.next,
+						    struct nfs_delegation, super_list);
+		list_for_each_entry_from_rcu(delegation, &server->delegations, super_list) {
+			struct inode *to_put = NULL;
+
+			if (!nfs_delegation_need_return(delegation)) {
+				prev = delegation;
 				continue;
+			}
 			if (!nfs_sb_active(server->super))
-				continue;
+				break;
+
+			if (prev) {
+				struct inode *tmp;
+				tmp = nfs_delegation_grab_inode(prev);
+				if (tmp) {
+					to_put = place_holder;
+					place_holder = tmp;
+				}
+			}
+
 			inode = nfs_delegation_grab_inode(delegation);
 			if (inode == NULL) {
 				rcu_read_unlock();
@@ -505,16 +542,26 @@ int nfs_client_return_marked_delegations(struct nfs_client *clp)
 			delegation = nfs_start_delegation_return_locked(NFS_I(inode));
 			rcu_read_unlock();
+			if (to_put) {
+				iput(to_put);
+				to_put = NULL;
+			}
+
 			err = nfs_end_delegation_return(inode, delegation, 0);
 			iput(inode);
 			nfs_sb_deactive(server->super);
+			cond_resched();
 			if (!err)
 				goto restart;
 			set_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state);
+			if (place_holder)
+				iput(place_holder);
 			return err;
 		}
 	}
 	rcu_read_unlock();
+	if (place_holder)
+		iput(place_holder);
 	return 0;
 }
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 127f534fec94..2d86f9869842 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -403,6 +403,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 		&pos->member != (head);					\
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
+/**
+ * list_for_each_entry_from_rcu - iterate over a list continuing from current point
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_node within the struct.
+ */
+#define list_for_each_entry_from_rcu(pos, head, member)			\
+	for (; &(pos)->member != (head);					\
+		pos = list_entry_rcu(pos->member.next, typeof(*(pos)), member))
+
 /**
  * hlist_del_rcu - deletes entry from hash list without re-initialization
  * @n: the element to delete from the hash list.
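
[Editor's note: the following usage sketch is illustrative only and is not
part of the patch.  "struct item", its fields and scan_from() are invented
for the example; the macro and list_entry_rcu()/rcu_read_lock() are the
real rculist/RCU interfaces.  It shows the pattern the delegation code
relies on: pick a starting cursor, then walk the rest of the list with
list_for_each_entry_from_rcu().]

#include <linux/rculist.h>
#include <linux/rcupdate.h>

struct item {
	struct list_head node;	/* linked into an RCU-protected list */
	int key;
};

/*
 * Walk 'head' starting at 'start' if given, otherwise from the first
 * entry.  If 'start' is passed in from outside this critical section,
 * the caller must guarantee it is still on the list - in the patch this
 * is done by holding an inode reference on the place_holder.
 */
static void scan_from(struct list_head *head, struct item *start)
{
	struct item *pos;

	rcu_read_lock();
	if (start)
		pos = start;	/* resume from a previously found entry */
	else
		pos = list_entry_rcu(head->next, struct item, node);

	list_for_each_entry_from_rcu(pos, head, node) {
		/* inspect pos->key here; pos is protected by RCU */
	}
	rcu_read_unlock();
}

[If the list is empty, head->next == head, so the loop body never runs -
the same property the patch uses when it seeds the cursor from
server->delegations.next.]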