From patchwork Sat Jun 3 05:22:39 2017
X-Patchwork-Submitter: Khazhy Kumykov
X-Patchwork-Id: 9763873
Subject: Re: Hang/soft lockup in d_invalidate with simultaneous calls
From: Khazhismel Kumykov
Date: Fri, 2 Jun 2017 22:22:39 -0700
To: Al Viro
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    Miklos Szeredi
In-Reply-To: <20170603011241.GH6365@ZenIV.linux.org.uk>
References: <20170603011241.GH6365@ZenIV.linux.org.uk>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Jun 2, 2017 at 6:12 PM, Al Viro wrote:
> Part of that could be relieved if we turned check_and_drop() into
>
> static void check_and_drop(void *_data)
> {
>         struct detach_data *data = _data;
>
>         if (!data->mountpoint && list_empty(&data->select.dispose))
>                 __d_drop(data->select.start);
> }

So with this change, d_invalidate will drop the starting dentry before
all its children are dropped? Would it make sense to just drop it right
off the bat, and let one task handle shrinking all of its children?

> It doesn't solve the entire problem, though - we still could get too
> many threads into that thing before any of them gets to that __d_drop().

Yes, the change isn't sufficient for my repro: many threads still get
into the loop before the drop happens, although new tasks no longer get
stuck in that loop once the dentry has been dropped.

> I wonder if we should try and collect more victims; that could lead
> to contention of its own, but...

From my understanding, the contention is worst when one task is
shrinking everything and several other tasks busily loop, re-walking
the dentry until everything is done. The tasks looping in d_walk hold
a dentry's d_lock while walking over all of its children, and as soon
as one walk finishes they queue up to walk again, while
shrink_dentry_list releases and re-grabs the lock for each entry.

If we limit the number of items we pass to shrink_dentry_list at one
time, things actually look quite a bit better; in testing, arbitrarily
limiting select.dispose to 1000 elements does fix my repro, e.g.:

diff --git a/fs/dcache.c b/fs/dcache.c
index 22af360ceca3..3892e0eb7ec2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1367,6 +1367,7 @@ struct select_data {
 	struct dentry *start;
 	struct list_head dispose;
 	int found;
+	int actually_found;
 };
 
 static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
@@ -1388,6 +1389,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 		if (!dentry->d_lockref.count) {
 			d_shrink_add(dentry, &data->dispose);
 			data->found++;
+			data->actually_found++;
 		}
 	}
 	/*
@@ -1397,6 +1399,9 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 	 */
 	if (!list_empty(&data->dispose))
 		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
+
+	if (data->actually_found > 1000)
+		ret = D_WALK_QUIT;
 out:
 	return ret;
 }
@@ -1415,6 +1420,7 @@ void shrink_dcache_parent(struct dentry *parent)
 	INIT_LIST_HEAD(&data.dispose);
 	data.start = parent;
 	data.found = 0;
+	data.actually_found = 0;
 
 	d_walk(parent, &data, select_collect, NULL);
 	if (!data.found)
@@ -1536,6 +1542,7 @@ void d_invalidate(struct dentry *dentry)
 	INIT_LIST_HEAD(&data.select.dispose);
 	data.select.start = dentry;
 	data.select.found = 0;
+	data.select.actually_found = 0;
 
 	d_walk(dentry, &data, detach_and_collect, check_and_drop);
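
For context, the loop those tasks are spinning in is the retry loop in
d_invalidate(). A rough sketch of its shape in this era of fs/dcache.c
(paraphrased and trimmed for illustration, not the exact source):

void d_invalidate(struct dentry *dentry)
{
	/* fast paths for already-unhashed and negative dentries elided */

	for (;;) {
		struct detach_data data;

		data.mountpoint = NULL;
		INIT_LIST_HEAD(&data.select.dispose);
		data.select.start = dentry;
		data.select.found = 0;

		/* each task re-walks the subtree, taking d_lock on the
		 * dentries it visits */
		d_walk(dentry, &data, detach_and_collect, check_and_drop);

		if (data.select.found)
			shrink_dentry_list(&data.select.dispose);

		if (data.mountpoint) {
			detach_mounts(data.mountpoint);
			dput(data.mountpoint);
		}

		/* only exit once a pass finds nothing left to shrink
		 * and no mountpoint to detach */
		if (!data.mountpoint && !data.select.found)
			break;

		cond_resched();
	}
}

Since select_collect() also counts dentries that are already on another
task's shrink list, every walker keeps seeing data.select.found != 0
until whoever is doing the shrinking finishes, which as far as I can
tell is what keeps the other tasks looping here.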