From patchwork Tue Oct 9 18:47:33 2018
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 10633189
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Rik van Riel, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 4/4] mm: zero-seek shrinkers
Date: Tue, 9 Oct 2018 14:47:33 -0400
Message-Id: <20181009184732.762-5-hannes@cmpxchg.org>
In-Reply-To: <20181009184732.762-1-hannes@cmpxchg.org>
References: <20181009184732.762-1-hannes@cmpxchg.org>

The page cache and most shrinkable slab caches hold data that has been
read from disk, but there are some caches that only cache CPU work, such
as the dentry and inode caches of procfs and sysfs, as well as the
subset of radix tree nodes that track non-resident page cache.

Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS
for the shrinker's seeks setting tells the reclaim algorithm that for
every two page cache pages scanned it should scan one slab object.

This is a bogus setting. A virtual inode that required no IO to create
is not twice as valuable as a page cache page; shadow cache entries with
eviction distances beyond the size of memory aren't either.

In most cases, the behavior in practice is still fine. Such virtual
caches don't tend to grow and assert themselves aggressively, and
usually get picked up before they cause problems. But there are
scenarios where that's not true.
Our database workloads suffer from two of those. For one, their file
workingset is several times bigger than available memory, which has the
kernel aggressively create shadow page cache entries for the
non-resident parts of it. The workingset code does tell the VM that most
of these are expendable, but the VM ends up balancing them 2:1 to cache
pages as per the seeks setting. This is a huge waste of memory.

These workloads also deal with tens of thousands of open files and use
/proc for introspection, which ends up growing the proc_inode_cache to
absurdly large sizes - again at the cost of valuable cache space, which
isn't a reasonable trade-off, given that proc inodes can be re-created
without involving the disk.

This patch implements a "zero-seek" setting for shrinkers that results
in a target ratio of 0:1 between their objects and IO-backed
caches. This allows such virtual caches to grow when memory is available
(they do cache/avoid CPU work after all), but effectively disables them
as soon as IO-backed objects are under pressure.

It then switches the shrinkers for procfs and sysfs metadata, as well as
excess page cache shadow nodes, to the new zero-seek setting.
Reported-by: Domas Mituzas
Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
---
 fs/kernfs/mount.c |  3 +++
 fs/proc/root.c    |  3 +++
 mm/vmscan.c       | 15 ++++++++++++---
 mm/workingset.c   |  2 +-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 1bd43f6947f3..7d56b624e0dc 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -251,6 +251,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
 	sb->s_export_op = &kernfs_export_ops;
 	sb->s_time_gran = 1;

+	/* sysfs dentries and inodes don't require IO to create */
+	sb->s_shrink.seeks = 0;
+
 	/* get root inode, initialize and unlock it */
 	mutex_lock(&kernfs_mutex);
 	inode = kernfs_get_inode(sb, info->root->kn);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 8912a8b57ac3..74975ca77b71 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -127,6 +127,9 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
 	 */
 	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;

+	/* procfs dentries and inodes don't require IO to create */
+	s->s_shrink.seeks = 0;
+
 	pde_get(&proc_root);
 	root_inode = proc_get_inode(s, &proc_root);
 	if (!root_inode) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a859f64a2166..62ac0c488624 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -474,9 +474,18 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
 	total_scan = nr;

-	delta = freeable >> priority;
-	delta *= 4;
-	do_div(delta, shrinker->seeks);
+	if (shrinker->seeks) {
+		delta = freeable >> priority;
+		delta *= 4;
+		do_div(delta, shrinker->seeks);
+	} else {
+		/*
+		 * These objects don't require any IO to create. Trim
+		 * them aggressively under memory pressure to keep
+		 * them from causing refetches in the IO caches.
+		 */
+		delta = freeable / 2;
+	}

 	/*
 	 * Make sure we apply some minimal pressure on default priority
diff --git a/mm/workingset.c b/mm/workingset.c
index cfdf6adf7e7c..97523c4d3496 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -523,7 +523,7 @@ static unsigned long scan_shadow_nodes(struct shrinker *shrinker,
 static struct shrinker workingset_shadow_shrinker = {
 	.count_objects = count_shadow_nodes,
 	.scan_objects = scan_shadow_nodes,
-	.seeks = DEFAULT_SEEKS,
+	.seeks = 0, /* ->count reports only fully expendable nodes */
 	.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
 };