From patchwork Sun Mar 15 09:53:39 2020
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11438733
From: Yafang Shao <laoar.shao@gmail.com>
To: dchinner@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, guro@fb.com, akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH v5 0/3] protect page cache from freeing inode
Date: Sun, 15 Mar 2020 05:53:39 -0400
Message-Id: <20200315095342.10178-1-laoar.shao@gmail.com>

On my server there are some running memcgs protected by memory.{min, low}, but I found the usage of these memcgs abruptly became very small, far below the protection limit. It confused me, and I finally found the cause was inode stealing. Once an inode is freed, all of its page cache is dropped as well, no matter how many pages it has. So if we intend to protect the page cache in a memcg, we must protect its host (the inode) first. Otherwise the memcg protection can easily be bypassed by freeing the inode, especially if there are big files in this memcg.

The inherent mismatch between memcg and inode is a trouble. One inode can be shared by different memcgs, but that is a very rare case. If an inode is shared, its page cache may be charged to different memcgs. Currently there is no perfect solution for this kind of issue, but the inode majority-writer ownership switching can help more or less.

After this patch, it may take extra time to skip these inodes when a workload outside of a memcg protected by memory.min or memory.low is trying to do page reclaim, especially if there are lots of inodes pinned by page cache in this protected memcg.
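To make the idea above concrete, here is a rough sketch of the kind of check the inode LRU isolation can perform; this is illustrative only and not the actual diff, and the helper memcg_pagecache_protected() is a made-up name standing in for whatever check the series uses:

	/*
	 * Sketch only, not the real patch: when walking the inode LRU for
	 * reclaim, rotate (skip) an inode that still has page cache charged
	 * to a memcg protected by memory.{min, low}, so that freeing the
	 * inode cannot bypass the protection of its page cache.
	 */
	static enum lru_status inode_lru_isolate(struct list_head *item,
			struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
	{
		struct inode *inode = container_of(item, struct inode, i_lru);
		/* assume the memcg has been made visible to this callback */
		struct mem_cgroup *memcg = arg;

		if (inode->i_data.nrpages &&
		    memcg_pagecache_protected(memcg, inode))
			return LRU_ROTATE;	/* keep the inode and its page cache */

		/* ... the existing isolation logic continues as before ... */
		return LRU_REMOVED;
	}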
In order to measure the potential regression, I constructed the test case below on my server. The server is a machine with two nodes, each with 64GB of memory. I created two memcgs, with memory.low of both set to 1G. Then I generated more than 500 thousand inodes in each of them, with page cache per inode ranging from 4K to 4M. IOW, there are more than 1 million xfs_inodes in memory in total, and their total page cache is nearly 128GB. Then I ran a workload outside of these two protected memcgs. That workload is usemem from Mel's mmtests, with a small modification to allocate almost all the memory and iterate only once. Below is the comparison of the Amean of elapsed time and sys%:

                     5.6.0-rc4              patched
Amean  syst-4     65.75 (  0.00%)    68.08 * -3.54%*
Amean  elsp-4     32.14 (  0.00%)    32.63 * -1.52%*
Amean  syst-7     67.47 (  0.00%)    66.71 *  1.13%*
Amean  elsp-7     19.83 (  0.00%)    18.41 *  7.16%*
Amean  syst-12    98.27 (  0.00%)    99.29 * -1.04%*
Amean  elsp-12    15.60 (  0.00%)    16.00 * -2.56%*
Amean  syst-21   174.69 (  0.00%)   172.92 *  1.01%*
Amean  elsp-21    14.63 (  0.00%)    14.75 * -0.82%*
Amean  syst-30   195.78 (  0.00%)   205.90 * -5.17%*
Amean  elsp-30    12.42 (  0.00%)    12.73 * -2.50%*
Amean  syst-40   249.85 (  0.00%)   250.81 * -0.38%*
Amean  elsp-40    12.19 (  0.00%)    12.25 * -0.49%*

I ran this test many times, and each run gave a slightly different result, but the difference is not too big. Furthermore, this behavior only occurs when memory.min or memory.low is set, and the user already knows that memory.{min, low} protects pages at the cost of more CPU time, so a small amount of extra time is expected. If the workload trying to reclaim these protected inodes is itself inside a protected memcg, it will not be affected at all, because memory.{min, low} does not take effect in that case.

- Changes against v4:
  Updated with the test results measuring the potential regression, and rebased this patchset on 5.6.0-rc4.

- Changes against v3:
  Fixed the possible risk pointed out by Johannes in another patchset [1]. Per discussion with Johannes in that mail thread, I found that the issue Johannes is trying to fix is different from the issue I'm trying to fix. That's why I updated this patchset and posted it again. This specific memcg protection issue should be addressed.

- Changes against v2:
  1. Separate the memcg patches from this patchset, suggested by Roman.
  2. Improve the code around the usage of for_each_mem_cgroup(), suggested by Dave.
  3. Use memcg_low_reclaim passed from scan_control, instead of introducing a new member in struct mem_cgroup.
  4. Some other code improvements suggested by Dave.

- Changes against v1:
  Use the memcg passed from the shrink_control, instead of getting it from the inode itself, suggested by Dave. That makes the layering better.

[1]. https://lore.kernel.org/linux-mm/20200211175507.178100-1-hannes@cmpxchg.org/

Yafang Shao (3):
  mm, list_lru: make memcg visible to lru walker isolation function
  mm, shrinker: make memcg low reclaim visible to lru walker isolation
    function
  inode: protect page cache from freeing inode

 fs/inode.c                 | 76 ++++++++++++++++++++++++++++++++++++--
 include/linux/memcontrol.h | 21 +++++++++++
 include/linux/shrinker.h   |  3 ++
 mm/list_lru.c              | 47 +++++++++++++----------
 mm/memcontrol.c            | 15 --------
 mm/vmscan.c                | 27 ++++++++------
 6 files changed, 141 insertions(+), 48 deletions(-)
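For readers who want a feel for the plumbing before reading the patches: the rough shape of what patches 1-2 provide is sketched below. This is an illustrative sketch under assumptions, not the actual diff; the exact field placement and names may differ from the patches.

	/*
	 * Sketch only: the reclaim path already knows which memcg it is
	 * scanning and whether memory.low protection is being deliberately
	 * breached (memcg_low_reclaim).  The idea is to carry that
	 * information down to the list_lru walker's isolation callback,
	 * for example via struct shrink_control, so that the inode LRU
	 * isolation in patch 3 can consult it.
	 */
	struct shrink_control {
		gfp_t gfp_mask;
		int nid;
		unsigned long nr_to_scan;
		unsigned long nr_scanned;
		struct mem_cgroup *memcg;
		/* hypothetical addition carrying the low-reclaim state: */
		bool memcg_low_reclaim;
	};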