From patchwork Mon Jun 1 03:22:04 2020
X-Patchwork-Submitter: Greg Thelen
X-Patchwork-Id: 11581177
Date: Sun, 31 May 2020 20:22:04 -0700
Message-Id: <20200601032204.124624-1-gthelen@google.com>
X-Mailer: git-send-email 2.27.0.rc0.183.gde8f92d652-goog
Subject: [PATCH] shmem, memcg: enable memcg aware shrinker
From: Greg Thelen
To: Hugh Dickins, Andrew Morton, Kirill Tkhai
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Greg Thelen,
    stable@vger.kernel.org

Since v4.19 commit b0dedc49a2da ("mm/vmscan.c: iterate only over charged
shrinkers during memcg shrink_slab()") a memcg aware shrinker is only
called when the per-memcg per-node shrinker_map indicates that the
shrinker may have objects to release to the memcg and node.

shmem_unused_huge_count and shmem_unused_huge_scan support the per-tmpfs
shrinker which advertises per memcg and numa awareness.  The shmem
shrinker releases memory by splitting hugepages that extend beyond
i_size.

Shmem does not currently set bits in shrinker_map.  So, starting with
b0dedc49a2da, memcg reclaim avoids calling the shmem shrinker under
pressure.  This leads to undeserved memcg OOM kills.

Example that reliably sees memcg OOM kill in unpatched kernel:

  FS=/tmp/fs
  CONTAINER=/cgroup/memory/tmpfs_shrinker
  mkdir -p $FS
  mount -t tmpfs -o huge=always nodev $FS

  # Create 1000 MB container, which shouldn't suffer OOM.
  mkdir $CONTAINER
  echo 1000M > $CONTAINER/memory.limit_in_bytes
  echo $BASHPID >> $CONTAINER/cgroup.procs

  # Create 4000 files.  Ideally each file uses 4k data page + a little
  # metadata.  Assume 8k total per-file, 32MB (4000*8k) should easily
  # fit within container's 1000 MB.  But if data pages use 2MB
  # hugepages (due to aggressive huge=always) then files consume 8GB,
  # which hits memcg 1000 MB limit.
  for i in {1..4000}; do
    echo . > $FS/$i
  done

v5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
aware") maintains the per-node per-memcg shrinker bitmap for THP
shrinker.  But there's no such logic in shmem.  Make shmem set the
per-memcg per-node shrinker bits when it modifies inodes to have
shrinkable pages.
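For reference, the reclaim path that consults this bitmap looks roughly
like the sketch below.  This is a condensed paraphrase of the
shrink_slab_memcg() loop that b0dedc49a2da added to mm/vmscan.c, not the
verbatim kernel code: locking, SHRINKER_REGISTERING and SHRINK_EMPTY
handling are omitted, so treat the details as illustrative only.

	/*
	 * Sketch: memcg reclaim walks only the shrinkers whose bit is set in
	 * the per-memcg per-node map, then invokes each via do_shrink_slab().
	 */
	static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
					       struct mem_cgroup *memcg,
					       int priority)
	{
		struct memcg_shrinker_map *map;
		unsigned long freed = 0;
		int i;

		map = rcu_dereference_protected(
				memcg->nodeinfo[nid]->shrinker_map, true);
		if (!map)
			return 0;

		for_each_set_bit(i, map->map, shrinker_nr_max) {
			struct shrink_control sc = {
				.gfp_mask = gfp_mask,
				.nid = nid,
				.memcg = memcg,
			};
			/* ids are handed out at shrinker registration time */
			struct shrinker *shrinker = idr_find(&shrinker_idr, i);

			if (shrinker)
				freed += do_shrink_slab(&sc, shrinker, priority);
		}
		return freed;
	}

The point is that a shrinker whose bit is never set, as was the case for
the shmem/tmpfs superblock shrinker, is simply skipped here no matter how
much splittable hugepage memory the memcg holds.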
Fixes: b0dedc49a2da ("mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Greg Thelen <gthelen@google.com>
---
 mm/shmem.c | 61 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index bd8840082c94..e11090f78cb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1002,6 +1002,33 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
+/*
+ * Expose inode and optional page to shrinker as having a possibly splittable
+ * hugepage that reaches beyond i_size.
+ */
+static void shmem_shrinker_add(struct shmem_sb_info *sbinfo,
+			       struct inode *inode, struct page *page)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	spin_lock(&sbinfo->shrinklist_lock);
+	/*
+	 * _careful to defend against unlocked access to ->shrink_list in
+	 * shmem_unused_huge_shrink()
+	 */
+	if (list_empty_careful(&info->shrinklist)) {
+		list_add_tail(&info->shrinklist, &sbinfo->shrinklist);
+		sbinfo->shrinklist_len++;
+	}
+	spin_unlock(&sbinfo->shrinklist_lock);
+
+#ifdef CONFIG_MEMCG
+	if (page && PageTransHuge(page))
+		memcg_set_shrinker_bit(page->mem_cgroup, page_to_nid(page),
+				       inode->i_sb->s_shrink.id);
+#endif
+}
+
 static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
@@ -1048,17 +1075,13 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			 * to shrink under memory pressure.
 			 */
 			if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-				spin_lock(&sbinfo->shrinklist_lock);
-				/*
-				 * _careful to defend against unlocked access to
-				 * ->shrink_list in shmem_unused_huge_shrink()
-				 */
-				if (list_empty_careful(&info->shrinklist)) {
-					list_add_tail(&info->shrinklist,
-							&sbinfo->shrinklist);
-					sbinfo->shrinklist_len++;
-				}
-				spin_unlock(&sbinfo->shrinklist_lock);
+				struct page *page;
+
+				page = find_get_page(inode->i_mapping,
+					(newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
+				shmem_shrinker_add(sbinfo, inode, page);
+				if (page)
+					put_page(page);
 			}
 		}
 	}
@@ -1889,21 +1912,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	if (PageTransHuge(page) &&
 			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
 			hindex + HPAGE_PMD_NR - 1) {
-		/*
-		 * Part of the huge page is beyond i_size: subject
-		 * to shrink under memory pressure.
-		 */
-		spin_lock(&sbinfo->shrinklist_lock);
-		/*
-		 * _careful to defend against unlocked access to
-		 * ->shrink_list in shmem_unused_huge_shrink()
-		 */
-		if (list_empty_careful(&info->shrinklist)) {
-			list_add_tail(&info->shrinklist,
-					&sbinfo->shrinklist);
-			sbinfo->shrinklist_len++;
-		}
-		spin_unlock(&sbinfo->shrinklist_lock);
+		shmem_shrinker_add(sbinfo, inode, page);
 	}
 
 	/*