From patchwork Wed Sep 9 15:20:47 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Chunxin Zang
X-Patchwork-Id: 11765623
From: zangchunxin@bytedance.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chunxin Zang, Muchun Song
Subject: [PATCH v2] mm/vmscan: fix infinite loop in drop_slab_node
Date: Wed, 9 Sep 2020 23:20:47 +0800
Message-Id: <20200909152047.27905-1-zangchunxin@bytedance.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)

From: Chunxin Zang

On our server, there are about 10k memcgs on one machine, and they use
memory very frequently.
When I trigger drop caches, the process loops forever in drop_slab_node.
There are two reasons:

1. We have too many memcgs. Even if only one object is freed in each
   memcg, the sum of freed objects is bigger than 10.

2. One traversal of all memcgs takes a long time, so by the time it ends
   the memcgs traversed first have many objects to free again. On the
   next traversal, the freed count is bigger than 10 again.

We can get the following info through 'ps':

  root:~# ps -aux | grep drop
  root  357956 ... R Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches
  root 1771385 ... R Aug16 21146421:17 echo 3 > /proc/sys/vm/drop_caches
  root 1986319 ... R 18:56      117:27 echo 3 > /proc/sys/vm/drop_caches
  root 2002148 ... R Aug24     5720:39 echo 3 > /proc/sys/vm/drop_caches
  root 2564666 ... R 18:59      113:58 echo 3 > /proc/sys/vm/drop_caches
  root 2639347 ... R Sep03     2383:39 echo 3 > /proc/sys/vm/drop_caches
  root 3904747 ... R 03:35      993:31 echo 3 > /proc/sys/vm/drop_caches
  root 4016780 ... R Aug21     7882:18 echo 3 > /proc/sys/vm/drop_caches

Using bpftrace to follow the 'freed' value in drop_slab_node:

  root:~# bpftrace -e 'kprobe:drop_slab_node+70 {@ret=hist(reg("bp")); }'
  Attaching 1 probe...
  ^B^C

  @ret:
  [64, 128)               1 |                                                    |
  [128, 256)             28 |                                                    |
  [256, 512)            107 |@                                                   |
  [512, 1K)             298 |@@@                                                 |
  [1K, 2K)              613 |@@@@@@@                                             |
  [2K, 4K)             4435 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [4K, 8K)              442 |@@@@@                                               |
  [8K, 16K)             299 |@@@                                                 |
  [16K, 32K)            100 |@                                                   |
  [32K, 64K)            139 |@                                                   |
  [64K, 128K)            56 |                                                    |
  [128K, 256K)           26 |                                                    |
  [256K, 512K)            2 |                                                    |

In the while loop, we can check whether a fatal signal (one that would
wake a TASK_KILLABLE task) is pending for the current task; if so, we
should break out of the loop.

Signed-off-by: Chunxin Zang
Signed-off-by: Muchun Song
Acked-by: Chris Down
Acked-by: Michal Hocko
---
	changelogs in v2:
	1) Break the loop by checking for a pending fatal (TASK_KILLABLE) signal.

 mm/vmscan.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b6d84326bdf2..c3ed8b45d264 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
 	do {
 		struct mem_cgroup *memcg = NULL;
 
+		if (fatal_signal_pending(current))
+			return;
+
 		freed = 0;
 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
 		do {