From patchwork Tue Nov 12 14:06:21 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11239363
From: Alex Shi
To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com
Cc: Vlastimil Babka, Dan Williams, Michal Hocko, Wei Yang, Johannes Weiner, Arun KS
Subject: [PATCH v2 1/8] mm/lru: add per lruvec lock for memcg
Date: Tue, 12 Nov 2019 22:06:21 +0800
Message-Id: <1573567588-47048-2-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>

Currently memcg still uses the per-node pgdat->lru_lock to guard its lruvecs. That causes lru_lock contention on systems with a high container density. Using a per-lruvec lock instead could relieve much of that contention. The later patches replace pgdat->lru_lock with lruvec->lru_lock and show the performance benefit in benchmarks.

Signed-off-by: Alex Shi Cc: Andrew Morton Cc: Vlastimil Babka Cc: Dan Williams Cc: Michal Hocko Cc: Mel Gorman Cc: Wei Yang Cc: Johannes Weiner Cc: Arun KS Cc: Konstantin Khlebnikov Cc: Hugh Dickins Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmzone.h | 2 ++
 mm/mmzone.c            | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bda20282746b..787a42d527a2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -303,6 +303,8 @@ struct lruvec {
 	atomic_long_t			inactive_age;
 	/* Refaults at the time of last reclaim cycle */
 	unsigned long			refaults;
+	/* per lruvec lru_lock for memcg */
+	spinlock_t			lru_lock;
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 4686fdc23bb9..3750a90ed4a0 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -91,6 +91,7 @@ void lruvec_init(struct lruvec *lruvec)
 	enum lru_list lru;
 
 	memset(lruvec, 0, sizeof(struct lruvec));
+	spin_lock_init(&lruvec->lru_lock);
 
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->lists[lru]);

From patchwork Tue Nov 12 14:06:22 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11239365
From: Alex Shi
To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com
Cc: Vlastimil Babka, Dan Williams, Michal Hocko, Wei Yang, Johannes Weiner, Arun KS, Rong Chen
Subject: [PATCH v2 2/8] mm/lruvec: add irqsave flags into lruvec struct
Date: Tue, 12 Nov 2019 22:06:22 +0800
Message-Id: <1573567588-47048-3-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>

We need a flags variable to save the irq state around an irqsave lock; declaring it here makes the code clearer and cleaner. Rong Chen reported that the flags variable needs to sit near the tail of struct lruvec, otherwise it causes an 18% regression in vm-scalability testing on his machine. So add both the flags and the lru_lock near the struct tail.
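As a minimal sketch of how the two new fields (lru_lock from the previous patch, flags from this one) are meant to be used together, a caller can take the per-lruvec lock with the irq state stashed in the lruvec itself instead of a stack-local flags variable. The helper name example_lru_op is made up for illustration only; it is not part of this series:

static void example_lru_op(struct lruvec *lruvec)
{
	/* irq state is saved in the lruvec, right next to its lock */
	spin_lock_irqsave(&lruvec->lru_lock, lruvec->flags);
	/* ... add or remove pages on lruvec->lists[] here ... */
	spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags);
}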
Originally-from: Hugh Dickins Signed-off-by: Alex Shi Cc: Andrew Morton Cc: Vlastimil Babka Cc: Dan Williams Cc: Michal Hocko Cc: Mel Gorman Cc: Wei Yang Cc: Johannes Weiner Cc: Arun KS Cc: Tejun Heo Cc: Konstantin Khlebnikov Cc: Rong Chen Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmzone.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 787a42d527a2..da00615baa52 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -305,6 +305,8 @@ struct lruvec {
 	unsigned long			refaults;
 	/* per lruvec lru_lock for memcg */
 	spinlock_t			lru_lock;
+	/* flags for irqsave */
+	unsigned long			flags;
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif

From patchwork Tue Nov 12 14:06:23 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11239369
From: Alex Shi
To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Roman Gushchin, Shakeel Butt, Chris Down, Thomas Gleixner, Vlastimil Babka, Qian Cai, Andrey Ryabinin, "Kirill A. Shutemov", Jérôme Glisse, Andrea Arcangeli, David Rientjes, "Aneesh Kumar K.V", swkhack, "Potyra, Stefan", Mike Rapoport, Stephen Rothwell, Colin Ian King, Jason Gunthorpe, Mauro Carvalho Chehab, Matthew Wilcox, Peng Fan, Nikolay Borisov, Ira Weiny, Kirill Tkhai, Yafang Shao
Subject: [PATCH v2 3/8] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Tue, 12 Nov 2019 22:06:23 +0800
Message-Id: <1573567588-47048-4-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>

This patchset moves lru_lock into the lruvec, giving each lruvec its own lock and thus a separate lru_lock for each memcg per node. This is the main patch that replaces the per-node lru_lock with the per-memcg lruvec lock; it also folds the irqsave flags into the lruvec. We introduce the function lock_page_lruvec: when memory cgroups are not in use it behaves the same as taking the vanilla pgdat lock; with memcg, the function rechecks and re-takes the lruvec's lock as needed to guard against page->mem_cgroup changes from page migration between memcgs. (Thanks to Hugh Dickins and Konstantin Khlebnikov for the reminder on this; the core logic is the same as in their previous patches.)

Following Daniel Jordan's suggestion, I ran 64 'dd' tasks in 32 containers on my 2-socket, 8-core, HT box with the modified case: https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this and the later patches, dd throughput is 144MB/s, versus 123MB/s on the vanilla kernel: a 17% performance increase.

Signed-off-by: Alex Shi Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: Roman Gushchin Cc: Shakeel Butt Cc: Chris Down Cc: Thomas Gleixner Cc: Mel Gorman Cc: Vlastimil Babka Cc: Qian Cai Cc: Andrey Ryabinin Cc: "Kirill A.
Shutemov" Cc: "Jérôme Glisse" Cc: Andrea Arcangeli Cc: Yang Shi Cc: David Rientjes Cc: "Aneesh Kumar K.V" Cc: swkhack Cc: "Potyra, Stefan" Cc: Mike Rapoport Cc: Stephen Rothwell Cc: Colin Ian King Cc: Jason Gunthorpe Cc: Mauro Carvalho Chehab Cc: Matthew Wilcox Cc: Peng Fan Cc: Nikolay Borisov Cc: Ira Weiny Cc: Kirill Tkhai Cc: Yafang Shao Cc: Konstantin Khlebnikov Cc: Hugh Dickins Cc: Tejun Heo Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: cgroups@vger.kernel.org --- include/linux/memcontrol.h | 23 ++++++++++++++ mm/compaction.c | 62 ++++++++++++++++++++++++------------ mm/huge_memory.c | 16 ++++------ mm/memcontrol.c | 64 +++++++++++++++++++++++++++++-------- mm/mlock.c | 31 +++++++++--------- mm/page_idle.c | 5 +-- mm/swap.c | 79 +++++++++++++++++++--------------------------- mm/vmscan.c | 58 ++++++++++++++++------------------ 8 files changed, 201 insertions(+), 137 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index ae703ea3ef48..1c1e68537eca 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -428,6 +428,9 @@ static inline struct lruvec *mem_cgroup_lruvec(struct pglist_data *pgdat, struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *); +struct lruvec *lock_page_lruvec_irq(struct page *, struct pglist_data *); +struct lruvec *lock_page_lruvec_irqsave(struct page *, struct pglist_data *); + struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm); @@ -911,6 +914,26 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page, return &pgdat->lruvec; } +static inline struct lruvec *lock_page_lruvec_irq(struct page *page, + struct pglist_data *pgdat) +{ + struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat); + + spin_lock_irq(&lruvec->lru_lock); + + return lruvec; +} + +static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page, + struct pglist_data *pgdat) +{ + struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat); + + spin_lock_irqsave(&lruvec->lru_lock, lruvec->flags); + + return lruvec; +} + static inline bool mm_match_cgroup(struct mm_struct *mm, struct mem_cgroup *memcg) { diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6ab..6bf9c4866f33 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -785,8 +785,7 @@ static bool too_many_isolated(pg_data_t *pgdat) pg_data_t *pgdat = cc->zone->zone_pgdat; unsigned long nr_scanned = 0, nr_isolated = 0; struct lruvec *lruvec; - unsigned long flags = 0; - bool locked = false; + struct lruvec *locked_lruvec = NULL; struct page *page = NULL, *valid_page = NULL; unsigned long start_pfn = low_pfn; bool skip_on_failure = false; @@ -846,11 +845,21 @@ static bool too_many_isolated(pg_data_t *pgdat) * contention, to give chance to IRQs. Abort completely if * a fatal signal is pending. 
*/ - if (!(low_pfn % SWAP_CLUSTER_MAX) - && compact_unlock_should_abort(&pgdat->lru_lock, - flags, &locked, cc)) { - low_pfn = 0; - goto fatal_pending; + if (!(low_pfn % SWAP_CLUSTER_MAX)) { + if (locked_lruvec) { + spin_unlock_irqrestore(&locked_lruvec->lru_lock, + locked_lruvec->flags); + locked_lruvec = NULL; + } + + if (fatal_signal_pending(current)) { + cc->contended = true; + + low_pfn = 0; + goto fatal_pending; + } + + cond_resched(); } if (!pfn_valid_within(low_pfn)) @@ -919,10 +928,10 @@ static bool too_many_isolated(pg_data_t *pgdat) */ if (unlikely(__PageMovable(page)) && !PageIsolated(page)) { - if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, - flags); - locked = false; + if (locked_lruvec) { + spin_unlock_irqrestore(&locked_lruvec->lru_lock, + locked_lruvec->flags); + locked_lruvec = NULL; } if (!isolate_movable_page(page, isolate_mode)) @@ -948,10 +957,22 @@ static bool too_many_isolated(pg_data_t *pgdat) if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) goto isolate_fail; +reget_lruvec: + lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* If we already hold the lock, we can skip some rechecking */ - if (!locked) { - locked = compact_lock_irqsave(&pgdat->lru_lock, - &flags, cc); + if (lruvec != locked_lruvec) { + if (locked_lruvec) { + spin_unlock_irqrestore(&locked_lruvec->lru_lock, + locked_lruvec->flags); + locked_lruvec = NULL; + } + if (compact_lock_irqsave(&lruvec->lru_lock, + &lruvec->flags, cc)) + locked_lruvec = lruvec; + + if (lruvec != mem_cgroup_page_lruvec(page, pgdat)) + goto reget_lruvec; /* Try get exclusive access under lock */ if (!skip_updated) { @@ -975,7 +996,6 @@ static bool too_many_isolated(pg_data_t *pgdat) } } - lruvec = mem_cgroup_page_lruvec(page, pgdat); /* Try isolate the page */ if (__isolate_lru_page(page, isolate_mode) != 0) @@ -1016,9 +1036,10 @@ static bool too_many_isolated(pg_data_t *pgdat) * page anyway. 
*/ if (nr_isolated) { - if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + if (locked_lruvec) { + spin_unlock_irqrestore(&locked_lruvec->lru_lock, + locked_lruvec->flags); + locked_lruvec = NULL; } putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; @@ -1043,8 +1064,9 @@ static bool too_many_isolated(pg_data_t *pgdat) low_pfn = end_pfn; isolate_abort: - if (locked) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (locked_lruvec) + spin_unlock_irqrestore(&locked_lruvec->lru_lock, + locked_lruvec->flags); /* * Updated the cached scanner pfn once the pageblock has been scanned diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 13cc93785006..4334705f2687 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2495,17 +2495,13 @@ static void __split_huge_page_tail(struct page *head, int tail, } static void __split_huge_page(struct page *page, struct list_head *list, - pgoff_t end, unsigned long flags) + struct lruvec *lruvec, pgoff_t end) { struct page *head = compound_head(page); - pg_data_t *pgdat = page_pgdat(head); - struct lruvec *lruvec; struct address_space *swap_cache = NULL; unsigned long offset = 0; int i; - lruvec = mem_cgroup_page_lruvec(head, pgdat); - /* complete memcg works before add pages to LRU */ mem_cgroup_split_huge_fixup(head); @@ -2554,7 +2550,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, xa_unlock(&head->mapping->i_pages); } - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); remap_page(head); @@ -2697,9 +2693,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) struct deferred_split *ds_queue = get_deferred_split_queue(page); struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; + struct lruvec *lruvec; int count, mapcount, extra_pins, ret; bool mlocked; - unsigned long flags; pgoff_t end; VM_BUG_ON_PAGE(is_huge_zero_page(page), page); @@ -2766,7 +2762,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) lru_add_drain(); /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock_irqsave(&pgdata->lru_lock, flags); + lruvec = lock_page_lruvec_irqsave(head, pgdata); if (mapping) { XA_STATE(xas, &mapping->i_pages, page_index(head)); @@ -2797,7 +2793,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) } spin_unlock(&ds_queue->split_queue_lock); - __split_huge_page(page, list, end, flags); + __split_huge_page(page, list, lruvec, end); if (PageSwapCache(head)) { swp_entry_t entry = { .val = page_private(head) }; @@ -2816,7 +2812,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) spin_unlock(&ds_queue->split_queue_lock); fail: if (mapping) xa_unlock(&mapping->i_pages); - spin_unlock_irqrestore(&pgdata->lru_lock, flags); + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); remap_page(head); ret = -EBUSY; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 37592dd7ae32..d2539bac4677 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1263,6 +1263,42 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd return lruvec; } +struct lruvec *lock_page_lruvec_irq(struct page *page, + struct pglist_data *pgdat) +{ + struct lruvec *lruvec; + +again: + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock_irq(&lruvec->lru_lock); + + /* lruvec may changed in commit_charge() */ + if (lruvec != mem_cgroup_page_lruvec(page, pgdat)) { + 
spin_unlock_irq(&lruvec->lru_lock); + goto again; + } + + return lruvec; +} + +struct lruvec *lock_page_lruvec_irqsave(struct page *page, + struct pglist_data *pgdat) +{ + struct lruvec *lruvec; + +again: + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock_irqsave(&lruvec->lru_lock, lruvec->flags); + + /* lruvec may changed in commit_charge() */ + if (lruvec != mem_cgroup_page_lruvec(page, pgdat)) { + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); + goto again; + } + + return lruvec; +} + /** * mem_cgroup_update_lru_size - account for adding or removing an lru page * @lruvec: mem_cgroup per zone lru vector @@ -2687,41 +2723,43 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages) css_put_many(&memcg->css, nr_pages); } -static void lock_page_lru(struct page *page, int *isolated) +static struct lruvec *lock_page_lru(struct page *page, int *isolated) { pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec = lock_page_lruvec_irq(page, pgdat); - spin_lock_irq(&pgdat->lru_lock); if (PageLRU(page)) { - struct lruvec *lruvec; - lruvec = mem_cgroup_page_lruvec(page, pgdat); ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_lru(page)); *isolated = 1; } else *isolated = 0; + + return lruvec; } -static void unlock_page_lru(struct page *page, int isolated) +static void unlock_page_lru(struct page *page, int isolated, + struct lruvec *locked_lruvec) { - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; - if (isolated) { - struct lruvec *lruvec; + spin_unlock_irq(&locked_lruvec->lru_lock); + lruvec = lock_page_lruvec_irq(page, page_pgdat(page)); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + if (isolated) { VM_BUG_ON_PAGE(PageLRU(page), page); SetPageLRU(page); add_page_to_lru_list(page, lruvec, page_lru(page)); } - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); } static void commit_charge(struct page *page, struct mem_cgroup *memcg, bool lrucare) { int isolated; + struct lruvec *lruvec; VM_BUG_ON_PAGE(page->mem_cgroup, page); @@ -2730,7 +2768,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, * may already be on some other mem_cgroup's LRU. Take care of it. */ if (lrucare) - lock_page_lru(page, &isolated); + lruvec = lock_page_lru(page, &isolated); /* * Nobody should be changing or seriously looking at @@ -2749,7 +2787,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, page->mem_cgroup = memcg; if (lrucare) - unlock_page_lru(page, isolated); + unlock_page_lru(page, isolated, lruvec); } #ifdef CONFIG_MEMCG_KMEM @@ -3045,7 +3083,7 @@ void __memcg_kmem_uncharge(struct page *page, int order) /* * Because tail pages are not marked as "used", set it. We're under - * pgdat->lru_lock and migration entries setup in all page mappings. + * pgdat->lruvec.lru_lock and migration entries setup in all page mappings. */ void mem_cgroup_split_huge_fixup(struct page *head) { diff --git a/mm/mlock.c b/mm/mlock.c index a72c1eeded77..b509b80b8513 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -106,12 +106,10 @@ void mlock_vma_page(struct page *page) * Isolate a page from LRU with optional get_page() pin. * Assumes lru_lock already held and page already pinned. 
*/ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) +static bool __munlock_isolate_lru_page(struct page *page, + struct lruvec *lruvec, bool getpage) { if (PageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); if (getpage) get_page(page); ClearPageLRU(page); @@ -183,6 +181,7 @@ unsigned int munlock_vma_page(struct page *page) { int nr_pages; pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; /* For try_to_munlock() and to serialize with page migration */ BUG_ON(!PageLocked(page)); @@ -194,7 +193,7 @@ unsigned int munlock_vma_page(struct page *page) * might otherwise copy PageMlocked to part of the tail pages before * we clear it in the head page. It also stabilizes hpage_nr_pages(). */ - spin_lock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page, pgdat); if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ @@ -205,15 +204,15 @@ unsigned int munlock_vma_page(struct page *page) nr_pages = hpage_nr_pages(page); __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (__munlock_isolate_lru_page(page, true)) { - spin_unlock_irq(&pgdat->lru_lock); + if (__munlock_isolate_lru_page(page, lruvec, true)) { + spin_unlock_irq(&lruvec->lru_lock); __munlock_isolated_page(page); goto out; } __munlock_isolation_failed(page); unlock_out: - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); out: return nr_pages - 1; @@ -291,28 +290,29 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) { int i; int nr = pagevec_count(pvec); - int delta_munlocked = -nr; struct pagevec pvec_putback; + struct lruvec *lruvec = NULL; int pgrescued = 0; pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(&zone->zone_pgdat->lru_lock); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; + lruvec = lock_page_lruvec_irq(page, page_pgdat(page)); + if (TestClearPageMlocked(page)) { /* * We already have pin from follow_page_mask() * so we can spare the get_page() here. 
*/ - if (__munlock_isolate_lru_page(page, false)) + if (__munlock_isolate_lru_page(page, lruvec, false)) { + __mod_zone_page_state(zone, NR_MLOCK, -1); + spin_unlock_irq(&lruvec->lru_lock); continue; - else + } else __munlock_isolation_failed(page); - } else { - delta_munlocked++; } /* @@ -323,9 +323,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) */ pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; + spin_unlock_irq(&lruvec->lru_lock); } - __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(&zone->zone_pgdat->lru_lock); /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); diff --git a/mm/page_idle.c b/mm/page_idle.c index 295512465065..25f4b1cf3e0f 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -32,6 +32,7 @@ static struct page *page_idle_get_page(unsigned long pfn) { struct page *page; pg_data_t *pgdat; + struct lruvec *lruvec; if (!pfn_valid(pfn)) return NULL; @@ -42,12 +43,12 @@ static struct page *page_idle_get_page(unsigned long pfn) return NULL; pgdat = page_pgdat(page); - spin_lock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page, pgdat); if (unlikely(!PageLRU(page))) { put_page(page); page = NULL; } - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); return page; } diff --git a/mm/swap.c b/mm/swap.c index 38c3fa4308e2..267c3e262254 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -62,14 +62,12 @@ static void __page_cache_release(struct page *page) if (PageLRU(page)) { pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; - unsigned long flags; - spin_lock_irqsave(&pgdat->lru_lock, flags); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + lruvec = lock_page_lruvec_irqsave(page, pgdat); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); } __ClearPageWaiters(page); } @@ -192,26 +190,17 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, void *arg) { int i; - struct pglist_data *pgdat = NULL; - struct lruvec *lruvec; - unsigned long flags = 0; + struct lruvec *lruvec = NULL; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); - } + lruvec = lock_page_lruvec_irqsave(page, page_pgdat(page)); - lruvec = mem_cgroup_page_lruvec(page, pgdat); (*move_fn)(page, lruvec, arg); + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); } - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); } @@ -325,11 +314,12 @@ static inline void activate_page_drain(int cpu) void activate_page(struct page *page) { pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; page = compound_head(page); - spin_lock_irq(&pgdat->lru_lock); - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL); - spin_unlock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page, pgdat); + __activate_page(page, lruvec, NULL); + spin_unlock_irq(&lruvec->lru_lock); } #endif @@ -761,9 +751,7 @@ void release_pages(struct page **pages, int nr) { int i; LIST_HEAD(pages_to_free); - struct pglist_data *locked_pgdat = NULL; - struct lruvec *lruvec; - unsigned long 
uninitialized_var(flags); + struct lruvec *lruvec = NULL; unsigned int uninitialized_var(lock_batch); for (i = 0; i < nr; i++) { @@ -772,21 +760,22 @@ void release_pages(struct page **pages, int nr) /* * Make sure the IRQ-safe lock-holding time does not get * excessive with a continuous string of pages from the - * same pgdat. The lock is held only if pgdat != NULL. + * same lruvec. The lock is held only if lruvec != NULL. */ - if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) { + spin_unlock_irqrestore(&lruvec->lru_lock, + lruvec->flags); + lruvec = NULL; } if (is_huge_zero_page(page)) continue; if (is_zone_device_page(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); - locked_pgdat = NULL; + if (lruvec) { + spin_unlock_irqrestore(&lruvec->lru_lock, + lruvec->flags); + lruvec = NULL; } /* * ZONE_DEVICE pages that return 'false' from @@ -803,27 +792,25 @@ void release_pages(struct page **pages, int nr) continue; if (PageCompound(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec) { + spin_unlock_irqrestore(&lruvec->lru_lock, + lruvec->flags); + lruvec = NULL; } __put_compound_page(page); continue; } if (PageLRU(page)) { - struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (pgdat != locked_pgdat) { - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); + if (new_lruvec != lruvec) { + if (lruvec) + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); lock_batch = 0; - locked_pgdat = pgdat; - spin_lock_irqsave(&locked_pgdat->lru_lock, flags); - } + lruvec = lock_page_lruvec_irqsave(page, page_pgdat(page)); - lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); + } VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); @@ -835,8 +822,8 @@ void release_pages(struct page **pages, int nr) list_add(&page->lru, &pages_to_free); } - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + if (lruvec) + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); mem_cgroup_uncharge_list(&pages_to_free); free_unref_page_list(&pages_to_free); @@ -874,7 +861,7 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, VM_BUG_ON_PAGE(!PageHead(page), page); VM_BUG_ON_PAGE(PageCompound(page_tail), page); VM_BUG_ON_PAGE(PageLRU(page_tail), page); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + lockdep_assert_held(&lruvec->lru_lock); if (!list) SetPageLRU(page_tail); diff --git a/mm/vmscan.c b/mm/vmscan.c index ee4eecc7e1c2..50f15d1e5f18 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1804,8 +1804,7 @@ int isolate_lru_page(struct page *page) pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; - spin_lock_irq(&pgdat->lru_lock); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + lruvec = lock_page_lruvec_irq(page, pgdat); if (PageLRU(page)) { int lru = page_lru(page); get_page(page); @@ -1813,7 +1812,7 @@ int isolate_lru_page(struct page *page) del_page_from_lru_list(page, lruvec, lru); ret = 0; } - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); } return ret; } @@ -1878,7 +1877,6 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, 
struct list_head *list) { - struct pglist_data *pgdat = lruvec_pgdat(lruvec); int nr_pages, nr_moved = 0; LIST_HEAD(pages_to_free); struct page *page; @@ -1889,12 +1887,11 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, VM_BUG_ON_PAGE(PageLRU(page), page); if (unlikely(!page_evictable(page))) { list_del(&page->lru); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); putback_lru_page(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); continue; } - lruvec = mem_cgroup_page_lruvec(page, pgdat); SetPageLRU(page); lru = page_lru(page); @@ -1909,9 +1906,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, del_page_from_lru_list(page, lruvec, lru); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); (*get_compound_page_dtor(page))(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); } else list_add(&page->lru, &pages_to_free); } else { @@ -1974,7 +1971,7 @@ static int current_may_throttle(void) lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); @@ -1986,7 +1983,7 @@ static int current_may_throttle(void) if (global_reclaim(sc)) __count_vm_events(item, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); if (nr_taken == 0) return 0; @@ -1994,7 +1991,7 @@ static int current_may_throttle(void) nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0, &stat, false); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (global_reclaim(sc)) @@ -2007,7 +2004,7 @@ static int current_may_throttle(void) __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_list(&page_list); free_unref_page_list(&page_list); @@ -2060,7 +2057,7 @@ static void shrink_active_list(unsigned long nr_to_scan, lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); @@ -2071,7 +2068,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGREFILL, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); while (!list_empty(&l_hold)) { cond_resched(); @@ -2117,7 +2114,7 @@ static void shrink_active_list(unsigned long nr_to_scan, /* * Move pages back to the lru list. */ - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); /* * Count referenced pages from currently used mappings as rotated, * even though only some of them are actually re-activated. 
This @@ -2135,7 +2132,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_list(&l_active); free_unref_page_list(&l_active); @@ -2427,7 +2424,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) + lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) { reclaim_stat->recent_scanned[0] /= 2; reclaim_stat->recent_rotated[0] /= 2; @@ -2448,7 +2445,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, fp = file_prio * (reclaim_stat->recent_scanned[1] + 1); fp /= reclaim_stat->recent_rotated[1] + 1; - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); fraction[0] = ap; fraction[1] = fp; @@ -4334,24 +4331,25 @@ int page_evictable(struct page *page) */ void check_move_unevictable_pages(struct pagevec *pvec) { - struct lruvec *lruvec; - struct pglist_data *pgdat = NULL; + struct lruvec *lruvec = NULL; int pgscanned = 0; int pgrescued = 0; int i; for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); + struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, pgdat); + pgscanned++; - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irq(&pgdat->lru_lock); - pgdat = pagepgdat; - spin_lock_irq(&pgdat->lru_lock); + + if (lruvec != new_lruvec) { + if (lruvec) + spin_unlock_irq(&lruvec->lru_lock); + lruvec = new_lruvec; + spin_lock_irq(&lruvec->lru_lock); } - lruvec = mem_cgroup_page_lruvec(page, pgdat); if (!PageLRU(page) || !PageUnevictable(page)) continue; @@ -4367,10 +4365,10 @@ void check_move_unevictable_pages(struct pagevec *pvec) } } - if (pgdat) { + if (lruvec) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); } } EXPORT_SYMBOL_GPL(check_move_unevictable_pages); From patchwork Tue Nov 12 14:06:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11239373 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A79B15AB for ; Tue, 12 Nov 2019 14:07:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1655C21783 for ; Tue, 12 Nov 2019 14:07:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1655C21783 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 032FC6B0010; Tue, 12 Nov 2019 09:06:59 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F2D0C6B0266; Tue, 12 Nov 2019 09:06:58 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 
63042) id DA4D26B0269; Tue, 12 Nov 2019 09:06:58 -0500 (EST)
From: Alex Shi
To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Roman Gushchin, Shakeel Butt, Chris Down, Thomas Gleixner, Vlastimil Babka, Andrey Ryabinin, swkhack, "Potyra, Stefan", Jason Gunthorpe, Matthew Wilcox, Mauro Carvalho Chehab, Peng Fan, Nikolay Borisov, Ira Weiny, Kirill Tkhai, Yafang Shao
Subject: [PATCH v2 4/8] mm/lru: only change the lru_lock iff page's lruvec is different
Date: Tue, 12 Nov 2019 22:06:24 +0800
Message-Id: <1573567588-47048-5-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com>

While walking a pagevec, a new page's lruvec is often the same as the previous one. In that case we can skip a lruvec unlock/lock pair, and only switch the lru_lock iff the lruvec is different. The function is named relock_page_lruvec, following Hugh Dickins' patch.
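As a rough sketch of the caller pattern this patch enables (the function example_walk_pagevec is made up for illustration; relock_page_lruvec_irq is the helper added below), a pagevec walk only drops and re-takes the spinlock when a page belongs to a different lruvec than the one currently held:

static void example_walk_pagevec(struct pagevec *pvec)
{
	struct lruvec *lruvec = NULL;
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];

		/* keeps the held lock when consecutive pages share a lruvec */
		lruvec = relock_page_lruvec_irq(page, lruvec);

		/* ... operate on the page under lruvec->lru_lock ... */
	}
	if (lruvec)
		spin_unlock_irq(&lruvec->lru_lock);
}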
Signed-off-by: Alex Shi Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: Roman Gushchin Cc: Shakeel Butt Cc: Chris Down Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: Andrey Ryabinin Cc: swkhack Cc: "Potyra, Stefan" Cc: Jason Gunthorpe Cc: Matthew Wilcox Cc: Mauro Carvalho Chehab Cc: Peng Fan Cc: Nikolay Borisov Cc: Ira Weiny Cc: Kirill Tkhai Cc: Yang Shi Cc: Yafang Shao Cc: Mel Gorman Cc: Konstantin Khlebnikov Cc: Hugh Dickins Cc: Tejun Heo Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org --- include/linux/memcontrol.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++ mm/mlock.c | 16 ++++++++------- mm/swap.c | 14 ++++++------- mm/vmscan.c | 24 +++++++++++------------ 4 files changed, 76 insertions(+), 27 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 1c1e68537eca..2421b720d272 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1300,6 +1300,55 @@ static inline void dec_lruvec_page_state(struct page *page, mod_lruvec_page_state(page, idx, -1); } +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irq(struct page *page, + struct lruvec *locked_lruvec) +{ + struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *lruvec; + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + + if (locked_lruvec == lruvec) { + rcu_read_unlock(); + return lruvec; + } + rcu_read_unlock(); + + if (locked_lruvec) + spin_unlock_irq(&locked_lruvec->lru_lock); + + lruvec = lock_page_lruvec_irq(page, pgdat); + + return lruvec; +} + +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page, + struct lruvec *locked_lruvec) +{ + struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *lruvec; + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + + if (locked_lruvec == lruvec) { + rcu_read_unlock(); + return lruvec; + } + rcu_read_unlock(); + + if (locked_lruvec) + spin_unlock_irqrestore(&locked_lruvec->lru_lock, + locked_lruvec->flags); + + lruvec = lock_page_lruvec_irqsave(page, pgdat); + + return lruvec; +} + #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); diff --git a/mm/mlock.c b/mm/mlock.c index b509b80b8513..8b3a97b62c0a 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -290,6 +290,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) { int i; int nr = pagevec_count(pvec); + int delta_munlocked = -nr; struct pagevec pvec_putback; struct lruvec *lruvec = NULL; int pgrescued = 0; @@ -300,20 +301,19 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; - lruvec = lock_page_lruvec_irq(page, page_pgdat(page)); + lruvec = relock_page_lruvec_irq(page, lruvec); if (TestClearPageMlocked(page)) { /* * We already have pin from follow_page_mask() * so we can spare the get_page() here. 
*/ - if (__munlock_isolate_lru_page(page, lruvec, false)) { - __mod_zone_page_state(zone, NR_MLOCK, -1); - spin_unlock_irq(&lruvec->lru_lock); + if (__munlock_isolate_lru_page(page, lruvec, false)) continue; - } else + else __munlock_isolation_failed(page); - } + } else + delta_munlocked++; /* * We won't be munlocking this page in the next phase @@ -323,8 +323,10 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) */ pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; - spin_unlock_irq(&lruvec->lru_lock); } + __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); + if (lruvec) + spin_unlock_irq(&lruvec->lru_lock); /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); diff --git a/mm/swap.c b/mm/swap.c index 267c3e262254..0639b3e9e03b 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -195,11 +195,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - lruvec = lock_page_lruvec_irqsave(page, page_pgdat(page)); + lruvec = relock_page_lruvec_irqsave(page, lruvec); (*move_fn)(page, lruvec, arg); - spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); } + if (lruvec) + spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); @@ -802,15 +803,12 @@ void release_pages(struct page **pages, int nr) } if (PageLRU(page)) { - struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + struct lruvec *pre_lruvec = lruvec; - if (new_lruvec != lruvec) { - if (lruvec) - spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); + lruvec = relock_page_lruvec_irqsave(page, lruvec); + if (pre_lruvec != lruvec) lock_batch = 0; - lruvec = lock_page_lruvec_irqsave(page, page_pgdat(page)); - } VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); diff --git a/mm/vmscan.c b/mm/vmscan.c index 50f15d1e5f18..cbebd9b0b9c8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1874,22 +1874,25 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, * Returns the number of pages moved to the given lruvec. */ -static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, +static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *llvec, struct list_head *list) { int nr_pages, nr_moved = 0; LIST_HEAD(pages_to_free); struct page *page; enum lru_list lru; + struct lruvec *lruvec = llvec; while (!list_empty(list)) { page = lru_to_page(list); + lruvec = relock_page_lruvec_irq(page, lruvec); + VM_BUG_ON_PAGE(PageLRU(page), page); if (unlikely(!page_evictable(page))) { list_del(&page->lru); spin_unlock_irq(&lruvec->lru_lock); + lruvec = NULL; putback_lru_page(page); - spin_lock_irq(&lruvec->lru_lock); continue; } @@ -1907,8 +1910,8 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, if (unlikely(PageCompound(page))) { spin_unlock_irq(&lruvec->lru_lock); + lruvec = NULL; (*get_compound_page_dtor(page))(page); - spin_lock_irq(&lruvec->lru_lock); } else list_add(&page->lru, &pages_to_free); } else { @@ -1916,6 +1919,11 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, } } + if (lruvec != llvec) { + if (lruvec) + spin_unlock_irq(&lruvec->lru_lock); + spin_lock_irq(&llvec->lru_lock); + } /* * To save our caller's stack, now use input list for pages to free. 
*/ @@ -4338,18 +4346,10 @@ void check_move_unevictable_pages(struct pagevec *pvec) for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pgdat = page_pgdat(page); - struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, pgdat); - pgscanned++; - if (lruvec != new_lruvec) { - if (lruvec) - spin_unlock_irq(&lruvec->lru_lock); - lruvec = new_lruvec; - spin_lock_irq(&lruvec->lru_lock); - } + lruvec = relock_page_lruvec_irq(page, lruvec); if (!PageLRU(page) || !PageUnevictable(page)) continue; From patchwork Tue Nov 12 14:06:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11239359 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E24315AB for ; Tue, 12 Nov 2019 14:06:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D8CCA21E6F for ; Tue, 12 Nov 2019 14:06:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D8CCA21E6F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id EB1BF6B0003; Tue, 12 Nov 2019 09:06:47 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E63526B0006; Tue, 12 Nov 2019 09:06:47 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D78CB6B0007; Tue, 12 Nov 2019 09:06:47 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0032.hostedemail.com [216.40.44.32]) by kanga.kvack.org (Postfix) with ESMTP id C15716B0003 for ; Tue, 12 Nov 2019 09:06:47 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 81E3E485F for ; Tue, 12 Nov 2019 14:06:47 +0000 (UTC) X-FDA: 76147801254.24.band60_5842d6ed12150 X-Spam-Summary: 2,0,0,1e48aea802db91b0,d41d8cd98f00b204,alex.shi@linux.alibaba.com,:alex.shi@linux.alibaba.com:cgroups@vger.kernel.org:linux-kernel@vger.kernel.org::akpm@linux-foundation.org:mgorman@techsingularity.net:tj@kernel.org:hughd@google.com:khlebnikov@yandex-team.ru:daniel.m.jordan@oracle.com:yang.shi@linux.alibaba.com:vbabka@suse.cz:dan.j.williams@intel.com:mhocko@suse.com:richard.weiyang@gmail.com:arunks@codeaurora.org:osalvador@suse.de:rppt@linux.vnet.ibm.com:alexander.h.duyck@linux.intel.com:pasha.tatashin@soleen.com:glider@google.com:hannes@cmpxchg.org,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1261:1345:1359:1431:1437:1534:1541:1711:1730:1747:1777:1792:2198:2199:2393:2559:2562:3138:3139:3140:3141:3142:3352:3872:3876:4321:4605:5007:6261:6737:6742:7514:7903:9207:10004:11026:11473:11658:11914:12043:12048:12296:12297:12438:12555:12895:12986:13069:13311:13357:13846:14096:14181:14384:14394:14721:14915:21060:21080:21451:21627:30064,0,RBL:115.124.30.57:@linux.al ibaba.co X-HE-Tag: band60_5842d6ed12150 X-Filterd-Recvd-Size: 3530 Received: from out30-57.freemail.mail.aliyun.com (out30-57.freemail.mail.aliyun.com [115.124.30.57]) by imf23.hostedemail.com (Postfix) with ESMTP for ; Tue, 12 Nov 2019 14:06:46 +0000 (UTC) X-Alimail-AntiSpam: 
AC=PASS;BC=-1|-1;BR=01201311R981e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e07417;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=22;SR=0;TI=SMTPD_---0ThulKmK_1573567600; Received: from localhost(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0ThulKmK_1573567600) by smtp.aliyun-inc.com(127.0.0.1); Tue, 12 Nov 2019 22:06:41 +0800 From: Alex Shi To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com Cc: Vlastimil Babka , Dan Williams , Michal Hocko , Wei Yang , Arun KS , Oscar Salvador , Mike Rapoport , Alexander Duyck , Pavel Tatashin , Alexander Potapenko , Johannes Weiner Subject: [PATCH v2 5/8] mm/pgdat: remove pgdat lru_lock Date: Tue, 12 Nov 2019 22:06:25 +0800 Message-Id: <1573567588-47048-6-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now pgdat was replaced by lruvec lock. It's not used anymore. Signed-off-by: Alex Shi Cc: Andrew Morton Cc: Vlastimil Babka Cc: Dan Williams Cc: Michal Hocko Cc: Mel Gorman Cc: Wei Yang Cc: Arun KS Cc: Oscar Salvador Cc: Mike Rapoport Cc: Alexander Duyck Cc: Pavel Tatashin Cc: Alexander Potapenko Cc: Konstantin Khlebnikov Cc: Hugh Dickins Cc: Johannes Weiner Cc: Tejun Heo Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org --- include/linux/mmzone.h | 1 - mm/page_alloc.c | 1 - 2 files changed, 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index da00615baa52..3b6029bcb577 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -759,7 +759,6 @@ struct deferred_split { /* Write-intensive fields used by page reclaim */ ZONE_PADDING(_pad1_) - spinlock_t lru_lock; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f391c0c4ed1d..ffc30375c05b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6710,7 +6710,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat) init_waitqueue_head(&pgdat->pfmemalloc_wait); pgdat_page_ext_init(pgdat); - spin_lock_init(&pgdat->lru_lock); lruvec_init(node_lruvec(pgdat)); } From patchwork Tue Nov 12 14:06:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11239375 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B5453159A for ; Tue, 12 Nov 2019 14:07:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7F14B21783 for ; Tue, 12 Nov 2019 14:07:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7F14B21783 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7D4D46B0266; Tue, 12 Nov 2019 09:07:21 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by 
kanga.kvack.org (Postfix, from userid 40) id 785696B0269; Tue, 12 Nov 2019 09:07:21 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6C1D06B026A; Tue, 12 Nov 2019 09:07:21 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0069.hostedemail.com [216.40.44.69]) by kanga.kvack.org (Postfix) with ESMTP id 569326B0266 for ; Tue, 12 Nov 2019 09:07:21 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 189092AAE for ; Tue, 12 Nov 2019 14:07:21 +0000 (UTC) X-FDA: 76147802682.01.cup78_5d08754d40b3b X-Spam-Summary: 2,0,0,110d454e82adb348,d41d8cd98f00b204,alex.shi@linux.alibaba.com,:alex.shi@linux.alibaba.com:cgroups@vger.kernel.org:linux-kernel@vger.kernel.org::akpm@linux-foundation.org:mgorman@techsingularity.net:tj@kernel.org:hughd@google.com:khlebnikov@yandex-team.ru:daniel.m.jordan@oracle.com:yang.shi@linux.alibaba.com:hannes@cmpxchg.org:guro@fb.com:shakeelb@google.com:chris@chrisdown.name:tglx@linutronix.de,RULES_HIT:41:69:355:379:541:800:960:968:973:988:989:1260:1261:1345:1359:1431:1437:1534:1542:1711:1730:1747:1777:1792:2194:2199:2393:2559:2562:3138:3139:3140:3141:3142:3353:3865:3867:3871:4321:4419:5007:6261:6737:9592:10004:11026:11658:11914:12043:12048:12050:12296:12297:12438:12555:12895:13161:13221:13229:13846:14096:14181:14394:14721:14915:21060:21080:21324:21451:21627:30054:30055:30064:30075,0,RBL:115.124.30.133:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSB L:0,DNSB X-HE-Tag: cup78_5d08754d40b3b X-Filterd-Recvd-Size: 3874 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 12 Nov 2019 14:07:18 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=16;SR=0;TI=SMTPD_---0Thuhzkb_1573567601; Received: from localhost(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0Thuhzkb_1573567601) by smtp.aliyun-inc.com(127.0.0.1); Tue, 12 Nov 2019 22:06:41 +0800 From: Alex Shi To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com Cc: Johannes Weiner , Roman Gushchin , Shakeel Butt , Chris Down , Thomas Gleixner Subject: [PATCH v2 6/8] mm/lru: remove rcu_read_lock to fix performance regression Date: Tue, 12 Nov 2019 22:06:26 +0800 Message-Id: <1573567588-47048-7-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The Intel 0day test robot reported a performance regression with this patchset. The detailed report points to rcu_read_lock + PROVE_LOCKING, which leaves queued_spin_lock_slowpath waiting too long to take the lock.
Remove rcu_read_lock is safe here since we had a spinlock hold. Reported-by: kbuild test robot Signed-off-by: Alex Shi Cc: Andrew Morton Cc: Johannes Weiner Cc: Roman Gushchin Cc: Shakeel Butt Cc: Chris Down Cc: Tejun Heo Cc: Thomas Gleixner Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Alex Shi --- include/linux/memcontrol.h | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 2421b720d272..f869897a68f0 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1307,20 +1307,18 @@ static inline struct lruvec *relock_page_lruvec_irq(struct page *page, struct pglist_data *pgdat = page_pgdat(page); struct lruvec *lruvec; - rcu_read_lock(); + if (!locked_lruvec) + goto lock; + lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (locked_lruvec == lruvec) { - rcu_read_unlock(); + if (locked_lruvec == lruvec) return lruvec; - } - rcu_read_unlock(); - if (locked_lruvec) - spin_unlock_irq(&locked_lruvec->lru_lock); + spin_unlock_irq(&locked_lruvec->lru_lock); +lock: lruvec = lock_page_lruvec_irq(page, pgdat); - return lruvec; } @@ -1331,21 +1329,18 @@ static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page, struct pglist_data *pgdat = page_pgdat(page); struct lruvec *lruvec; - rcu_read_lock(); + if (!locked_lruvec) + goto lock; + lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (locked_lruvec == lruvec) { - rcu_read_unlock(); + if (locked_lruvec == lruvec) return lruvec; - } - rcu_read_unlock(); - if (locked_lruvec) - spin_unlock_irqrestore(&locked_lruvec->lru_lock, - locked_lruvec->flags); + spin_unlock_irqrestore(&locked_lruvec->lru_lock, locked_lruvec->flags); +lock: lruvec = lock_page_lruvec_irqsave(page, pgdat); - return lruvec; } From patchwork Tue Nov 12 14:06:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11239367 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 16D3D159A for ; Tue, 12 Nov 2019 14:07:00 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E135F21783 for ; Tue, 12 Nov 2019 14:06:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E135F21783 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A6D3A6B000C; Tue, 12 Nov 2019 09:06:55 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A1DF26B000D; Tue, 12 Nov 2019 09:06:55 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 875476B0010; Tue, 12 Nov 2019 09:06:55 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP id 5B7176B000C for ; Tue, 12 Nov 2019 09:06:55 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 15645181AEF15 for ; Tue, 12 Nov 2019 14:06:55 +0000 (UTC) X-FDA: 
76147801590.17.hen77_59639dfedb82b X-Spam-Summary: 2,0,0,94899555b811c573,d41d8cd98f00b204,alex.shi@linux.alibaba.com,:alex.shi@linux.alibaba.com:cgroups@vger.kernel.org:linux-kernel@vger.kernel.org::akpm@linux-foundation.org:mgorman@techsingularity.net:tj@kernel.org:hughd@google.com:khlebnikov@yandex-team.ru:daniel.m.jordan@oracle.com:yang.shi@linux.alibaba.com:hannes@cmpxchg.org:mhocko@kernel.org:vdavydov.dev@gmail.com:guro@fb.com:shakeelb@google.com:chris@chrisdown.name:tglx@linutronix.de,RULES_HIT:41:355:379:541:800:960:968:973:988:989:1260:1261:1345:1359:1431:1437:1534:1542:1711:1730:1747:1777:1792:2393:2553:2559:2562:3138:3139:3140:3141:3142:3353:3867:3870:4321:4605:5007:6261:6737:7514:10004:11026:11658:11914:12043:12048:12296:12297:12438:12555:12895:12986:13846:14096:14181:14394:14721:14915:21060:21080:21451:21627:30054:30064:30090,0,RBL:115.124.30.44:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0 ,DNSBL:n X-HE-Tag: hen77_59639dfedb82b X-Filterd-Recvd-Size: 4264 Received: from out30-44.freemail.mail.aliyun.com (out30-44.freemail.mail.aliyun.com [115.124.30.44]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Tue, 12 Nov 2019 14:06:52 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R511e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04391;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=18;SR=0;TI=SMTPD_---0Thubemo_1573567601; Received: from localhost(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0Thubemo_1573567601) by smtp.aliyun-inc.com(127.0.0.1); Tue, 12 Nov 2019 22:06:41 +0800 From: Alex Shi To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com Cc: Johannes Weiner , Michal Hocko , Vladimir Davydov , Roman Gushchin , Shakeel Butt , Chris Down , Thomas Gleixner Subject: [PATCH v2 7/8] mm/lru: likely enhancement Date: Tue, 12 Nov 2019 22:06:27 +0800 Message-Id: <1573567588-47048-8-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Annotate the lruvec lock helpers with likely()/unlikely() to match the expected pagevec usage pattern: consecutive pages usually belong to the same lruvec, so keeping the already-held lock is the common case and a re-lock is rare.
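To make the intent of these annotations concrete, below is a minimal user-space sketch of the relock fast path they decorate. It is not kernel code: fake_lruvec, fake_page and relock() are invented stand-ins, pthread spinlocks replace spinlock_t, and likely()/unlikely() are the usual __builtin_expect() wrappers. The assumption encoded by the hints is that consecutive pages in a batch almost always belong to the same lruvec, so the comparison with the currently held lock succeeds and no lock traffic happens at all.

#include <pthread.h>
#include <stdio.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct fake_lruvec {
        pthread_spinlock_t lru_lock;
        long nr_moved;
};

struct fake_page {
        struct fake_lruvec *lruvec;     /* stand-in for mem_cgroup_page_lruvec() */
};

/*
 * Return @page's lruvec, locked.  If @locked already points at it, keep it;
 * otherwise drop the old lock and take the new one.  The hints mirror the
 * patch: a NULL @locked is rare (first page only), a matching lruvec is the
 * common case.
 */
static struct fake_lruvec *relock(struct fake_page *page,
                                  struct fake_lruvec *locked)
{
        struct fake_lruvec *lruvec = page->lruvec;

        if (unlikely(!locked))
                goto lock;
        if (likely(locked == lruvec))
                return lruvec;
        pthread_spin_unlock(&locked->lru_lock);
lock:
        pthread_spin_lock(&lruvec->lru_lock);
        return lruvec;
}

int main(void)
{
        struct fake_lruvec vecs[2];
        struct fake_page pages[8];
        struct fake_lruvec *locked = NULL;

        for (int i = 0; i < 2; i++) {
                pthread_spin_init(&vecs[i].lru_lock, PTHREAD_PROCESS_PRIVATE);
                vecs[i].nr_moved = 0;
        }
        /* A batch that is mostly one lruvec, with a single stray page. */
        for (int i = 0; i < 8; i++)
                pages[i].lruvec = (i == 5) ? &vecs[1] : &vecs[0];

        for (int i = 0; i < 8; i++) {
                locked = relock(&pages[i], locked);
                locked->nr_moved++;     /* "move" the page under its lock */
        }
        if (locked)
                pthread_spin_unlock(&locked->lru_lock);

        printf("moved %ld + %ld pages with 3 lock acquisitions\n",
               vecs[0].nr_moved, vecs[1].nr_moved);
        return 0;
}

With the batch above, eight pages are moved with three lock acquisitions instead of eight; building it should only need something like cc -pthread. The kernel helpers do more work than this sketch, for example re-resolving the lruvec under the lock because commit_charge() can move a page between memcgs, but the branch structure is the one this patch annotates.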
Signed-off-by: Alex Shi Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: Roman Gushchin Cc: Shakeel Butt Cc: Chris Down Cc: Tejun Heo Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org --- include/linux/memcontrol.h | 8 ++++---- mm/memcontrol.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index f869897a68f0..2a6d7a503452 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1307,12 +1307,12 @@ static inline struct lruvec *relock_page_lruvec_irq(struct page *page, struct pglist_data *pgdat = page_pgdat(page); struct lruvec *lruvec; - if (!locked_lruvec) + if (unlikely(!locked_lruvec)) goto lock; lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (locked_lruvec == lruvec) + if (likely(locked_lruvec == lruvec)) return lruvec; spin_unlock_irq(&locked_lruvec->lru_lock); @@ -1329,12 +1329,12 @@ static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page, struct pglist_data *pgdat = page_pgdat(page); struct lruvec *lruvec; - if (!locked_lruvec) + if (unlikely(!locked_lruvec)) goto lock; lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (locked_lruvec == lruvec) + if (likely(locked_lruvec == lruvec)) return lruvec; spin_unlock_irqrestore(&locked_lruvec->lru_lock, locked_lruvec->flags); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d2539bac4677..d95adf49fae3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1273,7 +1273,7 @@ struct lruvec *lock_page_lruvec_irq(struct page *page, spin_lock_irq(&lruvec->lru_lock); /* lruvec may changed in commit_charge() */ - if (lruvec != mem_cgroup_page_lruvec(page, pgdat)) { + if (unlikely(lruvec != mem_cgroup_page_lruvec(page, pgdat))) { spin_unlock_irq(&lruvec->lru_lock); goto again; } @@ -1291,7 +1291,7 @@ struct lruvec *lock_page_lruvec_irqsave(struct page *page, spin_lock_irqsave(&lruvec->lru_lock, lruvec->flags); /* lruvec may changed in commit_charge() */ - if (lruvec != mem_cgroup_page_lruvec(page, pgdat)) { + if (unlikely(lruvec != mem_cgroup_page_lruvec(page, pgdat))) { spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags); goto again; } From patchwork Tue Nov 12 14:06:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11239371 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 96B381747 for ; Tue, 12 Nov 2019 14:07:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 53F3421783 for ; Tue, 12 Nov 2019 14:07:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 53F3421783 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 873056B000E; Tue, 12 Nov 2019 09:06:56 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8246F6B0010; Tue, 12 Nov 2019 09:06:56 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 678776B0266; Tue, 12 Nov 2019 09:06:56 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: 
linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0238.hostedemail.com [216.40.44.238]) by kanga.kvack.org (Postfix) with ESMTP id 402806B000E for ; Tue, 12 Nov 2019 09:06:56 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id D2468499604 for ; Tue, 12 Nov 2019 14:06:55 +0000 (UTC) X-FDA: 76147801590.28.nest96_58ee5b093b239 X-Spam-Summary: 2,0,0,6d5bec4d6b1c99be,d41d8cd98f00b204,alex.shi@linux.alibaba.com,:alex.shi@linux.alibaba.com:cgroups@vger.kernel.org:linux-kernel@vger.kernel.org::akpm@linux-foundation.org:mgorman@techsingularity.net:tj@kernel.org:hughd@google.com:khlebnikov@yandex-team.ru:daniel.m.jordan@oracle.com:yang.shi@linux.alibaba.com:jgg@ziepe.ca:dan.j.williams@intel.com:vbabka@suse.cz:ira.weiny@intel.com:brouer@redhat.com:aryabinin@virtuozzo.com:jannh@google.com:logang@deltatee.com:jrdr.linux@gmail.com:rcampbell@nvidia.com:tobin@kernel.org:mhocko@suse.com:osalvador@suse.de:richard.weiyang@gmail.com:hannes@cmpxchg.org:arunks@codeaurora.org:willy@infradead.org:darrick.wong@oracle.com:amir73il@gmail.com:dchinner@redhat.com:josef@toxicpanda.com:kirill.shutemov@linux.intel.com:jglisse@redhat.com:mike.kravetz@oracle.com:ktkhai@virtuozzo.com:laoar.shao@gmail.com,RULES_HIT:4:41:69:152:355:379:541:800:960:966:968:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1431:1437:1515:1516:1518: 1593:159 X-HE-Tag: nest96_58ee5b093b239 X-Filterd-Recvd-Size: 16495 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Tue, 12 Nov 2019 14:06:50 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01419;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=37;SR=0;TI=SMTPD_---0ThuR21T_1573567602; Received: from localhost(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0ThuR21T_1573567602) by smtp.aliyun-inc.com(127.0.0.1); Tue, 12 Nov 2019 22:06:42 +0800 From: Alex Shi To: alex.shi@linux.alibaba.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com Cc: Jason Gunthorpe , Dan Williams , Vlastimil Babka , Ira Weiny , Jesper Dangaard Brouer , Andrey Ryabinin , Jann Horn , Logan Gunthorpe , Souptick Joarder , Ralph Campbell , "Tobin C. Harding" , Michal Hocko , Oscar Salvador , Wei Yang , Johannes Weiner , Arun KS , Matthew Wilcox , "Darrick J. Wong" , Amir Goldstein , Dave Chinner , Josef Bacik , "Kirill A. Shutemov" , =?utf-8?b?SsOpcsO0?= =?utf-8?b?bWUgR2xpc3Nl?= , Mike Kravetz , Kirill Tkhai , Yafang Shao Subject: [PATCH v2 8/8] mm/lru: revise the comments of lru_lock Date: Tue, 12 Nov 2019 22:06:28 +0800 Message-Id: <1573567588-47048-9-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> References: <1573567588-47048-1-git-send-email-alex.shi@linux.alibaba.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix the incorrect comments in code. Also fixed some zone->lru_lock comment error from ancient time. etc. 
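As a reading aid for the comment updates below, here is a small self-contained sketch of the layout the revised comments describe: each memcg has, per node, its own set of LRU lists guarded by its own lru_lock, rather than every memcg on a node sharing pgdat->lru_lock. It is a user-space analogy, not the kernel's real definitions: fake_memcg, fake_lruvec, add_page(), the pthread mutexes and the five-entry list array are all stand-ins chosen for illustration.

#include <pthread.h>
#include <stdio.h>

#define MAX_NODES    2
#define NR_LRU_LISTS 5  /* inactive/active anon, inactive/active file, unevictable */

struct fake_lruvec {                    /* stands in for struct lruvec */
        pthread_mutex_t lru_lock;       /* one lock per memcg *and* per node */
        int nr_pages[NR_LRU_LISTS];
};

struct fake_memcg {                     /* stands in for struct mem_cgroup */
        const char *name;
        struct fake_lruvec lruvec[MAX_NODES];
};

/* Add a page to one LRU list: only this memcg's, this node's lock is taken. */
static void add_page(struct fake_memcg *memcg, int node, int lru)
{
        struct fake_lruvec *lruvec = &memcg->lruvec[node];

        pthread_mutex_lock(&lruvec->lru_lock);
        lruvec->nr_pages[lru]++;
        pthread_mutex_unlock(&lruvec->lru_lock);
}

int main(void)
{
        struct fake_memcg a = { .name = "memcg-A" };
        struct fake_memcg b = { .name = "memcg-B" };
        struct fake_memcg *all[] = { &a, &b };

        for (int m = 0; m < 2; m++)
                for (int n = 0; n < MAX_NODES; n++)
                        pthread_mutex_init(&all[m]->lruvec[n].lru_lock, NULL);

        /*
         * Same node, different memcgs: these two calls take different locks,
         * where a single per-node lru_lock would have serialized them.
         */
        add_page(&a, 0, 0);
        add_page(&b, 0, 0);

        for (int m = 0; m < 2; m++)
                printf("%s node0 inactive-anon pages: %d\n",
                       all[m]->name, all[m]->lruvec[0].nr_pages[0]);
        return 0;
}

The interesting part is in main(): two memcgs touching LRUs on the same node take different locks, which is exactly the per-memcg, per-node separation that the revised memcg_test.rst and memory.rst text documents.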
Originally-from: Hugh Dickins Signed-off-by: Alex Shi Cc: Andrew Morton Cc: Jason Gunthorpe Cc: Dan Williams Cc: Vlastimil Babka Cc: Ira Weiny Cc: Jesper Dangaard Brouer Cc: Andrey Ryabinin Cc: Jann Horn Cc: Logan Gunthorpe Cc: Souptick Joarder Cc: Ralph Campbell Cc: "Tobin C. Harding" Cc: Michal Hocko Cc: Oscar Salvador Cc: Mel Gorman Cc: Wei Yang Cc: Johannes Weiner Cc: Arun KS Cc: Matthew Wilcox Cc: "Darrick J. Wong" Cc: Amir Goldstein Cc: Dave Chinner Cc: Josef Bacik Cc: "Kirill A. Shutemov" Cc: "Jérôme Glisse" Cc: Mike Kravetz Cc: Hugh Dickins Cc: Kirill Tkhai Cc: Daniel Jordan Cc: Yafang Shao Cc: Yang Shi Cc: cgroups@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org --- Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +++------------ Documentation/admin-guide/cgroup-v1/memory.rst | 6 +++--- Documentation/trace/events-kmem.rst | 2 +- Documentation/vm/unevictable-lru.rst | 22 ++++++++-------------- include/linux/mm_types.h | 2 +- include/linux/mmzone.h | 2 +- mm/filemap.c | 4 ++-- mm/rmap.c | 2 +- mm/vmscan.c | 12 ++++++++---- 9 files changed, 28 insertions(+), 39 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst index 3f7115e07b5d..0b9f91589d3d 100644 --- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst +++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst @@ -133,18 +133,9 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. 8. LRU ====== - Each memcg has its own private LRU. Now, its handling is under global - VM's control (means that it's handled under global pgdat->lru_lock). - Almost all routines around memcg's LRU is called by global LRU's - list management functions under pgdat->lru_lock. - - A special function is mem_cgroup_isolate_pages(). This scans - memcg's private LRU and call __isolate_lru_page() to extract a page - from LRU. - - (By __isolate_lru_page(), the page is removed from both of global and - private LRU.) - + Each memcg has its own vector of LRUs (inactive anon, active anon, + inactive file, active file, unevictable) of pages from each node, + each LRU handled under a single lru_lock for that memcg and node. 9. Typical Tests. ================= diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 0ae4f564c2d6..60d97e8b7f3c 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -297,13 +297,13 @@ When oom event notifier is registered, event will be delivered. PG_locked. mm->page_table_lock - pgdat->lru_lock + lruvec->lru_lock lock_page_cgroup. In many cases, just lock_page_cgroup() is called. - per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by - pgdat->lru_lock, it has no lock of its own. + per-node-per-cgroup LRU (cgroup's private LRU) is just guarded by + lruvec->lru_lock, it has no lock of its own. 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM) ----------------------------------------------- diff --git a/Documentation/trace/events-kmem.rst b/Documentation/trace/events-kmem.rst index 555484110e36..68fa75247488 100644 --- a/Documentation/trace/events-kmem.rst +++ b/Documentation/trace/events-kmem.rst @@ -69,7 +69,7 @@ When pages are freed in batch, the also mm_page_free_batched is triggered. Broadly speaking, pages are taken off the LRU lock in bulk and freed in batch with a page list. 
Significant amounts of activity here could indicate that the system is under memory pressure and can also indicate -contention on the zone->lru_lock. +contention on the lruvec->lru_lock. 4. Per-CPU Allocator Activity ============================= diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst index 17d0861b0f1d..0e1490524f53 100644 --- a/Documentation/vm/unevictable-lru.rst +++ b/Documentation/vm/unevictable-lru.rst @@ -33,7 +33,7 @@ reclaim in Linux. The problems have been observed at customer sites on large memory x86_64 systems. To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of -main memory will have over 32 million 4k pages in a single zone. When a large +main memory will have over 32 million 4k pages in a single node. When a large fraction of these pages are not evictable for any reason [see below], vmscan will spend a lot of time scanning the LRU lists looking for the small fraction of pages that are evictable. This can result in a situation where all CPUs are @@ -55,7 +55,7 @@ unevictable, either by definition or by circumstance, in the future. The Unevictable Page List ------------------------- -The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list +The Unevictable LRU infrastructure consists of an additional, per-node, LRU list called the "unevictable" list and an associated page flag, PG_unevictable, to indicate that the page is being managed on the unevictable list. @@ -84,15 +84,9 @@ The unevictable list does not differentiate between file-backed and anonymous, swap-backed pages. This differentiation is only important while the pages are, in fact, evictable. -The unevictable list benefits from the "arrayification" of the per-zone LRU +The unevictable list benefits from the "arrayification" of the per-node LRU lists and statistics originally proposed and posted by Christoph Lameter. -The unevictable list does not use the LRU pagevec mechanism. Rather, -unevictable pages are placed directly on the page's zone's unevictable list -under the zone lru_lock. This allows us to prevent the stranding of pages on -the unevictable list when one task has the page isolated from the LRU and other -tasks are changing the "evictability" state of the page. - Memory Control Group Interaction -------------------------------- @@ -101,8 +95,8 @@ The unevictable LRU facility interacts with the memory control group [aka memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by extending the lru_list enum. -The memory controller data structure automatically gets a per-zone unevictable -list as a result of the "arrayification" of the per-zone LRU lists (one per +The memory controller data structure automatically gets a per-node unevictable +list as a result of the "arrayification" of the per-node LRU lists (one per lru_list enum element). The memory controller tracks the movement of pages to and from the unevictable list. @@ -196,7 +190,7 @@ for the sake of expediency, to leave a unevictable page on one of the regular active/inactive LRU lists for vmscan to deal with. vmscan checks for such pages in all of the shrink_{active|inactive|page}_list() functions and will "cull" such pages that it encounters: that is, it diverts those pages to the -unevictable list for the zone being scanned. +unevictable list for the node being scanned. There may be situations where a page is mapped into a VM_LOCKED VMA, but the page is not marked as PG_mlocked. 
Such pages will make it all the way to @@ -328,7 +322,7 @@ If the page was NOT already mlocked, mlock_vma_page() attempts to isolate the page from the LRU, as it is likely on the appropriate active or inactive list at that time. If the isolate_lru_page() succeeds, mlock_vma_page() will put back the page - by calling putback_lru_page() - which will notice that the page -is now mlocked and divert the page to the zone's unevictable list. If +is now mlocked and divert the page to the node's unevictable list. If mlock_vma_page() is unable to isolate the page from the LRU, vmscan will handle it later if and when it attempts to reclaim the page. @@ -603,7 +597,7 @@ Some examples of these unevictable pages on the LRU lists are: unevictable list in mlock_vma_page(). shrink_inactive_list() also diverts any unevictable pages that it finds on the -inactive lists to the appropriate zone's unevictable list. +inactive lists to the appropriate node's unevictable list. shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd after shrink_active_list() had moved them to the inactive list, or pages mapped diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 270aa8fd2800..ff08a6a8145c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -78,7 +78,7 @@ struct page { struct { /* Page cache and anonymous pages */ /** * @lru: Pageout list, eg. active_list protected by - * pgdat->lru_lock. Sometimes used as a generic list + * lruvec->lru_lock. Sometimes used as a generic list * by the page owner. */ struct list_head lru; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 3b6029bcb577..3c7a00016f77 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -159,7 +159,7 @@ static inline bool free_area_empty(struct free_area *area, int migratetype) struct pglist_data; /* - * zone->lock and the zone lru_lock are two of the hottest locks in the kernel. + * zone->lock and the lru_lock are two of the hottest locks in the kernel. * So add a wild amount of padding here to ensure that they fall into separate * cachelines. There are very few zone structures in the machine, so space * consumption is not a concern here. 
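The padding comment kept above is about false sharing between hot locks. The short stand-alone sketch below shows that idea in isolation; the 64-byte line size and the zone_like type with its fields are assumptions made for the example, not the kernel's actual struct zone layout.

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

#define CACHELINE 64    /* assumed line size; common on x86_64 */

struct zone_like {
        /* fields hammered by the page allocator */
        alignas(CACHELINE) int lock;            /* stand-in for zone->lock */
        unsigned long free_pages;

        /* fields hammered by reclaim, pushed onto another cache line */
        alignas(CACHELINE) int lru_lock;        /* stand-in for the lru_lock */
        unsigned long nr_lru_pages;
};

int main(void)
{
        size_t a = offsetof(struct zone_like, lock);
        size_t b = offsetof(struct zone_like, lru_lock);

        printf("lock at %zu, lru_lock at %zu, share a cache line: %s\n",
               a, b, (a / CACHELINE == b / CACHELINE) ? "yes" : "no");
        return 0;
}

Printing the offsets shows the two lock fields land on different cache lines, so heavy contention on one does not keep invalidating the line that holds the other.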
diff --git a/mm/filemap.c b/mm/filemap.c index 85b7d087eb45..c508ae620635 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -101,8 +101,8 @@ * ->swap_lock (try_to_unmap_one) * ->private_lock (try_to_unmap_one) * ->i_pages lock (try_to_unmap_one) - * ->pgdat->lru_lock (follow_page->mark_page_accessed) - * ->pgdat->lru_lock (check_pte_range->isolate_lru_page) + * ->lruvec->lru_lock (follow_page->mark_page_accessed) + * ->lruvec->lru_lock (check_pte_range->isolate_lru_page) * ->private_lock (page_remove_rmap->set_page_dirty) * ->i_pages lock (page_remove_rmap->set_page_dirty) * bdi.wb->list_lock (page_remove_rmap->set_page_dirty) diff --git a/mm/rmap.c b/mm/rmap.c index 0c7b2a9400d4..561c6ad1cbe9 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -27,7 +27,7 @@ * mapping->i_mmap_rwsem * anon_vma->rwsem * mm->page_table_lock or pte_lock - * pgdat->lru_lock (in mark_page_accessed, isolate_lru_page) + * lruvec->lru_lock (in mark_page_accessed, isolate_lru_page) * swap_lock (in swap_duplicate, swap_info_get) * mmlist_lock (in mmput, drain_mmlist and others) * mapping->private_lock (in __set_page_dirty_buffers) diff --git a/mm/vmscan.c b/mm/vmscan.c index cbebd9b0b9c8..77948db33c3a 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1661,14 +1661,16 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec, } /** - * pgdat->lru_lock is heavily contended. Some of the functions that + * Isolating page from the lruvec to fill in @dst list by nr_to_scan times. + * + * lruvec->lru_lock is heavily contended. Some of the functions that * shrink the lists perform better by taking out a batch of pages * and working on them outside the LRU lock. * * For pagecache intensive workloads, this function is the hottest * spot in the kernel (apart from copy_*_user functions). * - * Appropriate locks must be held before calling this function. + * Lru_lock must be held before calling this function. * * @nr_to_scan: The number of eligible pages to look through on the list. * @lruvec: The LRU vector to pull pages from. @@ -1856,14 +1858,16 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, /* * This moves pages from @list to corresponding LRU list. + * The pages from @list is out of any lruvec, and in the end list reuses as + * pages_to_free list. * * We move them the other way if the page is referenced by one or more * processes, from rmap. * * If the pages are mostly unmapped, the processing is fast and it is - * appropriate to hold zone_lru_lock across the whole operation. But if + * appropriate to hold lru_lock across the whole operation. But if * the pages are mapped, the processing is slow (page_referenced()) so we - * should drop zone_lru_lock around each page. It's impossible to balance + * should drop lru_lock around each page. It's impossible to balance * this, so instead we remove the pages from the LRU while processing them. * It is safe to rely on PG_active against the non-LRU pages in here because * nobody will play with that bit on a non-LRU page.