From patchwork Thu Oct 28 11:56:55 2021
X-Patchwork-Submitter: Ning Zhang
X-Patchwork-Id: 12589937
From: Ning Zhang
To: linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Vladimir Davydov, Yu Zhao
Subject: [RFC 6/6] mm, thp: add document for zero subpages reclaim
Date: Thu, 28 Oct 2021 19:56:55 +0800
Message-Id: <1635422215-99394-7-git-send-email-ningzhang@linux.alibaba.com>
In-Reply-To: <1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com>
References: <1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com>

Add a user guide for THP zero subpages reclaim.

Signed-off-by: Ning Zhang
---
 Documentation/admin-guide/mm/transhuge.rst | 75 ++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index c9c37f1..85cd3b7 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -421,3 +421,78 @@ support enabled just fine as always. No difference can be noted in
 hugetlbfs other than there will be less overall fragmentation. All
 usual features belonging to hugetlbfs are preserved and
 unaffected. libhugetlbfs will also work fine as usual.
+
+THP zero subpages reclaim
+=========================
+THP may lead to memory bloat, which may cause OOM. The reason is that a huge
+page may contain zero subpages that the user never actually accessed.
+To avoid this, a mechanism to reclaim these zero subpages is introduced::
+
+    echo 1 > /sys/fs/cgroup/memory/{memcg}/memory.thp_reclaim
+    echo 0 > /sys/fs/cgroup/memory/{memcg}/memory.thp_reclaim
+
+Echo 1 to enable and echo 0 to disable.
+The default value is inherited from the parent memcg. The root memcg is
+disabled by default.
+
+There is also a global interface, for when you do not want to configure
+every memory cgroup individually::
+
+    /sys/kernel/mm/transparent_hugepage/reclaim
+
+memcg
+    The default mode. Every memory cgroup uses its own
+    configuration.
+
+enable
+    Reclaim is enabled for every memory cgroup.
+
+disable
+    Reclaim is disabled for every memory cgroup.
+
+If zero subpages reclaim is enabled, each new huge page is added to a
+reclaim queue in the mem_cgroup, and the queue is scanned during memory
+reclaim. The queue statistics can be checked like this::
+
+    cat /sys/fs/cgroup/memory/{memcg}/memory.thp_reclaim_stat
+
+queue_length
+    The length of the reclaim queue on each node.
+
+split_hpage
+    The number of huge pages split by thp reclaim on each node.
+
+split_failed
+    The number of huge pages that thp reclaim failed to split on
+    each node.
+
+reclaim_subpage
+    The number of zero subpages reclaimed by thp reclaim on
+    each node.
+
+There is also a control interface for configuring thp reclaim::
+
+    /sys/fs/cgroup/memory/{memcg}/memory.thp_reclaim_ctrl
+
+threshold
+    A huge page that contains at least threshold zero pages is split
+    (the count is estimated by checking some discrete unsigned long values).
+    The default value is 16; a memcg inherits the value from its parent.
+    The range of this value is (0, HPAGE_PMD_NR]: it must be greater than 0
+    and less than or equal to HPAGE_PMD_NR (512 on x86).
+    For example, to set the reclaim threshold to 8::
+
+        echo "threshold 8" > memory.thp_reclaim_ctrl
+
+reclaim
+    Triggers the action immediately for the huge pages in the reclaim queue.
+    The action depends on the thp reclaim config (reclaim, swap or disable;
+    disable means just remove the huge page from the queue).
+    This control takes two values, 1 and 2: 1 reclaims only the current
+    memcg, and 2 reclaims the current memcg and all of its child memcgs.
+    For example::
+
+        echo "reclaim 1" > memory.thp_reclaim_ctrl
+        echo "reclaim 2" > memory.thp_reclaim_ctrl
+
+Only one of the configs mentioned above can be set at a time.
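
When scripting against memory.thp_reclaim_ctrl, the value rules described above can be checked before anything is written to the file. The following is only a minimal sketch in POSIX shell: it assumes HPAGE_PMD_NR is 512 (the x86 value quoted above), and thp_ctrl_line is a hypothetical helper name, not something this series adds.

```shell
#!/bin/sh
# Sketch only: validate and print one line for memory.thp_reclaim_ctrl.
# Assumption: HPAGE_PMD_NR is 512 (the x86 value); override it if needed.
HPAGE_PMD_NR=${HPAGE_PMD_NR:-512}

# thp_ctrl_line KEY VALUE
# Hypothetical helper, not part of the patch. Prints "KEY VALUE" and
# returns 0 when the pair is valid; prints nothing and returns 1 otherwise.
thp_ctrl_line() {
    key=$1
    val=$2
    # Accept only non-empty strings of decimal digits.
    case $val in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $key in
        threshold)
            # Valid range is (0, HPAGE_PMD_NR].
            [ "$val" -gt 0 ] && [ "$val" -le "$HPAGE_PMD_NR" ] || return 1
            ;;
        reclaim)
            # 1: current memcg only; 2: current memcg and all children.
            [ "$val" -eq 1 ] || [ "$val" -eq 2 ] || return 1
            ;;
        *)
            return 1
            ;;
    esac
    printf '%s %s\n' "$key" "$val"
}
```

A caller might then do `line=$(thp_ctrl_line threshold 8) && echo "$line" > memory.thp_reclaim_ctrl`. Since only one config can be set at a time, each write carries exactly one line.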