From patchwork Wed Oct 30 08:33:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gutierrez Asier
X-Patchwork-Id: 13856116
Subject: [RFC PATCH 0/3] Cgroup-based THP control
Date: Wed, 30 Oct 2024 16:33:08 +0800
Message-ID: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>
X-Mailer: git-send-email 2.34.1
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Asier Gutierrez

Currently, THP modes are set globally. This is overkill when only a
specific application or set of applications benefits from THP.
Moreover, different applications may need different THP settings.
Here we propose a cgroup-based THP control mechanism.

A THP interface is added to the memory cgroup subsystem.

The existing global THP control semantics are supported for backward
compatibility. When THP modes are set globally, all the changes are
propagated to the memory cgroups.
However, when a particular cgroup changes its THP policy, the global
THP policy in sysfs remains the same.

New memcg files are exposed: memory.thp_enabled and memory.thp_defrag,
which have exactly the same format as the global THP enabled/defrag
files.

Child cgroups inherit THP settings from their parent cgroup upon
creation. Mode changes in a particular cgroup are not propagated to
its child cgroups.

During the memory cgroup attachment stage, the corresponding slots are
added to or removed from khugepaged according to the THP policy.

Usage examples:

Set "madvise" mode globally:
# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

All the settings are propagated:
# cat /sys/fs/cgroup/memory.thp_enabled
always [madvise] never
# cat /sys/fs/cgroup/test/memory.thp_enabled
always [madvise] never

Set "always" for a specific cgroup:
# echo always > /sys/fs/cgroup/test/memory.thp_enabled
# cat /sys/fs/cgroup/test/memory.thp_enabled
[always] madvise never

The root cgroup remains in "madvise" mode:
# cat /sys/fs/cgroup/memory.thp_enabled
always [madvise] never

Reading the global setting now returns a "mixed state" warning,
because the THP mode is no longer the same for every cgroup:
# cat /sys/kernel/mm/transparent_hugepage/enabled
Mixed state: see particular memcg flags!

Set the THP mode globally again and check that it is propagated
everywhere:
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
# cat /sys/fs/cgroup/memory.thp_enabled
always madvise [never]
# cat /sys/fs/cgroup/test/memory.thp_enabled
always madvise [never]

Here is a simple demo with a test that performs an anonymous mmap()
followed by a series of random reads. The system is rebooted between
the cases.

Case 1: Global THP - always. No cgroup.
// Global THP stats:
AnonHugePages:    391168 kB
FileHugePages:    120832 kB
FilePmdMapped:     67584 kB
// THP stats from *smaps* of the testing process
AnonHugePages:     12288 kB

Case 2: Global THP - never. Cgroup - always.
// Global THP stats:
AnonHugePages:     12288 kB
FileHugePages:      2048 kB
FilePmdMapped:      2048 kB
// THP stats from *smaps* of the testing process
AnonHugePages:     12288 kB
// The cgroup THP stats (in bytes)
anon_thp 12582912
file_thp 2097152

Global THP usage differs greatly between the two cases (391168 kB vs.
12288 kB of AnonHugePages), while the testing process gets the same
12288 kB of THP in both. This shows that the cgroup approach is
beneficial when a specific application or set of applications needs
THP but we are not willing to change anything in the application code.

TODO list:
1. Anonymous mTHP
2. Fine-grained mode selection for different VMA types:
   "anon|exec|ro|file", to support combinations such as
   "always + exec", "always + anon", etc.
3. Per-cgroup limit for THP usage

Signed-off-by: Asier Gutierrez
Signed-off-by: Anatoly Stepanov
Reviewed-by: Alexander Kozhevnikov

Asier Gutierrez, Anatoly Stepanov (3):
  mm: Add thp_flags control for cgroup
  mm: Support for huge pages in cgroups
  mm: Add thp_defrag control for cgroup

 include/linux/huge_mm.h    |  23 +++-
 include/linux/khugepaged.h |   2 +-
 include/linux/memcontrol.h |  28 ++++
 mm/huge_memory.c           | 207 ++++++++++++++++++-----------
 mm/khugepaged.c            |   8 +-
 mm/memcontrol.c            | 262 +++++++++++++++++++++++++++++++++++++
 6 files changed, 449 insertions(+), 81 deletions(-)
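
Note for scripts that consume the interface shown in the usage examples:
the active mode is the bracketed word, and the global file may instead
return the "Mixed state" message. The sketch below is only an
illustration of one way to parse that output. The helper names
thp_active_mode and thp_mode_of_file are hypothetical and not part of
this series, and memory.thp_enabled exists only on a kernel with these
patches applied.

```shell
#!/bin/sh
# Hypothetical helper: print the active THP mode, i.e. the bracketed
# word in a line such as "always [madvise] never". Prints nothing when
# no word is bracketed (for example, the "Mixed state" message).
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z+_-]*\)\].*/\1/p'
}

# Hypothetical helper: read a THP control file given by the caller and
# print its active mode. The file must exist and be readable.
thp_mode_of_file() {
    thp_active_mode "$(cat "$1")"
}
```

For example, thp_mode_of_file /sys/fs/cgroup/test/memory.thp_enabled
would print the mode of the "test" cgroup on a patched kernel, and an
empty result for the global file can be treated as "modes differ between
cgroups".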