From patchwork Tue Oct 3 00:18:25 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13406629
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
    mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
    linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org
Subject: [PATCH v3 0/3] hugetlb memcg accounting
Date: Mon, 2 Oct 2023 17:18:25 -0700
Message-Id: <20231003001828.2554080-1-nphamcs@gmail.com>
X-Mailer: git-send-email 2.34.1

Changelog:
v3:
  * Add a prep patch at the start of the series to extend the memory
    controller interface with new helper functions for hugetlb accounting.
  * Do not account hugetlb memory for memcontroller in cgroup v1 (patch 2)
    (suggested by Johannes Weiner).
  * Change the gfp flag passed to mem cgroup charging (patch 2)
    (suggested by Michal Hocko).
  * Add caveats to cgroup admin guide and commit changelog (patch 2)
    (suggested by Michal Hocko).
v2:
  * Add a cgroup mount option to enable/disable the new hugetlb memcg
    accounting behavior (patch 1) (suggested by Johannes Weiner).
  * Add a couple more ksft_print_msg() on error to aid debugging when
    the selftest fails. (patch 2)

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

For instance, here is one of our use cases: suppose there are two 32G
containers. The machine is booted with hugetlb_cma=6G, and each
container may use up to 3 gigantic pages, depending on the workload
within it. The rest is anon, cache, slab, etc. We can set the hugetlb
cgroup limit of each cgroup to 3G to enforce hugetlb fairness. But it is
very difficult to configure memory.max to keep overall consumption,
including anon, cache, slab, etc., fair.

What we have had to resort to is to constantly poll hugetlb usage and
readjust memory.max. A similar procedure is applied to other memory
limits (e.g. memory.low). However, this is rather cumbersome and buggy.
Furthermore, when there is a delay in correcting the memory limits
(e.g. when hugetlb usage changes between consecutive runs of the
userspace agent), the system could be left in an over- or
underprotected state.

This patch series rectifies this issue by charging the memcg when the
hugetlb folio is allocated, and uncharging when the folio is freed. In
addition, a new selftest is added to demonstrate and verify this new
behavior.
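
To make the arithmetic above concrete, here is a minimal userspace sketch,
not taken from the patches. workaround_memory_max() models what the polling
agent has to compute today (budget minus current hugetlb usage), and the
toy_* helpers model a counter that is charged on allocation and uncharged
on free. All names here (workaround_memory_max, struct toy_memcg,
toy_charge, toy_uncharge) are hypothetical and exist only for
illustration. The toy counter simply refuses a charge that would exceed
the limit; the real memory controller would attempt reclaim first.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (1024ULL * 1024 * 1024)

/*
 * Workaround model: with hugetlb invisible to the memory controller,
 * the agent must lower memory.max by the current hugetlb usage so that
 * the container's total footprint stays within its budget.
 */
uint64_t workaround_memory_max(uint64_t budget, uint64_t hugetlb_usage)
{
	return hugetlb_usage >= budget ? 0 : budget - hugetlb_usage;
}

/* Hypothetical toy counter, not a kernel structure. */
struct toy_memcg {
	uint64_t usage;	/* bytes currently charged */
	uint64_t max;	/* limit, playing the role of memory.max */
};

/*
 * Charge on allocation. Returns false if the charge would exceed the
 * limit (no reclaim is modelled here).
 */
bool toy_charge(struct toy_memcg *cg, uint64_t bytes)
{
	if (cg->usage > cg->max || bytes > cg->max - cg->usage)
		return false;
	cg->usage += bytes;
	return true;
}

/* Uncharge on free, clamped so usage never underflows. */
void toy_uncharge(struct toy_memcg *cg, uint64_t bytes)
{
	cg->usage -= bytes <= cg->usage ? bytes : cg->usage;
}
```

For a 32G container using 3G of hugetlb, the workaround has to set
memory.max to 29G and redo the calculation whenever hugetlb usage changes.
With hugetlb charged to the same counter, memory.max can stay at 32G: a
3G hugetlb charge leaves 29G for anon, cache, slab, etc. without any
userspace intervention.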

Nhat Pham (3):
  memcontrol: add helpers for hugetlb memcg accounting
  hugetlb: memcg: account hugetlb-backed memory in memory controller
  selftests: add a selftest to verify hugetlb usage in memcg

 Documentation/admin-guide/cgroup-v2.rst   |  29 +++
 MAINTAINERS                               |   2 +
 include/linux/cgroup-defs.h               |   5 +
 include/linux/memcontrol.h                |  30 +++
 kernel/cgroup/cgroup.c                    |  15 +-
 mm/hugetlb.c                              |  35 ++-
 mm/memcontrol.c                           |  94 ++++++-
 tools/testing/selftests/cgroup/.gitignore |   1 +
 tools/testing/selftests/cgroup/Makefile   |   2 +
 .../selftests/cgroup/test_hugetlb_memcg.c | 234 ++++++++++++++++++
 10 files changed, 427 insertions(+), 20 deletions(-)
 create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c