From patchwork Wed Sep 29 10:19:27 2021
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 12525253
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org
Cc: baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] hugetlb_cgroup: Add interfaces to move hugetlb charge at task migration
Date: Wed, 29 Sep 2021 18:19:27 +0800

In the hugetlb cgroup, charges associated with a task are currently not
moved to the new hugetlb cgroup when the task migrates, which is not
reasonable. Thus this patch set adds interfaces to charge the new hugetlb
cgroup and uncharge the old hugetlb cgroup at task migration.
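
As a rough illustration of the scenario this series is about, a minimal
userspace sketch could look like the following. This is a sketch only: the
legacy-hierarchy mount point /sys/fs/cgroup/hugetlb, the "src"/"dst" group
names and the 2MB hugepage size are assumptions, not something defined by
this series.

/*
 * Sketch: fault one 2MB hugetlb page while in cgroup "src", then move the
 * task to "dst".  Paths, group names and the hugepage size are assumptions.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)	/* assumes 2MB hugepages are configured */

static int move_self_to(const char *procs_path)
{
	char pid[32];
	int fd = open(procs_path, O_WRONLY);

	if (fd < 0)
		return -1;
	snprintf(pid, sizeof(pid), "%d", (int)getpid());
	if (write(fd, pid, strlen(pid)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	char *p;

	/* Assumed legacy-hierarchy mount; "src" and "dst" created beforehand. */
	if (move_self_to("/sys/fs/cgroup/hugetlb/src/cgroup.procs") < 0)
		return 1;

	/* The hugetlb charge is accounted to "src" at fault time. */
	p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	p[0] = 1;

	/*
	 * Without this series the charge stays with "src" after the move;
	 * with it, can_attach()/post_attach() transfer the charge to "dst".
	 */
	if (move_self_to("/sys/fs/cgroup/hugetlb/dst/cgroup.procs") < 0)
		return 1;

	pause();	/* keep the mapping alive while counters are inspected */
	return 0;
}
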
This patch adds can_attach() and cancel_attach() to check if we can charge
to the new hugetlb cgroup.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/hugetlb_cgroup.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 5383023..2568d0c 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -19,6 +19,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -32,6 +33,14 @@
 static struct hugetlb_cgroup *root_h_cgroup __read_mostly;

+static struct hugetlb_move_charge {
+        struct mm_struct *mm;
+        struct hugetlb_cgroup *from;
+        struct hugetlb_cgroup *to;
+        unsigned long precharge[HUGE_MAX_HSTATE];
+        unsigned long moved_charge[HUGE_MAX_HSTATE];
+} hmc;
+
 static inline struct page_counter *
 __hugetlb_cgroup_counter_from_cgroup(struct hugetlb_cgroup *h_cg, int idx,
                                      bool rsvd)
@@ -151,6 +160,157 @@ static void hugetlb_cgroup_css_free(struct cgroup_subsys_state *css)
         kfree(h_cgroup);
 }

+static int hugetlb_cgroup_precharge_pte_range(pte_t *pte, unsigned long hmask,
+                                              unsigned long addr,
+                                              unsigned long end,
+                                              struct mm_walk *walk)
+{
+        struct page *page;
+        spinlock_t *ptl;
+        pte_t entry;
+        struct hstate *h = hstate_vma(walk->vma);
+
+        ptl = huge_pte_lock(h, walk->mm, pte);
+        entry = huge_ptep_get(pte);
+        /* TODO: only handle present hugetlb pages now. */
+        if (!pte_present(entry)) {
+                spin_unlock(ptl);
+                return 0;
+        }
+
+        page = pte_page(entry);
+        spin_unlock(ptl);
+
+        spin_lock_irq(&hugetlb_lock);
+        if (hugetlb_cgroup_from_page(page) == hmc.from) {
+                int idx = hstate_index(h);
+
+                hmc.precharge[idx]++;
+        }
+        spin_unlock_irq(&hugetlb_lock);
+
+        cond_resched();
+        return 0;
+}
+
+static const struct mm_walk_ops hugetlb_precharge_walk_ops = {
+        .hugetlb_entry = hugetlb_cgroup_precharge_pte_range,
+};
+
+static int hugetlb_cgroup_precharge(struct mm_struct *mm)
+{
+        struct page_counter *counter;
+        unsigned long precharge;
+        int idx;
+
+        mmap_read_lock(mm);
+        walk_page_range(mm, 0, mm->highest_vm_end, &hugetlb_precharge_walk_ops, NULL);
+        mmap_read_unlock(mm);
+
+        for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+                if (!hmc.precharge[idx])
+                        continue;
+
+                precharge = hmc.precharge[idx];
+                hmc.precharge[idx] = 0;
+
+                if (!page_counter_try_charge(
+                        __hugetlb_cgroup_counter_from_cgroup(hmc.to, idx, false),
+                        precharge * pages_per_huge_page(&hstates[idx]), &counter))
+                        return -ENOMEM;
+
+                hmc.precharge[idx] = precharge;
+        }
+
+        return 0;
+}
+
+static void hugetlb_cgroup_clear(void)
+{
+        struct mm_struct *mm = hmc.mm;
+        struct hugetlb_cgroup *to = hmc.to;
+        int idx;
+
+        /* we must uncharge all the leftover precharges from hmc.to */
+        for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+                if (!hmc.precharge[idx])
+                        continue;
+
+                page_counter_uncharge(
+                        __hugetlb_cgroup_counter_from_cgroup(to, idx, false),
+                        hmc.precharge[idx] * pages_per_huge_page(&hstates[idx]));
+                hmc.precharge[idx] = 0;
+        }
+
+        hmc.from = NULL;
+        hmc.to = NULL;
+        hmc.mm = NULL;
+
+        mmput(mm);
+}
+
+static int hugetlb_cgroup_can_attach(struct cgroup_taskset *tset)
+{
+        struct cgroup_subsys_state *css;
+        struct task_struct *leader, *p;
+        struct hugetlb_cgroup *h_cgroup, *from_hcg;
+        struct mm_struct *mm;
+        int ret = 0, idx;
+
+        if (hugetlb_cgroup_disabled())
+                return 0;
+
+        /*
+         * Multi-process migrations only happen on the default hierarchy
+         * where charge immigration is not used. Perform charge
+         * immigration if @tset contains a leader and whine if there are
+         * multiple.
+         */
+        p = NULL;
+        cgroup_taskset_for_each_leader(leader, css, tset) {
+                WARN_ON_ONCE(p);
+                p = leader;
+                h_cgroup = hugetlb_cgroup_from_css(css);
+        }
+        if (!p)
+                return 0;
+
+        from_hcg = hugetlb_cgroup_from_task(p);
+        VM_BUG_ON(from_hcg == h_cgroup);
+
+        mm = get_task_mm(p);
+        if (!mm)
+                return 0;
+
+        VM_BUG_ON(hmc.from);
+        VM_BUG_ON(hmc.to);
+        VM_BUG_ON(hmc.mm);
+
+        for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+                VM_BUG_ON(hmc.precharge[idx]);
+                VM_BUG_ON(hmc.moved_charge[idx]);
+        }
+
+        hmc.mm = mm;
+        hmc.from = from_hcg;
+        hmc.to = h_cgroup;
+
+        ret = hugetlb_cgroup_precharge(mm);
+        if (ret)
+                hugetlb_cgroup_clear();
+
+        return ret;
+}
+
+static void hugetlb_cgroup_cancel_attach(struct cgroup_taskset *tset)
+{
+        if (hugetlb_cgroup_disabled())
+                return;
+
+        if (hmc.to)
+                hugetlb_cgroup_clear();
+}
+
 /*
  * Should be called with hugetlb_lock held.
  * Since we are holding hugetlb_lock, pages cannot get moved from
@@ -806,6 +966,8 @@ struct cgroup_subsys hugetlb_cgrp_subsys = {
         .css_alloc = hugetlb_cgroup_css_alloc,
         .css_offline = hugetlb_cgroup_css_offline,
         .css_free = hugetlb_cgroup_css_free,
+        .can_attach = hugetlb_cgroup_can_attach,
+        .cancel_attach = hugetlb_cgroup_cancel_attach,
         .dfl_cftypes = hugetlb_files,
         .legacy_cftypes = hugetlb_files,
 };
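
One rough way to exercise the precharge path added above (hedged: the v1
hugetlb mount point, the group names and the hugetlb.2MB.* file names are
assumptions): cap the destination group below the source task's current
usage, and the migration should be rejected, since hugetlb_cgroup_precharge()
fails page_counter_try_charge(), can_attach() returns -ENOMEM, and the write
to cgroup.procs fails.

/*
 * Sketch: with one 2MB page already charged to "src" (see the earlier
 * example), a zero hugetlb limit on "dst" should make the move fail.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(int argc, char **argv)
{
	const char *pid = argc > 1 ? argv[1] : "0";	/* pid of the test task */

	/* hugetlb.2MB.limit_in_bytes is the per-hstate v1 limit (assumed path). */
	write_str("/sys/fs/cgroup/hugetlb/dst/hugetlb.2MB.limit_in_bytes", "0");

	/* Expected with patch 1: the precharge fails and the move is rejected. */
	if (write_str("/sys/fs/cgroup/hugetlb/dst/cgroup.procs", pid) < 0)
		printf("migration rejected: %s\n", strerror(errno));

	return 0;
}
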
From patchwork Wed Sep 29 10:19:28 2021
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 12525255
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org
Cc: baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] hugetlb_cgroup: Add post_attach interface for task migration
Date: Wed, 29 Sep 2021 18:19:28 +0800
Message-Id: <5c9b016a8fced386e85a2198c62314aa3c344101.1632843268.git.baolin.wang@linux.alibaba.com>

Add a post_attach interface to change the pages' hugetlb cgroup and
uncharge the old hugetlb cgroup when the task migration has finished.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/hugetlb_cgroup.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 2568d0c..bd53d04 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -229,6 +229,7 @@ static void hugetlb_cgroup_clear(void)
 {
         struct mm_struct *mm = hmc.mm;
         struct hugetlb_cgroup *to = hmc.to;
+        struct hugetlb_cgroup *from = hmc.from;
         int idx;

         /* we must uncharge all the leftover precharges from hmc.to */
@@ -242,6 +243,17 @@ static void hugetlb_cgroup_clear(void)
                 hmc.precharge[idx] = 0;
         }

+        for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+                if (!hmc.moved_charge[idx])
+                        continue;
+
+                page_counter_uncharge(
+                        __hugetlb_cgroup_counter_from_cgroup(from, idx, false),
+                        hmc.moved_charge[idx] * pages_per_huge_page(&hstates[idx]));
+
+                hmc.moved_charge[idx] = 0;
+        }
+
         hmc.from = NULL;
         hmc.to = NULL;
         hmc.mm = NULL;
@@ -311,6 +323,61 @@ static void hugetlb_cgroup_cancel_attach(struct cgroup_taskset *tset)
                 hugetlb_cgroup_clear();
 }

+static int hugetlb_cgroup_move_charge_pte_range(pte_t *pte, unsigned long hmask,
+                                                unsigned long addr,
+                                                unsigned long end,
+                                                struct mm_walk *walk)
+{
+        struct page *page;
+        spinlock_t *ptl;
+        pte_t entry;
+        struct hstate *h = hstate_vma(walk->vma);
+
+        ptl = huge_pte_lock(h, walk->mm, pte);
+        entry = huge_ptep_get(pte);
+        /* TODO: only handle present hugetlb pages now. */
+        if (!pte_present(entry)) {
+                spin_unlock(ptl);
+                return 0;
+        }
+
+        page = pte_page(entry);
+        spin_unlock(ptl);
+
+        spin_lock_irq(&hugetlb_lock);
+        if (hugetlb_cgroup_from_page(page) == hmc.from) {
+                int idx = hstate_index(h);
+
+                set_hugetlb_cgroup(page, hmc.to);
+                hmc.precharge[idx]--;
+                hmc.moved_charge[idx]++;
+        }
+        spin_unlock_irq(&hugetlb_lock);
+
+        cond_resched();
+        return 0;
+}
+
+static const struct mm_walk_ops hugetlb_charge_walk_ops = {
+        .hugetlb_entry = hugetlb_cgroup_move_charge_pte_range,
+};
+
+static void hugetlb_cgroup_move_task(void)
+{
+        if (hugetlb_cgroup_disabled())
+                return;
+
+        if (!hmc.to)
+                return;
+
+        mmap_read_lock(hmc.mm);
+        walk_page_range(hmc.mm, 0, hmc.mm->highest_vm_end,
+                        &hugetlb_charge_walk_ops, NULL);
+        mmap_read_unlock(hmc.mm);
+
+        hugetlb_cgroup_clear();
+}
+
 /*
  * Should be called with hugetlb_lock held.
  * Since we are holding hugetlb_lock, pages cannot get moved from
@@ -968,6 +1035,7 @@ struct cgroup_subsys hugetlb_cgrp_subsys = {
         .css_free = hugetlb_cgroup_css_free,
         .can_attach = hugetlb_cgroup_can_attach,
         .cancel_attach = hugetlb_cgroup_cancel_attach,
+        .post_attach = hugetlb_cgroup_move_task,
         .dfl_cftypes = hugetlb_files,
         .legacy_cftypes = hugetlb_files,
 };
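
With both patches applied, the per-hstate usage counters are expected to
follow the migrated task. A minimal check might read the counters on both
sides; again a sketch, with the same assumed mount point, group names and
2MB hstate file names as in the earlier examples.

/* Sketch: after the migration, the 2MB charge is expected to sit in "dst". */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long read_counter(const char *path)
{
	char buf[64] = { 0 };
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (read(fd, buf, sizeof(buf) - 1) < 0)
		buf[0] = '\0';
	close(fd);
	return atol(buf);
}

int main(void)
{
	long src = read_counter("/sys/fs/cgroup/hugetlb/src/hugetlb.2MB.usage_in_bytes");
	long dst = read_counter("/sys/fs/cgroup/hugetlb/dst/hugetlb.2MB.usage_in_bytes");

	/* Expectation with this series: src drops back to 0, dst holds the 2MB. */
	printf("src usage: %ld, dst usage: %ld\n", src, dst);
	return 0;
}
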