From patchwork Thu Sep 16 13:47:36 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12499119
From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
    akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
    smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 01/13] mm: move mem_cgroup_kmem_disabled() to memcontrol.h
Date: Thu, 16 Sep 2021 21:47:36 +0800
Message-Id: <20210916134748.67712-2-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

cgroup_memory_nokmem is already a global variable, so there is no need for a
separate global function mem_cgroup_kmem_disabled(). Move the function to
memcontrol.h and mark it inline. Since slab_common.c already includes
memcontrol.h, replace its use of cgroup_memory_nokmem with
mem_cgroup_kmem_disabled().
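For reference, the end state this describes is a header-level inline helper
that reads the existing flag; the sketch below only mirrors the intent of the
diff that follows (the !CONFIG_MEMCG_KMEM branch shown here is an assumption
added for completeness and is not part of this patch):

    #ifdef CONFIG_MEMCG_KMEM
    extern bool cgroup_memory_nokmem;   /* set by the cgroup.memory=nokmem boot option */

    static inline bool mem_cgroup_kmem_disabled(void)
    {
            /* Callers read the flag directly; no out-of-line function or export needed. */
            return cgroup_memory_nokmem;
    }
    #else
    static inline bool mem_cgroup_kmem_disabled(void)
    {
            return true;    /* assumed fallback when kmem accounting is compiled out */
    }
    #endif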
Signed-off-by: Muchun Song --- include/linux/memcontrol.h | 8 +++++++- mm/internal.h | 5 ----- mm/memcontrol.c | 5 ----- mm/slab_common.c | 2 +- 4 files changed, 8 insertions(+), 12 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 3096c9a0ee01..e194d90aff56 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1639,7 +1639,13 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg, #endif #ifdef CONFIG_MEMCG_KMEM -bool mem_cgroup_kmem_disabled(void); +extern bool cgroup_memory_nokmem; + +static inline bool mem_cgroup_kmem_disabled(void) +{ + return cgroup_memory_nokmem; +} + int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void __memcg_kmem_uncharge_page(struct page *page, int order); diff --git a/mm/internal.h b/mm/internal.h index cf3cb933eba3..649684e20fe4 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -116,11 +116,6 @@ extern void putback_lru_page(struct page *page); extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); /* - * in mm/memcontrol.c: - */ -extern bool cgroup_memory_nokmem; - -/* * in mm/page_alloc.c */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b762215d73eb..528b134ca50c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -264,11 +264,6 @@ struct mem_cgroup *vmpressure_to_memcg(struct vmpressure *vmpr) #ifdef CONFIG_MEMCG_KMEM extern spinlock_t css_set_lock; -bool mem_cgroup_kmem_disabled(void) -{ - return cgroup_memory_nokmem; -} - static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg, unsigned int nr_pages); diff --git a/mm/slab_common.c b/mm/slab_common.c index ec2bb0beed75..4c0549e0f349 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -857,7 +857,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) if (type == KMALLOC_RECLAIM) { flags |= SLAB_RECLAIM_ACCOUNT; } else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_CGROUP)) { - if (cgroup_memory_nokmem) { + if (mem_cgroup_kmem_disabled()) { kmalloc_caches[type][idx] = kmalloc_caches[KMALLOC_NORMAL][idx]; return; } From patchwork Thu Sep 16 13:47:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499121 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72D7EC4332F for ; Thu, 16 Sep 2021 13:52:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DA1B161056 for ; Thu, 16 Sep 2021 13:52:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org DA1B161056 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7981F6B0074; Thu, 16 Sep 2021 09:52:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7476D6B0075; Thu, 16 Sep 2021 09:52:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 
From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
    akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
    smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 02/13] mm: memcontrol: prepare objcg API for non-kmem usage
Date: Thu, 16 Sep 2021 21:47:37 +0800
Message-Id: <20210916134748.67712-3-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

Pagecache pages are charged at allocation time and hold a reference to the
original memory cgroup until they are reclaimed. Depending on memory pressure,
the patterns of page sharing between cgroups, and the cgroup creation and
destruction rates, a large number of dying memory cgroups can be pinned by
pagecache pages. This makes page reclaim less efficient and wastes memory.

We can fix this by converting LRU pages, and most other raw memcg pins, to the
objcg direction, so that page->memcg always points to an object cgroup. The
objcg infrastructure therefore no longer serves only CONFIG_MEMCG_KMEM. This
patch moves the objcg infrastructure out of the scope of CONFIG_MEMCG_KMEM so
that LRU pages can reuse it for charging.

LRU pages are not accounted at the root level, yet their page->memcg_data
points to root_mem_cgroup, so it always points to a valid pointer. But
root_mem_cgroup does not have an object cgroup. If we use the obj_cgroup APIs
to charge LRU pages, we have to set page->memcg_data to a root object cgroup,
so we also allocate an object cgroup for root_mem_cgroup. As Roman said, "we
might wanna to eliminate CONFIG_MEMCG_KMEM completely", so we do not add new
dependencies on it.
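To illustrate the indirection this prepares for, here is a purely conceptual
sketch (not code from this patch): page_objcg() is a hypothetical lookup
helper used only for illustration; the point is that the memcg is reached
through the objcg, so memcg_reparent_objcgs() can redirect every pinned page
at once by re-pointing objcg->memcg at the parent.

    /*
     * Conceptual sketch only; page_objcg() is a hypothetical helper and is
     * not introduced by this patch.
     */
    static struct mem_cgroup *page_memcg_via_objcg(struct page *page)
    {
            struct obj_cgroup *objcg = page_objcg(page);    /* hypothetical lookup */

            /* obj_cgroup_memcg() dereferences objcg->memcg; hold rcu_read_lock(). */
            return objcg ? obj_cgroup_memcg(objcg) : NULL;
    }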
Signed-off-by: Muchun Song Reported-by: kernel test robot --- include/linux/memcontrol.h | 2 +- mm/memcontrol.c | 60 +++++++++++++++++++++++++++------------------- 2 files changed, 37 insertions(+), 25 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e194d90aff56..490d4849a05a 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -319,9 +319,9 @@ struct mem_cgroup { #ifdef CONFIG_MEMCG_KMEM int kmemcg_id; enum memcg_kmem_state kmem_state; +#endif struct obj_cgroup __rcu *objcg; struct list_head objcg_list; /* list of inherited objcgs */ -#endif MEMCG_PADDING(_pad2_); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 528b134ca50c..f58010cd8414 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -261,18 +261,15 @@ struct mem_cgroup *vmpressure_to_memcg(struct vmpressure *vmpr) return container_of(vmpr, struct mem_cgroup, vmpressure); } -#ifdef CONFIG_MEMCG_KMEM extern spinlock_t css_set_lock; static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg, unsigned int nr_pages); -static void obj_cgroup_release(struct percpu_ref *ref) +static void obj_cgroup_release_bytes(struct obj_cgroup *objcg) { - struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt); unsigned int nr_bytes; unsigned int nr_pages; - unsigned long flags; /* * At this point all allocated objects are freed, and @@ -286,9 +283,9 @@ static void obj_cgroup_release(struct percpu_ref *ref) * 3) CPU1: a process from another memcg is allocating something, * the stock if flushed, * objcg->nr_charged_bytes = PAGE_SIZE - 92 - * 5) CPU0: we do release this object, + * 4) CPU0: we do release this object, * 92 bytes are added to stock->nr_bytes - * 6) CPU0: stock is flushed, + * 5) CPU0: stock is flushed, * 92 bytes are added to objcg->nr_charged_bytes * * In the result, nr_charged_bytes == PAGE_SIZE. @@ -300,6 +297,14 @@ static void obj_cgroup_release(struct percpu_ref *ref) if (nr_pages) obj_cgroup_uncharge_pages(objcg, nr_pages); +} + +static void obj_cgroup_release(struct percpu_ref *ref) +{ + struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt); + unsigned long flags; + + obj_cgroup_release_bytes(objcg); spin_lock_irqsave(&css_set_lock, flags); list_del(&objcg->list); @@ -328,10 +333,14 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } -static void memcg_reparent_objcgs(struct mem_cgroup *memcg, - struct mem_cgroup *parent) +static void memcg_reparent_objcgs(struct mem_cgroup *memcg) { struct obj_cgroup *objcg, *iter; + struct mem_cgroup *parent; + + parent = parent_mem_cgroup(memcg); + if (!parent) + parent = root_mem_cgroup; objcg = rcu_replace_pointer(memcg->objcg, NULL, true); @@ -350,6 +359,7 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg, percpu_ref_kill(&objcg->refcnt); } +#ifdef CONFIG_MEMCG_KMEM /* * This will be used as a shrinker list's index. 
* The main reason for not using cgroup id for this: @@ -3587,7 +3597,6 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, #ifdef CONFIG_MEMCG_KMEM static int memcg_online_kmem(struct mem_cgroup *memcg) { - struct obj_cgroup *objcg; int memcg_id; if (cgroup_memory_nokmem) @@ -3600,14 +3609,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg) if (memcg_id < 0) return memcg_id; - objcg = obj_cgroup_alloc(); - if (!objcg) { - memcg_free_cache_id(memcg_id); - return -ENOMEM; - } - objcg->memcg = memcg; - rcu_assign_pointer(memcg->objcg, objcg); - static_branch_enable(&memcg_kmem_enabled_key); memcg->kmemcg_id = memcg_id; @@ -3631,8 +3632,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg) if (!parent) parent = root_mem_cgroup; - memcg_reparent_objcgs(memcg, parent); - kmemcg_id = memcg->kmemcg_id; BUG_ON(kmemcg_id < 0); @@ -5159,8 +5158,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void) memcg->socket_pressure = jiffies; #ifdef CONFIG_MEMCG_KMEM memcg->kmemcg_id = -1; - INIT_LIST_HEAD(&memcg->objcg_list); #endif + INIT_LIST_HEAD(&memcg->objcg_list); #ifdef CONFIG_CGROUP_WRITEBACK INIT_LIST_HEAD(&memcg->cgwb_list); for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++) @@ -5232,16 +5231,22 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) static int mem_cgroup_css_online(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); + struct obj_cgroup *objcg; /* * A memcg must be visible for expand_shrinker_info() * by the time the maps are allocated. So, we allocate maps * here, when for_each_mem_cgroup() can't skip it. */ - if (alloc_shrinker_info(memcg)) { - mem_cgroup_id_remove(memcg); - return -ENOMEM; - } + if (alloc_shrinker_info(memcg)) + goto remove_id; + + objcg = obj_cgroup_alloc(); + if (!objcg) + goto free_shrinker; + + objcg->memcg = memcg; + rcu_assign_pointer(memcg->objcg, objcg); /* Online state pins memcg ID, memcg ID pins CSS */ refcount_set(&memcg->id.ref, 1); @@ -5251,6 +5256,12 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ); return 0; + +free_shrinker: + free_shrinker_info(memcg); +remove_id: + mem_cgroup_id_remove(memcg); + return -ENOMEM; } static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) @@ -5274,6 +5285,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) page_counter_set_low(&memcg->memory, 0); memcg_offline_kmem(memcg); + memcg_reparent_objcgs(memcg); reparent_shrinker_deferred(memcg); wb_memcg_offline(memcg); From patchwork Thu Sep 16 13:47:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499123 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2DC4C433EF for ; Thu, 16 Sep 2021 13:52:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 74DF861056 for ; Thu, 16 Sep 2021 13:52:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 74DF861056 
From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
    akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
    smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 03/13] mm: memcontrol: introduce compact_lock_page_irqsave
Date: Thu, 16 Sep 2021 21:47:38 +0800
Message-Id: <20210916134748.67712-4-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

If we reuse the objcg APIs to charge LRU pages, page_memcg() can change when
an LRU page is reparented. In that case we need to acquire the new lruvec
lock:

    lruvec = mem_cgroup_page_lruvec(page);

    // The page is reparented.

    compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);

    // Acquired the wrong lruvec lock and need to retry.

But compact_lock_irqsave() only takes the lruvec lock as a parameter, so it
cannot notice this change. If it instead takes the page as a parameter and
acquires the lruvec lock itself, then when the page's memcg changes we can use
page_memcg() to detect whether the new lruvec lock must be reacquired. So
compact_lock_irqsave() is not suitable here. Similar to
lock_page_lruvec_irqsave(), introduce compact_lock_page_irqsave() to acquire
the lruvec lock in the compaction routine.

Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
---
 mm/compaction.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index bfc93da1c2c7..bf1a6048b5a3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -509,6 +509,29 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
+static struct lruvec *compact_lock_page_irqsave(struct page *page,
+						unsigned long *flags,
+						struct compact_control *cc)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page);
+
+	/* Track if the lock is contended in async mode */
+	if (cc->mode == MIGRATE_ASYNC && !cc->contended) {
+		if (spin_trylock_irqsave(&lruvec->lru_lock, *flags))
+			goto out;
+
+		cc->contended = true;
+	}
+
+	spin_lock_irqsave(&lruvec->lru_lock, *flags);
+out:
+	lruvec_memcg_debug(lruvec, page);
+
+	return lruvec;
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended.
The lock should be periodically unlocked to avoid @@ -1029,11 +1052,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (locked) unlock_page_lruvec_irqrestore(locked, flags); - compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + lruvec = compact_lock_page_irqsave(page, &flags, cc); locked = lruvec; - lruvec_memcg_debug(lruvec, page); - /* Try get exclusive access under lock */ if (!skip_updated) { skip_updated = true; From patchwork Thu Sep 16 13:47:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A06B6C433F5 for ; Thu, 16 Sep 2021 13:52:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4E54B61056 for ; Thu, 16 Sep 2021 13:52:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4E54B61056 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id DB3386B0075; Thu, 16 Sep 2021 09:52:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D6355900002; Thu, 16 Sep 2021 09:52:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C2ADB6B007B; Thu, 16 Sep 2021 09:52:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0148.hostedemail.com [216.40.44.148]) by kanga.kvack.org (Postfix) with ESMTP id B52AE6B0075 for ; Thu, 16 Sep 2021 09:52:33 -0400 (EDT) Received: from smtpin33.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 664E72DEB3 for ; Thu, 16 Sep 2021 13:52:33 +0000 (UTC) X-FDA: 78593576586.33.DB85F0F Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com [209.85.216.42]) by imf23.hostedemail.com (Postfix) with ESMTP id 2FBBC90000A9 for ; Thu, 16 Sep 2021 13:52:33 +0000 (UTC) Received: by mail-pj1-f42.google.com with SMTP id w19-20020a17090aaf9300b00191e6d10a19so4827451pjq.1 for ; Thu, 16 Sep 2021 06:52:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=IjaeLDkNdZhTPTv/gmNAvH3aDq6cgB+MWQai9kSiTfY=; b=20ulyW5hYFscFP3he09U5EbDzlS3DZ57PB4D34MDxEnYh/v3nq61tuhu9r9+JzjwBo 1QzZbxIgxk4v0xKqHFXLw6kKhcn+j68n5xb67OGAQ/DHDBZSY9dZdmIsgxRVRovFiIkO 7LFgP0Ios8fPmDKXNGHp5l+8PPgXqtg8/wF62qYDUa1sTQVnZZKZgV0WrGzb2RM6RSd6 0ofSp6KEdF4ie0GW0JwQlL0k2SB5baqSkL4o5vHnKAGNsfGGCekRDC/cG4xzI16LVPiy 2b0kcsC9Cj5LNP7AqDrv7KVpYdtBFOms3xjajWUaxR2uqQy23V1/iHgIrc9uqsMzynL9 Zshg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
    akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
    smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 04/13] mm: memcontrol: make lruvec lock safe when the LRU pages reparented
Date: Thu, 16 Sep 2021 21:47:39 +0800
Message-Id: <20210916134748.67712-5-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

The diagram below shows how to make the page lruvec lock safe when LRU pages
are reparented.

    lock_page_lruvec(page)
        retry:
            lruvec = mem_cgroup_page_lruvec(page);

            // The page is reparented at this time.
            spin_lock(&lruvec->lru_lock);

            if (unlikely(lruvec_memcg(lruvec) != page_memcg(page)))
                // Acquired the wrong lruvec lock and need to retry.
                // Because this page is on the parent memcg lruvec list.
                goto retry;

            // If we reach here, it means that page_memcg(page) is stable.

    memcg_reparent_objcgs(memcg)
        // lruvec belongs to memcg and lruvec_parent belongs to parent memcg.
        spin_lock(&lruvec->lru_lock);
        spin_lock(&lruvec_parent->lru_lock);

        // Move all the pages from the lruvec list to the parent lruvec list.

        spin_unlock(&lruvec_parent->lru_lock);
        spin_unlock(&lruvec->lru_lock);

After we acquire the lruvec lock, we need to check whether the page has been
reparented. If so, we need to reacquire the new lruvec lock. On the LRU page
reparenting path we will also acquire the lruvec lock (implemented in a later
patch), so page_memcg() cannot change while we hold the lruvec lock.
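Written out as C, the locking side of the diagram is the retry loop below;
this mirrors what the diff later in this patch does in lock_page_lruvec(),
with editorial comments added:

    struct lruvec *lock_page_lruvec(struct page *page)
    {
            struct lruvec *lruvec;

            rcu_read_lock();        /* keeps the (possibly dying) memcg and lruvec alive */
    retry:
            lruvec = mem_cgroup_page_lruvec(page);
            spin_lock(&lruvec->lru_lock);

            /* The page may have been reparented before the lock was taken. */
            if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) {
                    spin_unlock(&lruvec->lru_lock);
                    goto retry;
            }

            /*
             * Holding lru_lock blocks memcg_reparent_objcgs(), so page_memcg()
             * is now stable; spin_lock() already disabled preemption, which is
             * why the explicit RCU read lock can be dropped here.
             */
            rcu_read_unlock();
            return lruvec;
    }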
Since lruvec_memcg(lruvec) is always equal to page_memcg(page) after we hold the lruvec lock, lruvec_memcg_debug() check is pointless. So remove it. This is a preparation for reparenting the LRU pages. Signed-off-by: Muchun Song Acked-by: Roman Gushchin --- include/linux/memcontrol.h | 16 +++------------ mm/compaction.c | 10 +++++++++- mm/memcontrol.c | 50 +++++++++++++++++++++++++++------------------- mm/swap.c | 5 +++++ 4 files changed, 47 insertions(+), 34 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 490d4849a05a..6c2cb076c1a4 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -756,7 +756,9 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page * @page: the page * - * This function relies on page->mem_cgroup being stable. + * The lruvec can be changed to its parent lruvec when the page reparented. + * The caller need to recheck if it cares about this change (just like + * lock_page_lruvec() does). */ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page) { @@ -776,14 +778,6 @@ struct lruvec *lock_page_lruvec_irq(struct page *page); struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags); -#ifdef CONFIG_DEBUG_VM -void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page); -#else -static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) -{ -} -#endif - static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? container_of(css, struct mem_cgroup, css) : NULL; @@ -1220,10 +1214,6 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page) return &pgdat->__lruvec; } -static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) -{ -} - static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { return NULL; diff --git a/mm/compaction.c b/mm/compaction.c index bf1a6048b5a3..c4ba41de8591 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -515,6 +515,8 @@ static struct lruvec *compact_lock_page_irqsave(struct page *page, { struct lruvec *lruvec; + rcu_read_lock(); +retry: lruvec = mem_cgroup_page_lruvec(page); /* Track if the lock is contended in async mode */ @@ -527,7 +529,13 @@ static struct lruvec *compact_lock_page_irqsave(struct page *page, spin_lock_irqsave(&lruvec->lru_lock, *flags); out: - lruvec_memcg_debug(lruvec, page); + if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { + spin_unlock_irqrestore(&lruvec->lru_lock, *flags); + goto retry; + } + + /* See the comments in lock_page_lruvec(). */ + rcu_read_unlock(); return lruvec; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f58010cd8414..a57cce0ea24b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1158,23 +1158,6 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg, return ret; } -#ifdef CONFIG_DEBUG_VM -void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) -{ - struct mem_cgroup *memcg; - - if (mem_cgroup_disabled()) - return; - - memcg = page_memcg(page); - - if (!memcg) - VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != root_mem_cgroup, page); - else - VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != memcg, page); -} -#endif - /** * lock_page_lruvec - lock and return lruvec for a given page. 
* @page: the page @@ -1189,10 +1172,21 @@ struct lruvec *lock_page_lruvec(struct page *page) { struct lruvec *lruvec; + rcu_read_lock(); +retry: lruvec = mem_cgroup_page_lruvec(page); spin_lock(&lruvec->lru_lock); - lruvec_memcg_debug(lruvec, page); + if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { + spin_unlock(&lruvec->lru_lock); + goto retry; + } + + /* + * Preemption is disabled in the internal of spin_lock, which can serve + * as RCU read-side critical sections. + */ + rcu_read_unlock(); return lruvec; } @@ -1201,10 +1195,18 @@ struct lruvec *lock_page_lruvec_irq(struct page *page) { struct lruvec *lruvec; + rcu_read_lock(); +retry: lruvec = mem_cgroup_page_lruvec(page); spin_lock_irq(&lruvec->lru_lock); - lruvec_memcg_debug(lruvec, page); + if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { + spin_unlock_irq(&lruvec->lru_lock); + goto retry; + } + + /* See the comments in lock_page_lruvec(). */ + rcu_read_unlock(); return lruvec; } @@ -1213,10 +1215,18 @@ struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags) { struct lruvec *lruvec; + rcu_read_lock(); +retry: lruvec = mem_cgroup_page_lruvec(page); spin_lock_irqsave(&lruvec->lru_lock, *flags); - lruvec_memcg_debug(lruvec, page); + if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { + spin_unlock_irqrestore(&lruvec->lru_lock, *flags); + goto retry; + } + + /* See the comments in lock_page_lruvec(). */ + rcu_read_unlock(); return lruvec; } diff --git a/mm/swap.c b/mm/swap.c index 897200d27dd0..18d44f978b2e 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -291,6 +291,11 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) void lru_note_cost_page(struct page *page) { + /* + * The rcu read lock is held by the caller, so we do not need to + * care about the lruvec returned by mem_cgroup_page_lruvec() being + * released. 
+	 */
 	lru_note_cost(mem_cgroup_page_lruvec(page), page_is_file_lru(page),
 		      thp_nr_pages(page));
 }

From patchwork Thu Sep 16 13:47:40 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12499127
From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
    akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
    smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 05/13] mm: vmscan: rework move_pages_to_lru()
Date: Thu, 16 Sep 2021 21:47:40 +0800
Message-Id: <20210916134748.67712-6-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

A later patch will reparent the LRU pages. The pages that are about to be
moved to the appropriate LRU list can be reparented while move_pages_to_lru()
is running, so it is wrong for the caller to hold a single lruvec lock across
the call. Instead, use the more general relock_page_lruvec_irq() inside
move_pages_to_lru() to acquire the correct lruvec lock.

Signed-off-by: Muchun Song
---
 mm/vmscan.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 74296c2d1fed..6878a6bff2f8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2149,23 +2149,27 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
  * move_pages_to_lru() moves pages from private @list to appropriate LRU list.
  * On return, @list is reused as a list of pages to be freed by the caller.
  *
- * Returns the number of pages moved to the given lruvec.
+ * Returns the number of pages moved to the appropriate LRU list.
+ *
+ * Note: The caller must not hold any lruvec lock.
*/ -static unsigned int move_pages_to_lru(struct lruvec *lruvec, - struct list_head *list) +static unsigned int move_pages_to_lru(struct list_head *list) { - int nr_pages, nr_moved = 0; + int nr_moved = 0; + struct lruvec *lruvec = NULL; LIST_HEAD(pages_to_free); - struct page *page; while (!list_empty(list)) { - page = lru_to_page(list); + int nr_pages; + struct page *page = lru_to_page(list); + + lruvec = relock_page_lruvec_irq(page, lruvec); VM_BUG_ON_PAGE(PageLRU(page), page); list_del(&page->lru); if (unlikely(!page_evictable(page))) { - spin_unlock_irq(&lruvec->lru_lock); + unlock_page_lruvec_irq(lruvec); putback_lru_page(page); - spin_lock_irq(&lruvec->lru_lock); + lruvec = NULL; continue; } @@ -2186,19 +2190,15 @@ static unsigned int move_pages_to_lru(struct lruvec *lruvec, __clear_page_lru_flags(page); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&lruvec->lru_lock); + unlock_page_lruvec_irq(lruvec); destroy_compound_page(page); - spin_lock_irq(&lruvec->lru_lock); + lruvec = NULL; } else list_add(&page->lru, &pages_to_free); continue; } - /* - * All pages were isolated from the same lruvec (and isolation - * inhibits memcg migration). - */ VM_BUG_ON_PAGE(!page_matches_lruvec(page, lruvec), page); add_page_to_lru_list(page, lruvec); nr_pages = thp_nr_pages(page); @@ -2207,6 +2207,8 @@ static unsigned int move_pages_to_lru(struct lruvec *lruvec, workingset_age_nonresident(lruvec, nr_pages); } + if (lruvec) + unlock_page_lruvec_irq(lruvec); /* * To save our caller's stack, now use input list for pages to free. */ @@ -2280,16 +2282,16 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, &stat, false); - spin_lock_irq(&lruvec->lru_lock); - move_pages_to_lru(lruvec, &page_list); + move_pages_to_lru(&page_list); + local_irq_disable(); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); - spin_unlock_irq(&lruvec->lru_lock); + local_irq_enable(); lru_note_cost(lruvec, file, stat.nr_pageout); mem_cgroup_uncharge_list(&page_list); @@ -2416,18 +2418,16 @@ static void shrink_active_list(unsigned long nr_to_scan, /* * Move pages back to the lru list. 
*/ - spin_lock_irq(&lruvec->lru_lock); - - nr_activate = move_pages_to_lru(lruvec, &l_active); - nr_deactivate = move_pages_to_lru(lruvec, &l_inactive); + nr_activate = move_pages_to_lru(&l_active); + nr_deactivate = move_pages_to_lru(&l_inactive); /* Keep all free pages in l_active list */ list_splice(&l_inactive, &l_active); + local_irq_disable(); __count_vm_events(PGDEACTIVATE, nr_deactivate); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&lruvec->lru_lock); + local_irq_enable(); mem_cgroup_uncharge_list(&l_active); free_unref_page_list(&l_active); From patchwork Thu Sep 16 13:47:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96417C433EF for ; Thu, 16 Sep 2021 13:52:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4880F60EB4 for ; Thu, 16 Sep 2021 13:52:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4880F60EB4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E4EF76B007B; Thu, 16 Sep 2021 09:52:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DFDF76B007D; Thu, 16 Sep 2021 09:52:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C9F2F6B007E; Thu, 16 Sep 2021 09:52:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id BBD0B6B007B for ; Thu, 16 Sep 2021 09:52:45 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 7DBE6267D7 for ; Thu, 16 Sep 2021 13:52:45 +0000 (UTC) X-FDA: 78593577090.29.DDB9FF7 Received: from mail-pf1-f180.google.com (mail-pf1-f180.google.com [209.85.210.180]) by imf04.hostedemail.com (Postfix) with ESMTP id 383E350000B3 for ; Thu, 16 Sep 2021 13:52:45 +0000 (UTC) Received: by mail-pf1-f180.google.com with SMTP id j6so5981334pfa.4 for ; Thu, 16 Sep 2021 06:52:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Czw+lJoI8c+vqpLarmySj3yLnTuDl0dY5o95pqOIb5c=; b=ZgWK155sZkXvFxsdPVc9idbYRDUwcU0WSFJ7BHU4VE5Qi6Ormq9uEOWap7v+NHFnyz DiMm0dZqdTbYfmWz6m0Ux0MsIiyl7kOvaheG+HLrfxHKYEkDRIn7S5Zj4eGGfFke4zug O4l4wY1TpAcuGNCjqXEQnhjHVrgdAEw5PRK1zXJifY0hSOphTKYuX85mhkp3yTT0BVHq AFD5Bgb2XpK4E7T3X/zsUkFze5d5kmqo3/6dP3K3GSrlYElplKsBMF+ClyZrOdKTGQKa 3dzi/NtNJE0pucQigJVwo6dEZhqZDrkW/fwoOnBnaYMQNdw3OUy7mXkJP5npnJtoxlcf PfQw== X-Google-DKIM-Signature: v=1; 
From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
    akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
    smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 06/13] mm: thp: introduce split_queue_lock/unlock{_irqsave}()
Date: Thu, 16 Sep 2021 21:47:41 +0800
Message-Id: <20210916134748.67712-7-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

We should make the THP deferred split queue lock safe when LRU pages are
reparented. Similar to lock_page_lruvec{_irqsave,_irq}(), introduce
split_queue_lock/unlock{_irqsave}() so that the deferred split queue lock is
easier to reparent. In the next patch, a similar approach to the one used for
the lruvec lock will make the deferred split queue lock safe when LRU pages
are reparented.
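The caller-visible effect is that looking up the right queue (per-memcg or
per-node) and taking its lock collapse into one call; for example, after this
patch free_transhuge_page() looks like the following (taken from the diff
below, with editorial comments added):

    void free_transhuge_page(struct page *page)
    {
            struct deferred_split *ds_queue;
            unsigned long flags;

            /* Resolve the correct deferred split queue and lock it in one step. */
            ds_queue = split_queue_lock_irqsave(page, &flags);
            if (!list_empty(page_deferred_list(page))) {
                    ds_queue->split_queue_len--;
                    list_del(page_deferred_list(page));
            }
            split_queue_unlock_irqrestore(ds_queue, flags);
            free_compound_page(page);
    }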
Signed-off-by: Muchun Song Reported-by: kernel test robot Reported-by: kernel test robot --- mm/huge_memory.c | 90 +++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 67 insertions(+), 23 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 5e9ef0fc261e..9d8dfa82991a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -499,25 +499,70 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) } #ifdef CONFIG_MEMCG -static inline struct deferred_split *get_deferred_split_queue(struct page *page) +static inline struct mem_cgroup *split_queue_memcg(struct deferred_split *queue) { - struct mem_cgroup *memcg = page_memcg(compound_head(page)); - struct pglist_data *pgdat = NODE_DATA(page_to_nid(page)); + if (mem_cgroup_disabled()) + return NULL; + return container_of(queue, struct mem_cgroup, deferred_split_queue); +} - if (memcg) - return &memcg->deferred_split_queue; - else - return &pgdat->deferred_split_queue; +static inline struct deferred_split *page_memcg_split_queue(struct page *head) +{ + struct mem_cgroup *memcg = page_memcg(head); + + return memcg ? &memcg->deferred_split_queue : NULL; } #else -static inline struct deferred_split *get_deferred_split_queue(struct page *page) ++static inline struct mem_cgroup *split_queue_memcg(struct deferred_split *queue) { - struct pglist_data *pgdat = NODE_DATA(page_to_nid(page)); + return NULL; +} - return &pgdat->deferred_split_queue; +static inline struct deferred_split *page_memcg_split_queue(struct page *head) +{ + return NULL; } #endif +static struct deferred_split *page_split_queue(struct page *head) +{ + struct deferred_split *queue = page_memcg_split_queue(head); + + return queue ? : &NODE_DATA(page_to_nid(head))->deferred_split_queue; +} + +static struct deferred_split *split_queue_lock(struct page *head) +{ + struct deferred_split *queue; + + queue = page_split_queue(head); + spin_lock(&queue->split_queue_lock); + + return queue; +} + +static struct deferred_split * +split_queue_lock_irqsave(struct page *head, unsigned long *flags) +{ + struct deferred_split *queue; + + queue = page_split_queue(head); + spin_lock_irqsave(&queue->split_queue_lock, *flags); + + return queue; +} + +static inline void split_queue_unlock(struct deferred_split *queue) +{ + spin_unlock(&queue->split_queue_lock); +} + +static inline void split_queue_unlock_irqrestore(struct deferred_split *queue, + unsigned long flags) +{ + spin_unlock_irqrestore(&queue->split_queue_lock, flags); +} + void prep_transhuge_page(struct page *page) { /* @@ -2610,7 +2655,7 @@ bool can_split_huge_page(struct page *page, int *pextra_pins) int split_huge_page_to_list(struct page *page, struct list_head *list) { struct page *head = compound_head(page); - struct deferred_split *ds_queue = get_deferred_split_queue(head); + struct deferred_split *ds_queue; struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; int extra_pins, ret; @@ -2690,13 +2735,13 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) } /* Prevent deferred_split_scan() touching ->_refcount */ - spin_lock(&ds_queue->split_queue_lock); + ds_queue = split_queue_lock(head); if (page_ref_freeze(head, 1 + extra_pins)) { if (!list_empty(page_deferred_list(head))) { ds_queue->split_queue_len--; list_del(page_deferred_list(head)); } - spin_unlock(&ds_queue->split_queue_lock); + split_queue_unlock(ds_queue); if (mapping) { int nr = thp_nr_pages(head); @@ -2711,7 +2756,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) 
__split_huge_page(page, list, end); ret = 0; } else { - spin_unlock(&ds_queue->split_queue_lock); + split_queue_unlock(ds_queue); fail: if (mapping) xa_unlock(&mapping->i_pages); @@ -2734,24 +2779,22 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) void free_transhuge_page(struct page *page) { - struct deferred_split *ds_queue = get_deferred_split_queue(page); + struct deferred_split *ds_queue; unsigned long flags; - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + ds_queue = split_queue_lock_irqsave(page, &flags); if (!list_empty(page_deferred_list(page))) { ds_queue->split_queue_len--; list_del(page_deferred_list(page)); } - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + split_queue_unlock_irqrestore(ds_queue, flags); free_compound_page(page); } void deferred_split_huge_page(struct page *page) { - struct deferred_split *ds_queue = get_deferred_split_queue(page); -#ifdef CONFIG_MEMCG - struct mem_cgroup *memcg = page_memcg(compound_head(page)); -#endif + struct deferred_split *ds_queue; + struct mem_cgroup __maybe_unused *memcg; unsigned long flags; VM_BUG_ON_PAGE(!PageTransHuge(page), page); @@ -2769,7 +2812,8 @@ void deferred_split_huge_page(struct page *page) if (PageSwapCache(page)) return; - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + ds_queue = split_queue_lock_irqsave(page, &flags); + memcg = split_queue_memcg(ds_queue); if (list_empty(page_deferred_list(page))) { count_vm_event(THP_DEFERRED_SPLIT_PAGE); list_add_tail(page_deferred_list(page), &ds_queue->split_queue); @@ -2780,7 +2824,7 @@ void deferred_split_huge_page(struct page *page) deferred_split_shrinker.id); #endif } - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + split_queue_unlock_irqrestore(ds_queue, flags); } static unsigned long deferred_split_count(struct shrinker *shrink, From patchwork Thu Sep 16 13:47:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D8DEC433EF for ; Thu, 16 Sep 2021 13:52:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4081A60EB4 for ; Thu, 16 Sep 2021 13:52:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4081A60EB4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id D64016B007D; Thu, 16 Sep 2021 09:52:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D13AE6B007E; Thu, 16 Sep 2021 09:52:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BDCCD6B0080; Thu, 16 Sep 2021 09:52:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0095.hostedemail.com [216.40.44.95]) by kanga.kvack.org (Postfix) with ESMTP id AEFB36B007D for ; Thu, 16 Sep 2021 
09:52:51 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 72157182811CA for ; Thu, 16 Sep 2021 13:52:51 +0000 (UTC) X-FDA: 78593577342.17.D6FC6CE Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) by imf03.hostedemail.com (Postfix) with ESMTP id 30CEC30000AC for ; Thu, 16 Sep 2021 13:52:51 +0000 (UTC) Received: by mail-pf1-f172.google.com with SMTP id y8so5929558pfa.7 for ; Thu, 16 Sep 2021 06:52:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xqRNOri396qVGI8Drv9BL2167qBW7PZton2b/SSrCrU=; b=2kJaAcBoYzi6fqMoyRdw1+AjsWMsTyGBQy7Hm71Due/Dq+2fhnE+Tsm9dpkwmuKRBh JMxcMEQmjWJfV81NQNe0RR++yf/2ED9Ng8Oqo1vcyLxobv0UXI+/ERUtF3d1VaJH772L AskxBx0CSNYzk5s6qYuFljJL9swhxrugGy/xPy4u0UgTxcbq9nXOedVcgWkFdnT84TGu RFK68oGYvbh0QAM/b/DrhgDqBIwwfI5nOtihvToD8ikCGQmmksCkIFkXrdNRQjQeSROT HCWR/m2E/xZQ8J91hVD+v9R8MXiusoDjrI1zfCPzqb/XCzmzYBgHWzAjwJgrocJKwrtD muJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xqRNOri396qVGI8Drv9BL2167qBW7PZton2b/SSrCrU=; b=KsQJQU9UgRCnvMaDwrFWbjtWhsoxSR2YhQw/X5mGwpJHCjfKsl7L74KtB5qZi7FEwA MGQmgWcwMxNaOtQIT0BMtfnEkBgQPlSRmqEFBmBv+sI5g4v5xiF2u2k2w1UYQEcnJpLp PnHbO7xzhiykaAleqTeIKkcwTLbgrHfUqZNiVzMEQwiEdnLnxsK5yTSjDpuyI+fgDA9Q 3EYNYLl6d1xa0w8XHwRBkdosZG9VISiKNRAtBPNHyiKtXZEJlmnjCaAl8n53fNsLd9pz urb1P1+++jgHHM8Gy6ek3wwd/0e+8cvT3yv6zE1IukTFF7DnJ8ag85zXWQZ2YBcjqsW1 O2fw== X-Gm-Message-State: AOAM530JaSW9+CRgNp3ReNsSX0myzPq36+DSdqWePVJMCG+av3aK2X2E OV1aA9Gwzy38DihSvBuaZ77Ozw== X-Google-Smtp-Source: ABdhPJzAK19Z2U3CeVnflfe4LQQJJPRoJrF0z9YnW5hMuY00wkZvM9UnIjFhxbQu5cpipON9b6hLTw== X-Received: by 2002:a62:dd94:0:b0:442:bb03:9663 with SMTP id w142-20020a62dd94000000b00442bb039663mr2523613pff.0.1631800370277; Thu, 16 Sep 2021 06:52:50 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.226]) by smtp.gmail.com with ESMTPSA id o9sm3617443pfh.217.2021.09.16.06.52.44 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 16 Sep 2021 06:52:49 -0700 (PDT) From: Muchun Song To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song Subject: [PATCH v2 07/13] mm: thp: make split queue lock safe when LRU pages reparented Date: Thu, 16 Sep 2021 21:47:42 +0800 Message-Id: <20210916134748.67712-8-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com> References: <20210916134748.67712-1-songmuchun@bytedance.com> MIME-Version: 1.0 Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=2kJaAcBo; spf=pass (imf03.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.210.172 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 30CEC30000AC X-Stat-Signature: 
4dcyaeeksxkyjpp8f4k6uz4snqmoask8 X-HE-Tag: 1631800371-92220 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Similar to lruvec lock, we use the same approach to make the split queue lock safe when LRU pages reparented. Signed-off-by: Muchun Song Reported-by: kernel test robot --- mm/huge_memory.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9d8dfa82991a..12950d4988e6 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -535,9 +535,22 @@ static struct deferred_split *split_queue_lock(struct page *head) { struct deferred_split *queue; + rcu_read_lock(); +retry: queue = page_split_queue(head); spin_lock(&queue->split_queue_lock); + if (unlikely(split_queue_memcg(queue) != page_memcg(head))) { + spin_unlock(&queue->split_queue_lock); + goto retry; + } + + /* + * Preemption is disabled in the internal of spin_lock, which can serve + * as RCU read-side critical sections. + */ + rcu_read_unlock(); + return queue; } @@ -546,9 +559,19 @@ split_queue_lock_irqsave(struct page *head, unsigned long *flags) { struct deferred_split *queue; + rcu_read_lock(); +retry: queue = page_split_queue(head); spin_lock_irqsave(&queue->split_queue_lock, *flags); + if (unlikely(split_queue_memcg(queue) != page_memcg(head))) { + spin_unlock_irqrestore(&queue->split_queue_lock, *flags); + goto retry; + } + + /* See the comments in split_queue_lock(). */ + rcu_read_unlock(); + return queue; } From patchwork Thu Sep 16 13:47:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA148C433F5 for ; Thu, 16 Sep 2021 13:52:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6245660EB4 for ; Thu, 16 Sep 2021 13:52:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6245660EB4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 045B66B007E; Thu, 16 Sep 2021 09:52:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F36606B0080; Thu, 16 Sep 2021 09:52:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DFF11900002; Thu, 16 Sep 2021 09:52:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236]) by kanga.kvack.org (Postfix) with ESMTP id D1C416B007E for ; Thu, 16 Sep 2021 09:52:57 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9A4382D4CA for ; Thu, 16 Sep 2021 13:52:57 +0000 (UTC) X-FDA: 78593577594.12.9F1BA25 Received: from mail-pj1-f54.google.com 
(mail-pj1-f54.google.com [209.85.216.54]) by imf12.hostedemail.com (Postfix) with ESMTP id 4B46310000AA for ; Thu, 16 Sep 2021 13:52:57 +0000 (UTC) Received: by mail-pj1-f54.google.com with SMTP id mv7-20020a17090b198700b0019c843e7233so1073388pjb.4 for ; Thu, 16 Sep 2021 06:52:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ajm3Sm96cZ8KFQqJftDUm/AAk++NS7Ihs73gDHBkIWY=; b=QVUY+1OybzlFTfcaJhZKd4XnKFBIxQNhIt9gdvnlhD1mLF8sr4WZznxvmUJt7vwgY1 Of5WlJDOoRdb2XoAHGdb8pR9Dh5B6DfIDy11sEM7xmK3Io1y2eXN7+QBfnu0b4iI4YXq KdIZjfscX0b4xe+lQs0SL10VUNfiBGfvWY02PBaerAEbrs8rGwDPAdBJIb4TlWL5wmDY v6CMsZ5OKN0FI/G9G4gaDCvgvscEOIwzdfT96ayVH2QVWyz4xm5CovLrW68kOCWCoRqx PDdZ0e1v4b1HIxjXc6RH0ZiXLkxCrEt3bTKl94XtQWHQZlHgfeyhKQdmB4CiHqM+lVc7 qpuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ajm3Sm96cZ8KFQqJftDUm/AAk++NS7Ihs73gDHBkIWY=; b=Vd5xE0LBYGMtNdjLvIIH86wjRfkhFLXvT3V1pj4xYc6+aPIHoS6BacIZoOmi9zBWUf C55V8TBkYWEtkqfU053pTbDi79H/+4QtzSYYFTU2SJTb4QQXcTwc+dn8EYlWBnWaLM8k SAsSLO1QKVFdj3irjthoYDOueSU3UxrKScHXt6dPZIBWwZ3KzZrVne0DrUmoJZNm5hfN JKbyWoqz/COWZiIrwQFTDSVBI0lbiwjv1EQiHq8dRusYV4PsOEhpbstdTraHf+uzls5q ygnqVcsCSMjsb51vRXZrwnwv2NMeLNqAefHco1D9qpFnJFk5tLF4ge9bJWXPWzHki0J5 xhCg== X-Gm-Message-State: AOAM530pMVWaWkY93aiviEmtzFJQVTmRlvLV3X3h4u0LdBZZdeE/L0qa fgq6xqVlwesIofutjkf0k7hnKA== X-Google-Smtp-Source: ABdhPJwGZdbWfbgEB9c2/uTLk05HGdVg5vbXYnVJcZmspknQjdGZbMw5JKXKvILoBT7Lmm3BkwMjGw== X-Received: by 2002:a17:90a:b105:: with SMTP id z5mr14723532pjq.64.1631800376272; Thu, 16 Sep 2021 06:52:56 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.226]) by smtp.gmail.com with ESMTPSA id o9sm3617443pfh.217.2021.09.16.06.52.50 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 16 Sep 2021 06:52:55 -0700 (PDT) From: Muchun Song To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song Subject: [PATCH v2 08/13] mm: memcontrol: make all the callers of page_memcg() safe Date: Thu, 16 Sep 2021 21:47:43 +0800 Message-Id: <20210916134748.67712-9-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com> References: <20210916134748.67712-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 4B46310000AA Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=QVUY+1Oy; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf12.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.54 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Stat-Signature: 1frxs718jqds19x4kridisfm7wwh76iy X-HE-Tag: 1631800377-296820 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When we use objcg APIs to charge the LRU pages, the page will not hold a reference to 
the memcg associated with the page. So the caller of the page_memcg() should hold an rcu read lock or obtain a reference to the memcg associated with the page to protect memcg from being released. So introduce get_mem_cgroup_from_page() to obtain a reference to the memory cgroup associated with the page. In this patch, make all the callers hold an rcu read lock or obtain a reference to the memcg to protect memcg from being released when the LRU pages reparented. We do not need to adjust the callers of page_memcg() during the whole process of mem_cgroup_move_task(). Because the cgroup migration and memory cgroup offlining are serialized by @cgroup_mutex. In this routine, the LRU pages cannot be reparented to its parent memory cgroup. So page_memcg(page) is stable and cannot be released. This is a preparation for reparenting the LRU pages. Signed-off-by: Muchun Song --- fs/buffer.c | 3 ++- fs/fs-writeback.c | 23 +++++++++++---------- include/linux/memcontrol.h | 39 ++++++++++++++++++++++++++++++++--- mm/memcontrol.c | 51 ++++++++++++++++++++++++++++++++++++---------- mm/migrate.c | 4 ++++ mm/page_io.c | 5 +++-- 6 files changed, 97 insertions(+), 28 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index ab7573d72dd7..52d257962343 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -823,7 +823,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, gfp |= __GFP_NOFAIL; /* The page lock pins the memcg */ - memcg = page_memcg(page); + memcg = get_mem_cgroup_from_page(page); old_memcg = set_active_memcg(memcg); head = NULL; @@ -843,6 +843,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, set_bh_page(bh, page, offset); } out: + mem_cgroup_put(memcg); set_active_memcg(old_memcg); return head; /* diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 81ec192ce067..d9a67fffcc78 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -243,15 +243,13 @@ void __inode_attach_wb(struct inode *inode, struct page *page) if (inode_cgwb_enabled(inode)) { struct cgroup_subsys_state *memcg_css; - if (page) { - memcg_css = mem_cgroup_css_from_page(page); - wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC); - } else { - /* must pin memcg_css, see wb_get_create() */ + /* must pin memcg_css, see wb_get_create() */ + if (page) + memcg_css = get_mem_cgroup_css_from_page(page); + else memcg_css = task_get_css(current, memory_cgrp_id); - wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC); - css_put(memcg_css); - } + wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC); + css_put(memcg_css); } if (!wb) @@ -866,16 +864,16 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page, if (!wbc->wb || wbc->no_cgroup_owner) return; - css = mem_cgroup_css_from_page(page); + css = get_mem_cgroup_css_from_page(page); /* dead cgroups shouldn't contribute to inode ownership arbitration */ if (!(css->flags & CSS_ONLINE)) - return; + goto out; id = css->id; if (id == wbc->wb_id) { wbc->wb_bytes += bytes; - return; + goto out; } if (id == wbc->wb_lcand_id) @@ -888,6 +886,9 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page, wbc->wb_tcand_bytes += bytes; else wbc->wb_tcand_bytes -= min(bytes, wbc->wb_tcand_bytes); + +out: + css_put(css); } EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner); diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6c2cb076c1a4..ab3cd844e91d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -376,7 +376,7 @@ static inline bool PageMemcgKmem(struct page *page); * a 
valid memcg, but can be atomically swapped to the parent memcg. * * The caller must ensure that the returned memcg won't be released: - * e.g. acquire the rcu_read_lock or css_set_lock. + * e.g. acquire the rcu_read_lock or css_set_lock or cgroup_mutex. */ static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) { @@ -454,6 +454,31 @@ static inline struct mem_cgroup *page_memcg(struct page *page) } /* + * get_mem_cgroup_from_page - Obtain a reference on the memory cgroup associated + * with a page + * @page: a pointer to the page struct + * + * Returns a pointer to the memory cgroup (and obtain a reference on it) + * associated with the page, or NULL. This function assumes that the page + * is known to have a proper memory cgroup pointer. It's not safe to call + * this function against some type of pages, e.g. slab pages or ex-slab + * pages. + */ +static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) +{ + struct mem_cgroup *memcg; + + rcu_read_lock(); +retry: + memcg = page_memcg(page); + if (unlikely(memcg && !css_tryget(&memcg->css))) + goto retry; + rcu_read_unlock(); + + return memcg; +} + +/* * page_memcg_rcu - locklessly get the memory cgroup associated with a page * @page: a pointer to the page struct * @@ -881,7 +906,7 @@ static inline bool mm_match_cgroup(struct mm_struct *mm, return match; } -struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page); +struct cgroup_subsys_state *get_mem_cgroup_css_from_page(struct page *page); ino_t page_cgroup_ino(struct page *page); static inline bool mem_cgroup_online(struct mem_cgroup *memcg) @@ -1037,10 +1062,13 @@ static inline void count_memcg_events(struct mem_cgroup *memcg, static inline void count_memcg_page_event(struct page *page, enum vm_event_item idx) { - struct mem_cgroup *memcg = page_memcg(page); + struct mem_cgroup *memcg; + rcu_read_lock(); + memcg = page_memcg(page); if (memcg) count_memcg_events(memcg, idx, 1); + rcu_read_unlock(); } static inline void count_memcg_event_mm(struct mm_struct *mm, @@ -1114,6 +1142,11 @@ static inline struct mem_cgroup *page_memcg(struct page *page) return NULL; } +static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) +{ + return NULL; +} + static inline struct mem_cgroup *page_memcg_rcu(struct page *page) { WARN_ON_ONCE(!rcu_read_lock_held()); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a57cce0ea24b..16db5b39cb81 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -413,7 +413,7 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key); #endif /** - * mem_cgroup_css_from_page - css of the memcg associated with a page + * get_mem_cgroup_css_from_page - get css of the memcg associated with a page * @page: page of interest * * If memcg is bound to the default hierarchy, css of the memcg associated @@ -423,13 +423,15 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key); * If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup * is returned. */ -struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page) +struct cgroup_subsys_state *get_mem_cgroup_css_from_page(struct page *page) { struct mem_cgroup *memcg; - memcg = page_memcg(page); + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return &root_mem_cgroup->css; - if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) + memcg = get_mem_cgroup_from_page(page); + if (!memcg) memcg = root_mem_cgroup; return &memcg->css; @@ -1995,7 +1997,9 @@ void lock_page_memcg(struct page *page) * The RCU lock is held throughout the transaction. 
The fast * path can get away without acquiring the memcg->move_lock * because page moving starts with an RCU grace period. - */ + * + * The RCU lock also protects the memcg from being freed. + */ rcu_read_lock(); if (mem_cgroup_disabled()) @@ -4549,7 +4553,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, void mem_cgroup_track_foreign_dirty_slowpath(struct page *page, struct bdi_writeback *wb) { - struct mem_cgroup *memcg = page_memcg(page); + struct mem_cgroup *memcg; struct memcg_cgwb_frn *frn; u64 now = get_jiffies_64(); u64 oldest_at = now; @@ -4558,6 +4562,7 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct page *page, trace_track_foreign_dirty(page, wb); + memcg = get_mem_cgroup_from_page(page); /* * Pick the slot to use. If there is already a slot for @wb, keep * using it. If not replace the oldest one which isn't being @@ -4596,6 +4601,7 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct page *page, frn->memcg_id = wb->memcg_css->id; frn->at = now; } + css_put(&memcg->css); } /* issue foreign writeback flushes for recorded foreign dirtying events */ @@ -6163,6 +6169,14 @@ static void mem_cgroup_move_charge(void) atomic_dec(&mc.from->moving_account); } +/* + * The cgroup migration and memory cgroup offlining are serialized by + * @cgroup_mutex. If we reach here, it means that the LRU pages cannot + * be reparented to its parent memory cgroup. So during the whole process + * of mem_cgroup_move_task(), page_memcg(page) is stable. So we do not + * need to worry about the memcg (returned from page_memcg()) being + * released even if we do not hold an rcu read lock. + */ static void mem_cgroup_move_task(void) { if (mc.to) { @@ -6985,7 +6999,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage) if (page_memcg(newpage)) return; - memcg = page_memcg(oldpage); + memcg = get_mem_cgroup_from_page(oldpage); VM_WARN_ON_ONCE_PAGE(!memcg, oldpage); if (!memcg) return; @@ -7006,6 +7020,8 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage) mem_cgroup_charge_statistics(memcg, newpage, nr_pages); memcg_check_events(memcg, newpage); local_irq_restore(flags); + + css_put(&memcg->css); } DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key); @@ -7192,6 +7208,10 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; + /* + * Interrupts should be disabled by the caller (see the comments below), + * which can serve as RCU read-side critical sections. 
+ */ memcg = page_memcg(page); VM_WARN_ON_ONCE_PAGE(!memcg, page); @@ -7256,15 +7276,16 @@ int __mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return 0; + rcu_read_lock(); memcg = page_memcg(page); VM_WARN_ON_ONCE_PAGE(!memcg, page); if (!memcg) - return 0; + goto out; if (!entry.val) { memcg_memory_event(memcg, MEMCG_SWAP_FAIL); - return 0; + goto out; } memcg = mem_cgroup_id_get_online(memcg); @@ -7274,6 +7295,7 @@ int __mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) memcg_memory_event(memcg, MEMCG_SWAP_MAX); memcg_memory_event(memcg, MEMCG_SWAP_FAIL); mem_cgroup_id_put(memcg); + rcu_read_unlock(); return -ENOMEM; } @@ -7283,6 +7305,8 @@ int __mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg), nr_pages); VM_BUG_ON_PAGE(oldid, page); mod_memcg_state(memcg, MEMCG_SWAP, nr_pages); +out: + rcu_read_unlock(); return 0; } @@ -7337,17 +7361,22 @@ bool mem_cgroup_swap_full(struct page *page) if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) return false; + rcu_read_lock(); memcg = page_memcg(page); if (!memcg) - return false; + goto out; for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { unsigned long usage = page_counter_read(&memcg->swap); if (usage * 2 >= READ_ONCE(memcg->swap.high) || - usage * 2 >= READ_ONCE(memcg->swap.max)) + usage * 2 >= READ_ONCE(memcg->swap.max)) { + rcu_read_unlock(); return true; + } } +out: + rcu_read_unlock(); return false; } diff --git a/mm/migrate.c b/mm/migrate.c index a6a7743ee98f..940eaec234dc 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -472,6 +472,10 @@ int migrate_page_move_mapping(struct address_space *mapping, struct lruvec *old_lruvec, *new_lruvec; struct mem_cgroup *memcg; + /* + * Irq is disabled, which can serve as RCU read-side critical + * sections. 
+ */ memcg = page_memcg(page); old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat); new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat); diff --git a/mm/page_io.c b/mm/page_io.c index c493ce9ebcf5..81744777ab76 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -269,13 +269,14 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) struct cgroup_subsys_state *css; struct mem_cgroup *memcg; + rcu_read_lock(); memcg = page_memcg(page); if (!memcg) - return; + goto out; - rcu_read_lock(); css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys); bio_associate_blkg_from_css(bio, css); +out: rcu_read_unlock(); } #else From patchwork Thu Sep 16 13:47:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56BFBC433EF for ; Thu, 16 Sep 2021 13:53:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0CC9E60EB4 for ; Thu, 16 Sep 2021 13:53:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 0CC9E60EB4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id A7DE36B0080; Thu, 16 Sep 2021 09:53:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A2E0C6B0081; Thu, 16 Sep 2021 09:53:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8F555900002; Thu, 16 Sep 2021 09:53:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0206.hostedemail.com [216.40.44.206]) by kanga.kvack.org (Postfix) with ESMTP id 7F5606B0080 for ; Thu, 16 Sep 2021 09:53:04 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 8DC742FD96 for ; Thu, 16 Sep 2021 13:53:03 +0000 (UTC) X-FDA: 78593577846.28.BF68F3F Received: from mail-pf1-f171.google.com (mail-pf1-f171.google.com [209.85.210.171]) by imf17.hostedemail.com (Postfix) with ESMTP id 55D9FF0003A4 for ; Thu, 16 Sep 2021 13:53:03 +0000 (UTC) Received: by mail-pf1-f171.google.com with SMTP id y17so5899479pfl.13 for ; Thu, 16 Sep 2021 06:53:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=9zJ+rajzCodDQWeoI37cULUx5xfm+EYhpR+j50QMba8=; b=UH0q2kMvwpE+oXiVGNhyaG/DpfCTcD3yJP4RI4OdyNI3bcpaFQG+yzqb8NVQKg1DpN DOUPe7YQUg3P01t3908CUfgU/X1z6/ooubNFka5e13rDo+0bEhnFU/wbaqU6bTOP2sjj ARz2nxJHBlGoK7BXo6ZKwbwWkMHew6yfg7dmmPwtmSmOxnVwzGTs4hVBSAcnaAqu3Bw3 dFopBR5Z7ZNZTkQxdsyqoEBiicslqlwytz0mt3/JfKVOkrmmuX82jNFNyDBYuLmvAikv U/COC/LXn5akRPaFXjlFHPLYnCbAj5jOLctJSqtysE8hPp08hvYBr7Ujy+BO89yjik3b 9UTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9zJ+rajzCodDQWeoI37cULUx5xfm+EYhpR+j50QMba8=; b=RAWxfjYktSf76Fb5sPeo5vj3vj062TGd4/7zGGcViG0cqNnV/GKBHXwnAJTvwM5s5I kaEXnvXwVOK55SBRhY3A5MCCgbSgzdNN9KwBzwV337sCKDS0H2aa0eAvrgnqPmxhry7P qa+6py8yPkWNN8fgnIRvolUks3AsCAQ8wP1uwKOSJ9BaitfTIP3NnzBQqafiQOfFGqq+ jAh4pCV/oiYLzfkHOCmA5Dsv750GQ0REThLBIl6nb1fyl5ii6ZtDozTt3EcxfxOHuFjg bRqoDJrWQjVmYv8daOSz1vBAHFZmZ1kQf+PyTV1iTontcL4rtMzJzkIv3I3+J+S50i/B pOzg== X-Gm-Message-State: AOAM5300xpEPIqR3VHgtO+i8uJFpJpq1p+O4TX4qof3ufR/tnz75MJ0n uI5G1aWDDbuFlMDv+RW8gDgNpg== X-Google-Smtp-Source: ABdhPJwxGZVjBFXnUvLHMSj361dGfdo0pyqFzMu2FSl2ZkHX6t6WAMc0bR0oqwQWyHCKbeJyZ6uePw== X-Received: by 2002:a63:da14:: with SMTP id c20mr5099842pgh.155.1631800382477; Thu, 16 Sep 2021 06:53:02 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.226]) by smtp.gmail.com with ESMTPSA id o9sm3617443pfh.217.2021.09.16.06.52.56 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 16 Sep 2021 06:53:02 -0700 (PDT) From: Muchun Song To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song Subject: [PATCH v2 09/13] mm: memcontrol: introduce memcg_reparent_ops Date: Thu, 16 Sep 2021 21:47:44 +0800 Message-Id: <20210916134748.67712-10-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com> References: <20210916134748.67712-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 55D9FF0003A4 X-Stat-Signature: radchxiberpjokn3kc6dt8tro3f4ku5b Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=UH0q2kMv; spf=pass (imf17.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.210.171 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-HE-Tag: 1631800383-358357 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In the previous patch, we know how to make the lruvec lock safe when the LRU pages reparented. We should do something like following. memcg_reparent_objcgs(memcg) 1) lock // lruvec belongs to memcg and lruvec_parent belongs to parent memcg. spin_lock(&lruvec->lru_lock); spin_lock(&lruvec_parent->lru_lock); 2) do reparent // Move all the pages from the lruvec list to the parent lruvec list. 3) unlock spin_unlock(&lruvec_parent->lru_lock); spin_unlock(&lruvec->lru_lock); Apart from the page lruvec lock, the deferred split queue lock (THP only) also needs to do something similar. So we extract the necessary three steps in the memcg_reparent_objcgs(). memcg_reparent_objcgs(memcg) 1) lock memcg_reparent_ops->lock(memcg, parent); 2) reparent memcg_reparent_ops->reparent(memcg, reparent); 3) unlock memcg_reparent_ops->unlock(memcg, reparent); Now there are two different locks (e.g. lruvec lock and deferred split queue lock) need to use this infrastructure. 
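As a rough illustration of that infrastructure (a self-contained userspace sketch with made-up names, not the kernel code), the three steps become a fixed driver over a table of callbacks, so supporting another lock only means adding another entry to the table:

#include <stdio.h>
#include <stddef.h>

struct memcg { const char *name; };

struct reparent_ops {
	void (*lock)(struct memcg *memcg, struct memcg *parent);
	void (*reparent)(struct memcg *memcg, struct memcg *parent);
	void (*unlock)(struct memcg *memcg, struct memcg *parent);
};

static void lruvec_lock(struct memcg *memcg, struct memcg *parent)
{
	printf("lock lruvec of %s, then lruvec of %s\n", memcg->name, parent->name);
}

static void lruvec_reparent(struct memcg *memcg, struct memcg *parent)
{
	printf("splice LRU lists of %s onto %s\n", memcg->name, parent->name);
}

static void lruvec_unlock(struct memcg *memcg, struct memcg *parent)
{
	printf("unlock lruvec of %s, then lruvec of %s\n", parent->name, memcg->name);
}

static const struct reparent_ops lruvec_ops = {
	.lock = lruvec_lock,
	.reparent = lruvec_reparent,
	.unlock = lruvec_unlock,
};

/* A second entry (e.g. for the THP split queue) would simply be added here. */
static const struct reparent_ops *reparent_ops[] = { &lruvec_ops };

static void do_reparent(struct memcg *memcg, struct memcg *parent)
{
	size_t i, nr = sizeof(reparent_ops) / sizeof(reparent_ops[0]);

	for (i = 0; i < nr; i++)		/* 1) lock */
		reparent_ops[i]->lock(memcg, parent);
	for (i = 0; i < nr; i++)		/* 2) reparent */
		reparent_ops[i]->reparent(memcg, parent);
	for (i = 0; i < nr; i++)		/* 3) unlock */
		reparent_ops[i]->unlock(memcg, parent);
}

int main(void)
{
	struct memcg child = { "child" }, parent = { "parent" };

	do_reparent(&child, &parent);
	return 0;
}

The real memcg_reparent_objcgs() additionally disables interrupts around the three steps and moves the objcg list under css_set_lock between the lock and reparent phases, as the diff below shows.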
In the next patch, we will use those APIs to make those locks safe when the LRU pages reparented. Signed-off-by: Muchun Song --- include/linux/memcontrol.h | 7 +++++++ mm/memcontrol.c | 43 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index ab3cd844e91d..18344c1f4333 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -350,6 +350,13 @@ struct mem_cgroup { struct mem_cgroup_per_node *nodeinfo[]; }; +struct memcg_reparent_ops { + /* Irq is disabled before calling those callbacks. */ + void (*lock)(struct mem_cgroup *memcg, struct mem_cgroup *parent); + void (*unlock)(struct mem_cgroup *memcg, struct mem_cgroup *parent); + void (*reparent)(struct mem_cgroup *memcg, struct mem_cgroup *parent); +}; + /* * size of first charge trial. "32" comes from vmscan.c's magic value. * TODO: maybe necessary to use big numbers in big irons. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 16db5b39cb81..3a73fd192734 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -333,6 +333,35 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } +static const struct memcg_reparent_ops *memcg_reparent_ops[] = {}; + +static void memcg_reparent_lock(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(memcg_reparent_ops); i++) + memcg_reparent_ops[i]->lock(memcg, parent); +} + +static void memcg_reparent_unlock(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(memcg_reparent_ops); i++) + memcg_reparent_ops[i]->unlock(memcg, parent); +} + +static void memcg_do_reparent(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(memcg_reparent_ops); i++) + memcg_reparent_ops[i]->reparent(memcg, parent); +} + static void memcg_reparent_objcgs(struct mem_cgroup *memcg) { struct obj_cgroup *objcg, *iter; @@ -342,9 +371,13 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg) if (!parent) parent = root_mem_cgroup; + local_irq_disable(); + + memcg_reparent_lock(memcg, parent); + objcg = rcu_replace_pointer(memcg->objcg, NULL, true); - spin_lock_irq(&css_set_lock); + spin_lock(&css_set_lock); /* 1) Ready to reparent active objcg. 
*/ list_add(&objcg->list, &memcg->objcg_list); @@ -354,7 +387,13 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg) /* 3) Move already reparented objcgs to the parent's list */ list_splice(&memcg->objcg_list, &parent->objcg_list); - spin_unlock_irq(&css_set_lock); + spin_unlock(&css_set_lock); + + memcg_do_reparent(memcg, parent); + + memcg_reparent_unlock(memcg, parent); + + local_irq_enable(); percpu_ref_kill(&objcg->refcnt); } From patchwork Thu Sep 16 13:47:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12499137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5DF27C433EF for ; Thu, 16 Sep 2021 13:53:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DED0160EB4 for ; Thu, 16 Sep 2021 13:53:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org DED0160EB4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 80CAC6B0081; Thu, 16 Sep 2021 09:53:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7BBC3900002; Thu, 16 Sep 2021 09:53:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 684796B0083; Thu, 16 Sep 2021 09:53:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0039.hostedemail.com [216.40.44.39]) by kanga.kvack.org (Postfix) with ESMTP id 574796B0081 for ; Thu, 16 Sep 2021 09:53:10 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id E61592C5AA for ; Thu, 16 Sep 2021 13:53:09 +0000 (UTC) X-FDA: 78593578098.13.AD90A61 Received: from mail-pg1-f179.google.com (mail-pg1-f179.google.com [209.85.215.179]) by imf06.hostedemail.com (Postfix) with ESMTP id 543AE801A8B2 for ; Thu, 16 Sep 2021 13:53:09 +0000 (UTC) Received: by mail-pg1-f179.google.com with SMTP id w7so6172620pgk.13 for ; Thu, 16 Sep 2021 06:53:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=6FS2T3CuVsbF37UlIhZKo2KSUxlfa4i+78W1D2SgkOM=; b=jHD/EzyUApRRlS9YM81+l6oEbmaUT+bMNOzsObSL/mdUOuX9JehEu9gkGFoUnw1Rn7 ewZUPfPaxCD1SJ+brWdQVE4ipoO3ufJtSD0uOvtlXaKdzTb6RL++oXYSOEjxNRif+OSC yJam5GWiiIPIXgReYkeOQC1wzJCUjltUNBTZv7UQV5Pb4Rr/KIo4c0wlozApCXRkKlD7 3SnL9vYRRmrNXV7M0MZysBfEtM1/hCc5vQEomLk4bW1qL3o3UBVvG4Kj1sw32KDBJm8e qsSx3e1qCfMBZsNOthb5FFhVdwRQf8GCdaUCOOaJblmUx+sIQG888qzVeQD4JF1/K1EG MtrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=6FS2T3CuVsbF37UlIhZKo2KSUxlfa4i+78W1D2SgkOM=; 
b=Y9FjF7Knezxwjxr8fGmE8OYcjSiiHqqK4yxCpwUDCYdboKv888R1IWEzNo0bzmYC4Z MmeANd6wQP0lv77SLYpOP2/u/A1Z+RsS7Gc/c0kf0CU9jKX80ZEuhCL/4uGOodJbX5bv VES+kvASrhjiIVDoPv98Q6jbpqYFwk0vnQwj3ml0GBfOvIyq6L0WqaPsAK2sey1Bdaqq XQsSQMdRQq/sc0iefLJrYSMRpplTm+5aP6P57aymkpmB+twn1eskzUp4Yns31U0btTXF z6Gc8cOqJcf9zthm33bWfcHJb4olWe7Cwzq95YqOAM7bf7uSJ6XY4hZd2jNp3k5Suj4F 3V+A== X-Gm-Message-State: AOAM530pmm2gN6rS1+S+ZDdHWvcLcLVoj2LoytYS5mwgtpBvZzFQjv0d k+iKD7VXYXLXc/TUXO4eBLYETQ== X-Google-Smtp-Source: ABdhPJy2wFXm0sLo/HaLqvwRd6YTU2HfNsMuYi6izDtMLhvaKIgomOpfJW6OJ0vtsKfDcqPomX/BHQ== X-Received: by 2002:a63:e909:: with SMTP id i9mr5107769pgh.162.1631800388161; Thu, 16 Sep 2021 06:53:08 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.226]) by smtp.gmail.com with ESMTPSA id o9sm3617443pfh.217.2021.09.16.06.53.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 16 Sep 2021 06:53:07 -0700 (PDT) From: Muchun Song To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song Subject: [PATCH v2 10/13] mm: memcontrol: use obj_cgroup APIs to charge the LRU pages Date: Thu, 16 Sep 2021 21:47:45 +0800 Message-Id: <20210916134748.67712-11-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com> References: <20210916134748.67712-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 543AE801A8B2 X-Stat-Signature: f43w8mx7k69adx4kmby6yi3zhnu8wcdg Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b="jHD/EzyU"; spf=pass (imf06.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.215.179 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-HE-Tag: 1631800389-429465 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We will reuse the obj_cgroup APIs to charge the LRU pages. Finally, page->memcg_data will have 2 different meanings. - For the slab pages, page->memcg_data points to an object cgroups vector. - For the kmem pages (exclude the slab pages) and the LRU pages, page->memcg_data points to an object cgroup. In this patch, we reuse obj_cgroup APIs to charge LRU pages. In the end, The page cache cannot prevent long-living objects from pinning the original memory cgroup in the memory. At the same time we also changed the rules of page and objcg or memcg binding stability. The new rules are as follows. For a page any of the following ensures page and objcg binding stability: - the page lock - LRU isolation - lock_page_memcg() - exclusive reference Based on the stable binding of page and objcg, for a page any of the following ensures page and memcg binding stability: - css_set_lock - cgroup_mutex - the lruvec lock - the split queue lock (only THP page) If the caller only want to ensure that the page counters of memcg are updated correctly, ensure that the binding stability of page and objcg is sufficient. 
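The effect of the new binding can be pictured with a tiny userspace model (stand-in types only, not the kernel implementation): every page stores an objcg pointer, and reparenting only has to redirect objcg->memcg to the parent, so no page is touched and no dead memcg stays pinned by long-lived page cache.

#include <stdio.h>

/* Stand-ins only; none of these are the kernel's definitions. */
struct memcg { const char *name; };

struct objcg {
	struct memcg *memcg;	/* redirected to the parent on reparenting */
	long refcnt;		/* pages pin the objcg, not the memcg */
};

struct page { struct objcg *objcg; };

static struct memcg *page_memcg(struct page *page)
{
	/* In the kernel this dereference is only stable under RCU or one of
	 * the locks listed in the commit message above. */
	return page->objcg ? page->objcg->memcg : NULL;
}

static void reparent(struct objcg *objcg, struct memcg *parent)
{
	objcg->memcg = parent;	/* every page now observes the parent memcg */
}

int main(void)
{
	struct memcg child = { "child" }, parent = { "parent" };
	struct objcg objcg = { &child, 0 };
	struct page page = { &objcg };

	objcg.refcnt++;		/* charging a page takes an objcg reference */
	printf("page_memcg() before offline: %s\n", page_memcg(&page)->name);
	reparent(&objcg, &parent);
	printf("page_memcg() after  offline: %s\n", page_memcg(&page)->name);
	return 0;
}

In the kernel the pointer swap is done by memcg_reparent_objcgs(), and callers of page_memcg() must hold the RCU read lock or one of the locks listed above to keep the returned memcg alive.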
Signed-off-by: Muchun Song Reported-by: kernel test robot --- include/linux/memcontrol.h | 96 ++++++--------- mm/huge_memory.c | 42 +++++++ mm/memcontrol.c | 296 ++++++++++++++++++++++++++++++++------------- 3 files changed, 292 insertions(+), 142 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 18344c1f4333..3d9691395cf3 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -376,8 +376,6 @@ enum page_memcg_data_flags { #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1) -static inline bool PageMemcgKmem(struct page *page); - /* * After the initialization objcg->memcg is always pointing at * a valid memcg, but can be atomically swapped to the parent memcg. @@ -391,43 +389,19 @@ static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) } /* - * __page_memcg - get the memory cgroup associated with a non-kmem page - * @page: a pointer to the page struct - * - * Returns a pointer to the memory cgroup associated with the page, - * or NULL. This function assumes that the page is known to have a - * proper memory cgroup pointer. It's not safe to call this function - * against some type of pages, e.g. slab pages or ex-slab pages or - * kmem pages. - */ -static inline struct mem_cgroup *__page_memcg(struct page *page) -{ - unsigned long memcg_data = page->memcg_data; - - VM_BUG_ON_PAGE(PageSlab(page), page); - VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page); - VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page); - - return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); -} - -/* - * __page_objcg - get the object cgroup associated with a kmem page + * page_objcg - get the object cgroup associated with page * @page: a pointer to the page struct * * Returns a pointer to the object cgroup associated with the page, * or NULL. This function assumes that the page is known to have a - * proper object cgroup pointer. It's not safe to call this function - * against some type of pages, e.g. slab pages or ex-slab pages or - * LRU pages. + * proper object cgroup pointer. */ -static inline struct obj_cgroup *__page_objcg(struct page *page) +static inline struct obj_cgroup *page_objcg(struct page *page) { unsigned long memcg_data = page->memcg_data; VM_BUG_ON_PAGE(PageSlab(page), page); VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page); - VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page); return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); } @@ -441,23 +415,35 @@ static inline struct obj_cgroup *__page_objcg(struct page *page) * proper memory cgroup pointer. It's not safe to call this function * against some type of pages, e.g. slab pages or ex-slab pages. * - * For a non-kmem page any of the following ensures page and memcg binding - * stability: + * For a page any of the following ensures page and objcg binding stability: * * - the page lock * - LRU isolation * - lock_page_memcg() * - exclusive reference * - * For a kmem page a caller should hold an rcu read lock to protect memcg - * associated with a kmem page from being released. + * Based on the stable binding of page and objcg, for a page any of the + * following ensures page and memcg binding stability: + * + * - css_set_lock + * - cgroup_mutex + * - the lruvec lock + * - the split queue lock (only THP page) + * + * If the caller only want to ensure that the page counters of memcg are + * updated correctly, ensure that the binding stability of page and objcg + * is sufficient. 
+ * + * A caller should hold an rcu read lock (In addition, regions of code across + * which interrupts, preemption, or softirqs have been disabled also serve as + * RCU read-side critical sections) to protect memcg associated with a page + * from being released. */ static inline struct mem_cgroup *page_memcg(struct page *page) { - if (PageMemcgKmem(page)) - return obj_cgroup_memcg(__page_objcg(page)); - else - return __page_memcg(page); + struct obj_cgroup *objcg = page_objcg(page); + + return objcg ? obj_cgroup_memcg(objcg) : NULL; } /* @@ -470,6 +456,8 @@ static inline struct mem_cgroup *page_memcg(struct page *page) * is known to have a proper memory cgroup pointer. It's not safe to call * this function against some type of pages, e.g. slab pages or ex-slab * pages. + * + * The page and objcg or memcg binding rules can refer to page_memcg(). */ static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) { @@ -493,22 +481,20 @@ static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page) * or NULL. This function assumes that the page is known to have a * proper memory cgroup pointer. It's not safe to call this function * against some type of pages, e.g. slab pages or ex-slab pages. + * + * The page and objcg or memcg binding rules can refer to page_memcg(). */ static inline struct mem_cgroup *page_memcg_rcu(struct page *page) { unsigned long memcg_data = READ_ONCE(page->memcg_data); + struct obj_cgroup *objcg; VM_BUG_ON_PAGE(PageSlab(page), page); WARN_ON_ONCE(!rcu_read_lock_held()); - if (memcg_data & MEMCG_DATA_KMEM) { - struct obj_cgroup *objcg; - - objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); - return obj_cgroup_memcg(objcg); - } + objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); - return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); + return objcg ? obj_cgroup_memcg(objcg) : NULL; } /* @@ -521,16 +507,10 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page) * has an associated memory cgroup pointer or an object cgroups vector or * an object cgroup. * - * For a non-kmem page any of the following ensures page and memcg binding - * stability: - * - * - the page lock - * - LRU isolation - * - lock_page_memcg() - * - exclusive reference + * The page and objcg or memcg binding rules can refer to page_memcg(). * - * For a kmem page a caller should hold an rcu read lock to protect memcg - * associated with a kmem page from being released. + * A caller should hold an rcu read lock to protect memcg associated with a + * page from being released. */ static inline struct mem_cgroup *page_memcg_check(struct page *page) { @@ -539,18 +519,14 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page) * for slab pages, READ_ONCE() should be used here. */ unsigned long memcg_data = READ_ONCE(page->memcg_data); + struct obj_cgroup *objcg; if (memcg_data & MEMCG_DATA_OBJCGS) return NULL; - if (memcg_data & MEMCG_DATA_KMEM) { - struct obj_cgroup *objcg; - - objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); - return obj_cgroup_memcg(objcg); - } + objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); - return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); + return objcg ? 
obj_cgroup_memcg(objcg) : NULL; } #ifdef CONFIG_MEMCG_KMEM diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 12950d4988e6..d6738637feae 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -499,6 +499,48 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) } #ifdef CONFIG_MEMCG +static struct shrinker deferred_split_shrinker; + +static void memcg_reparent_split_queue_lock(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + spin_lock(&memcg->deferred_split_queue.split_queue_lock); + spin_lock(&parent->deferred_split_queue.split_queue_lock); +} + +static void memcg_reparent_split_queue_unlock(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + spin_unlock(&parent->deferred_split_queue.split_queue_lock); + spin_unlock(&memcg->deferred_split_queue.split_queue_lock); +} + +static void memcg_reparent_split_queue(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int nid; + struct deferred_split *src, *dst; + + src = &memcg->deferred_split_queue; + dst = &parent->deferred_split_queue; + + if (!src->split_queue_len) + return; + + list_splice_tail_init(&src->split_queue, &dst->split_queue); + dst->split_queue_len += src->split_queue_len; + src->split_queue_len = 0; + + for_each_node(nid) + set_shrinker_bit(parent, nid, deferred_split_shrinker.id); +} + +const struct memcg_reparent_ops split_queue_reparent_ops = { + .lock = memcg_reparent_split_queue_lock, + .unlock = memcg_reparent_split_queue_unlock, + .reparent = memcg_reparent_split_queue, +}; + static inline struct mem_cgroup *split_queue_memcg(struct deferred_split *queue) { if (mem_cgroup_disabled()) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3a73fd192734..3688651d85c2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -75,6 +75,7 @@ struct cgroup_subsys memory_cgrp_subsys __read_mostly; EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; +static struct obj_cgroup *root_obj_cgroup __read_mostly; /* Active memory cgroup to use from an interrupt context */ DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); @@ -261,6 +262,11 @@ struct mem_cgroup *vmpressure_to_memcg(struct vmpressure *vmpr) return container_of(vmpr, struct mem_cgroup, vmpressure); } +static inline bool obj_cgroup_is_root(struct obj_cgroup *objcg) +{ + return objcg == root_obj_cgroup; +} + extern spinlock_t css_set_lock; static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg, @@ -333,7 +339,81 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } -static const struct memcg_reparent_ops *memcg_reparent_ops[] = {}; +static void memcg_reparent_lruvec_lock(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int i; + + for_each_node(i) { + spin_lock(&mem_cgroup_lruvec(memcg, NODE_DATA(i))->lru_lock); + spin_lock(&mem_cgroup_lruvec(parent, NODE_DATA(i))->lru_lock); + } +} + +static void memcg_reparent_lruvec_unlock(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int i; + + for_each_node(i) { + spin_unlock(&mem_cgroup_lruvec(parent, NODE_DATA(i))->lru_lock); + spin_unlock(&mem_cgroup_lruvec(memcg, NODE_DATA(i))->lru_lock); + } +} + +static void lruvec_reparent_lru(struct lruvec *src, struct lruvec *dst, + enum lru_list lru) +{ + int zid; + struct mem_cgroup_per_node *mz_src, *mz_dst; + + mz_src = container_of(src, struct mem_cgroup_per_node, lruvec); + mz_dst = container_of(dst, struct mem_cgroup_per_node, lruvec); + + list_splice_tail_init(&src->lists[lru], &dst->lists[lru]); + + for (zid = 0; zid < MAX_NR_ZONES; zid++) { + 
mz_dst->lru_zone_size[zid][lru] += mz_src->lru_zone_size[zid][lru]; + mz_src->lru_zone_size[zid][lru] = 0; + } +} + +static void memcg_reparent_lruvec(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int i; + + for_each_node(i) { + enum lru_list lru; + struct lruvec *src, *dst; + + src = mem_cgroup_lruvec(memcg, NODE_DATA(i)); + dst = mem_cgroup_lruvec(parent, NODE_DATA(i)); + + dst->anon_cost += src->anon_cost; + dst->file_cost += src->file_cost; + + for_each_lru(lru) + lruvec_reparent_lru(src, dst, lru); + } +} + +static const struct memcg_reparent_ops lruvec_reparent_ops = { + .lock = memcg_reparent_lruvec_lock, + .unlock = memcg_reparent_lruvec_unlock, + .reparent = memcg_reparent_lruvec, +}; + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +extern struct memcg_reparent_ops split_queue_reparent_ops; +#endif + +static const struct memcg_reparent_ops *memcg_reparent_ops[] = { + &lruvec_reparent_ops, +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + &split_queue_reparent_ops, +#endif +}; static void memcg_reparent_lock(struct mem_cgroup *memcg, struct mem_cgroup *parent) @@ -2797,18 +2877,18 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages) } #endif -static void commit_charge(struct page *page, struct mem_cgroup *memcg) +static void commit_charge(struct page *page, struct obj_cgroup *objcg) { - VM_BUG_ON_PAGE(page_memcg(page), page); + VM_BUG_ON_PAGE(page_objcg(page), page); /* - * Any of the following ensures page's memcg stability: + * Any of the following ensures page's objcg stability: * * - the page lock * - LRU isolation * - lock_page_memcg() * - exclusive reference */ - page->memcg_data = (unsigned long)memcg; + page->memcg_data = (unsigned long)objcg; } static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) @@ -2825,6 +2905,21 @@ static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) return memcg; } +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg) +{ + struct obj_cgroup *objcg = NULL; + + rcu_read_lock(); + for (; memcg; memcg = parent_mem_cgroup(memcg)) { + objcg = rcu_dereference(memcg->objcg); + if (objcg && obj_cgroup_tryget(objcg)) + break; + } + rcu_read_unlock(); + + return objcg; +} + #ifdef CONFIG_MEMCG_KMEM /* * The allocated objcg pointers array is not accounted directly. @@ -2930,12 +3025,15 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) else memcg = mem_cgroup_from_task(current); - for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { - objcg = rcu_dereference(memcg->objcg); - if (objcg && obj_cgroup_tryget(objcg)) - break; + if (mem_cgroup_is_root(memcg)) + goto out; + + objcg = get_obj_cgroup_from_memcg(memcg); + if (obj_cgroup_is_root(objcg)) { + obj_cgroup_put(objcg); objcg = NULL; } +out: rcu_read_unlock(); return objcg; @@ -3078,13 +3176,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) */ void __memcg_kmem_uncharge_page(struct page *page, int order) { - struct obj_cgroup *objcg; + struct obj_cgroup *objcg = page_objcg(page); unsigned int nr_pages = 1 << order; - if (!PageMemcgKmem(page)) + if (!objcg) return; - objcg = __page_objcg(page); + VM_BUG_ON_PAGE(!PageMemcgKmem(page), page); obj_cgroup_uncharge_pages(objcg, nr_pages); page->memcg_data = 0; obj_cgroup_put(objcg); @@ -3316,23 +3414,20 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size) #endif /* CONFIG_MEMCG_KMEM */ /* - * Because page_memcg(head) is not set on tails, set it now. 
+ * Because page_objcg(head) is not set on tails, set it now. */ void split_page_memcg(struct page *head, unsigned int nr) { - struct mem_cgroup *memcg = page_memcg(head); + struct obj_cgroup *objcg = page_objcg(head); int i; - if (mem_cgroup_disabled() || !memcg) + if (mem_cgroup_disabled() || !objcg) return; for (i = 1; i < nr; i++) head[i].memcg_data = head->memcg_data; - if (PageMemcgKmem(head)) - obj_cgroup_get_many(__page_objcg(head), nr - 1); - else - css_get_many(&memcg->css, nr - 1); + obj_cgroup_get_many(objcg, nr - 1); } #ifdef CONFIG_MEMCG_SWAP @@ -5303,6 +5398,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) objcg->memcg = memcg; rcu_assign_pointer(memcg->objcg, objcg); + if (unlikely(mem_cgroup_is_root(memcg))) + root_obj_cgroup = objcg; + /* Online state pins memcg ID, memcg ID pins CSS */ refcount_set(&memcg->id.ref, 1); css_get(css); @@ -5731,10 +5829,10 @@ static int mem_cgroup_move_account(struct page *page, */ smp_mb(); - css_get(&to->css); - css_put(&from->css); + obj_cgroup_get(to->objcg); + obj_cgroup_put(from->objcg); - page->memcg_data = (unsigned long)to; + page->memcg_data = (unsigned long)to->objcg; __unlock_page_memcg(from); @@ -6206,6 +6304,42 @@ static void mem_cgroup_move_charge(void) mmap_read_unlock(mc.mm); atomic_dec(&mc.from->moving_account); + + /* + * Moving its pages to another memcg is finished. Wait for already + * started RCU-only updates to finish to make sure that the caller + * of lock_page_memcg() can unlock the correct move_lock. The + * possible bad scenario would like: + * + * CPU0: CPU1: + * mem_cgroup_move_charge() + * walk_page_range() + * + * lock_page_memcg(page) + * memcg = page_memcg(page) + * spin_lock_irqsave(&memcg->move_lock) + * memcg->move_lock_task = current + * + * atomic_dec(&mc.from->moving_account) + * + * mem_cgroup_css_offline() + * memcg_offline_kmem() + * memcg_reparent_objcgs() <== reparented + * + * unlock_page_memcg(page) + * memcg = page_memcg(page) <== memcg has been changed + * if (memcg->move_lock_task == current) <== false + * spin_unlock_irqrestore(&memcg->move_lock) + * + * Once mem_cgroup_move_charge() returns (it means that the cgroup_mutex + * would be released soon), the page can be reparented to its parent + * memcg. When the unlock_page_memcg() is called for the page, we will + * miss unlock the move_lock. So using synchronize_rcu to wait for + * already started RCU-only updates to finish before this function + * returns (mem_cgroup_move_charge() and mem_cgroup_css_offline() are + * serialized by cgroup_mutex). + */ + synchronize_rcu(); } /* @@ -6762,21 +6896,26 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root, static int charge_memcg(struct page *page, struct mem_cgroup *memcg, gfp_t gfp) { + struct obj_cgroup *objcg; unsigned int nr_pages = thp_nr_pages(page); - int ret; + int ret = 0; - ret = try_charge(memcg, gfp, nr_pages); + objcg = get_obj_cgroup_from_memcg(memcg); + /* Do not account at the root objcg level. 
*/ + if (!obj_cgroup_is_root(objcg)) + ret = try_charge(memcg, gfp, nr_pages); if (ret) goto out; - css_get(&memcg->css); - commit_charge(page, memcg); + obj_cgroup_get(objcg); + commit_charge(page, objcg); local_irq_disable(); mem_cgroup_charge_statistics(memcg, page, nr_pages); memcg_check_events(memcg, page); local_irq_enable(); out: + obj_cgroup_put(objcg); return ret; } @@ -6876,7 +7015,7 @@ void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry) } struct uncharge_gather { - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; unsigned long nr_memory; unsigned long pgpgout; unsigned long nr_kmem; @@ -6891,84 +7030,72 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug) static void uncharge_batch(const struct uncharge_gather *ug) { unsigned long flags; + struct mem_cgroup *memcg; + rcu_read_lock(); + memcg = obj_cgroup_memcg(ug->objcg); if (ug->nr_memory) { - page_counter_uncharge(&ug->memcg->memory, ug->nr_memory); + page_counter_uncharge(&memcg->memory, ug->nr_memory); if (do_memsw_account()) - page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory); + page_counter_uncharge(&memcg->memsw, ug->nr_memory); if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem) - page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem); - memcg_oom_recover(ug->memcg); + page_counter_uncharge(&memcg->kmem, ug->nr_kmem); + memcg_oom_recover(memcg); } local_irq_save(flags); - __count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout); - __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory); - memcg_check_events(ug->memcg, ug->dummy_page); + __count_memcg_events(memcg, PGPGOUT, ug->pgpgout); + __this_cpu_add(memcg->vmstats_percpu->nr_page_events, ug->nr_memory); + memcg_check_events(memcg, ug->dummy_page); local_irq_restore(flags); + rcu_read_unlock(); /* drop reference from uncharge_page */ - css_put(&ug->memcg->css); + obj_cgroup_put(ug->objcg); } static void uncharge_page(struct page *page, struct uncharge_gather *ug) { unsigned long nr_pages; - struct mem_cgroup *memcg; struct obj_cgroup *objcg; - bool use_objcg = PageMemcgKmem(page); VM_BUG_ON_PAGE(PageLRU(page), page); /* * Nobody should be changing or seriously looking at - * page memcg or objcg at this point, we have fully - * exclusive access to the page. + * page objcg at this point, we have fully exclusive + * access to the page. */ - if (use_objcg) { - objcg = __page_objcg(page); - /* - * This get matches the put at the end of the function and - * kmem pages do not hold memcg references anymore. 
- */ - memcg = get_mem_cgroup_from_objcg(objcg); - } else { - memcg = __page_memcg(page); - } - - if (!memcg) + objcg = page_objcg(page); + if (!objcg) return; - if (ug->memcg != memcg) { - if (ug->memcg) { + if (ug->objcg != objcg) { + if (ug->objcg) { uncharge_batch(ug); uncharge_gather_clear(ug); } - ug->memcg = memcg; + ug->objcg = objcg; ug->dummy_page = page; - /* pairs with css_put in uncharge_batch */ - css_get(&memcg->css); + /* pairs with obj_cgroup_put in uncharge_batch */ + obj_cgroup_get(objcg); } nr_pages = compound_nr(page); - if (use_objcg) { + if (PageMemcgKmem(page)) { ug->nr_memory += nr_pages; ug->nr_kmem += nr_pages; - - page->memcg_data = 0; - obj_cgroup_put(objcg); } else { /* LRU pages aren't accounted at the root level */ - if (!mem_cgroup_is_root(memcg)) + if (!obj_cgroup_is_root(objcg)) ug->nr_memory += nr_pages; ug->pgpgout++; - - page->memcg_data = 0; } - css_put(&memcg->css); + page->memcg_data = 0; + obj_cgroup_put(objcg); } /** @@ -6982,7 +7109,7 @@ void __mem_cgroup_uncharge(struct page *page) struct uncharge_gather ug; /* Don't touch page->lru of any random page, pre-check: */ - if (!page_memcg(page)) + if (!page_objcg(page)) return; uncharge_gather_clear(&ug); @@ -7005,7 +7132,7 @@ void __mem_cgroup_uncharge_list(struct list_head *page_list) uncharge_gather_clear(&ug); list_for_each_entry(page, page_list, lru) uncharge_page(page, &ug); - if (ug.memcg) + if (ug.objcg) uncharge_batch(&ug); } @@ -7022,6 +7149,7 @@ void __mem_cgroup_uncharge_list(struct list_head *page_list) void mem_cgroup_migrate(struct page *oldpage, struct page *newpage) { struct mem_cgroup *memcg; + struct obj_cgroup *objcg; unsigned int nr_pages; unsigned long flags; @@ -7035,32 +7163,34 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage) return; /* Page cache replacement: new page already charged? */ - if (page_memcg(newpage)) + if (page_objcg(newpage)) return; - memcg = get_mem_cgroup_from_page(oldpage); - VM_WARN_ON_ONCE_PAGE(!memcg, oldpage); - if (!memcg) + objcg = page_objcg(oldpage); + VM_WARN_ON_ONCE_PAGE(!objcg, oldpage); + if (!objcg) return; /* Force-charge the new page. The old one will be freed soon */ nr_pages = thp_nr_pages(newpage); - if (!mem_cgroup_is_root(memcg)) { + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + + if (!obj_cgroup_is_root(objcg)) { page_counter_charge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_charge(&memcg->memsw, nr_pages); } - css_get(&memcg->css); - commit_charge(newpage, memcg); + obj_cgroup_get(objcg); + commit_charge(newpage, objcg); local_irq_save(flags); mem_cgroup_charge_statistics(memcg, newpage, nr_pages); memcg_check_events(memcg, newpage); local_irq_restore(flags); - - css_put(&memcg->css); + rcu_read_unlock(); } DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key); @@ -7235,6 +7365,7 @@ static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg) void mem_cgroup_swapout(struct page *page, swp_entry_t entry) { struct mem_cgroup *memcg, *swap_memcg; + struct obj_cgroup *objcg; unsigned int nr_entries; unsigned short oldid; @@ -7247,15 +7378,16 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; + objcg = page_objcg(page); + VM_WARN_ON_ONCE_PAGE(!objcg, page); + if (!objcg) + return; + /* * Interrupts should be disabled by the caller (see the comments below), * which can serve as RCU read-side critical sections. 
*/ - memcg = page_memcg(page); - - VM_WARN_ON_ONCE_PAGE(!memcg, page); - if (!memcg) - return; + memcg = obj_cgroup_memcg(objcg); /* * In case the memcg owning these pages has been offlined and doesn't @@ -7274,7 +7406,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) page->memcg_data = 0; - if (!mem_cgroup_is_root(memcg)) + if (!obj_cgroup_is_root(objcg)) page_counter_uncharge(&memcg->memory, nr_entries); if (!cgroup_memory_noswap && memcg != swap_memcg) { @@ -7293,7 +7425,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) mem_cgroup_charge_statistics(memcg, page, -nr_entries); memcg_check_events(memcg, page); - css_put(&memcg->css); + obj_cgroup_put(objcg); } /**

From patchwork Thu Sep 16 13:47:46 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12499139
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 11/13] mm: memcontrol: rename {un}lock_page_memcg() to {un}lock_page_objcg()
Date: Thu, 16 Sep 2021 21:47:46 +0800
Message-Id: <20210916134748.67712-12-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

Now lock_page_memcg() no longer locks the binding between a page and its memcg; it actually locks the binding between a page and its objcg. So rename lock_page_memcg() to lock_page_objcg(). This is just a code cleanup with no functional change.
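As a minimal illustration of the renamed API (the helper below is hypothetical and only sketches the intended usage; it is not part of this patch): a caller that needs the page to stay bound to its current cgroup while it updates a memcg-based statistic brackets the update with the new lock:

	static void account_something(struct page *page)
	{
		lock_page_objcg(page);
		/*
		 * Between lock and unlock, charge migration cannot move the
		 * page to another memcg, so page_objcg(page) and the lruvec
		 * derived from it stay consistent with the update below.
		 */
		mod_lruvec_page_state(page, NR_FILE_DIRTY, 1);
		unlock_page_objcg(page);
	}

The stat chosen here is arbitrary; the point is that the critical section keeps the page and objcg binding, and hence the memcg used for accounting, stable.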
Signed-off-by: Muchun Song --- Documentation/admin-guide/cgroup-v1/memory.rst | 2 +- fs/buffer.c | 8 ++--- include/linux/memcontrol.h | 18 ++++++---- mm/filemap.c | 2 +- mm/huge_memory.c | 4 +-- mm/memcontrol.c | 49 +++++++++++++++----------- mm/page-writeback.c | 26 +++++++------- mm/rmap.c | 14 ++++---- 8 files changed, 69 insertions(+), 54 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 41191b5fb69d..dd582312b91a 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -291,7 +291,7 @@ Lock order is as follows: Page lock (PG_locked bit of page->flags) mm->page_table_lock or split pte_lock - lock_page_memcg (memcg->move_lock) + lock_page_objcg (memcg->move_lock) mapping->i_pages lock lruvec->lru_lock. diff --git a/fs/buffer.c b/fs/buffer.c index 52d257962343..be815d537bdd 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -635,14 +635,14 @@ int __set_page_dirty_buffers(struct page *page) * Lock out page's memcg migration to keep PageDirty * synchronized with per-memcg dirty page counters. */ - lock_page_memcg(page); + lock_page_objcg(page); newly_dirty = !TestSetPageDirty(page); spin_unlock(&mapping->private_lock); if (newly_dirty) __set_page_dirty(page, mapping, 1); - unlock_page_memcg(page); + unlock_page_objcg(page); if (newly_dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); @@ -1102,13 +1102,13 @@ void mark_buffer_dirty(struct buffer_head *bh) struct page *page = bh->b_page; struct address_space *mapping = NULL; - lock_page_memcg(page); + lock_page_objcg(page); if (!TestSetPageDirty(page)) { mapping = page_mapping(page); if (mapping) __set_page_dirty(page, mapping, 0); } - unlock_page_memcg(page); + unlock_page_objcg(page); if (mapping) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 3d9691395cf3..6c160fa1272a 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -415,11 +415,12 @@ static inline struct obj_cgroup *page_objcg(struct page *page) * proper memory cgroup pointer. It's not safe to call this function * against some type of pages, e.g. slab pages or ex-slab pages. 
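/*
 * page_objcg() itself is introduced earlier in the series and its body is
 * not shown in this hunk. Conceptually it reads the objcg pointer out of
 * page->memcg_data with the low flag bits masked off, roughly as sketched
 * below; the exact masking is an assumption based on how page_memcg()
 * treats page->memcg_data, not a quote of the patch.
 */
static inline struct obj_cgroup *page_objcg(struct page *page)
{
	unsigned long memcg_data = READ_ONCE(page->memcg_data);

	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}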
* - * For a page any of the following ensures page and objcg binding stability: + * For a page any of the following ensures page and objcg binding stability + * (But the page can be reparented to its parent memcg): * * - the page lock * - LRU isolation - * - lock_page_memcg() + * - lock_page_objcg() * - exclusive reference * * Based on the stable binding of page and objcg, for a page any of the @@ -949,8 +950,8 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg); extern bool cgroup_memory_noswap; #endif -void lock_page_memcg(struct page *page); -void unlock_page_memcg(struct page *page); +void lock_page_objcg(struct page *page); +void unlock_page_objcg(struct page *page); void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val); @@ -1120,6 +1121,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, #define MEM_CGROUP_ID_SHIFT 0 #define MEM_CGROUP_ID_MAX 0 +static inline struct obj_cgroup *page_objcg(struct page *page) +{ + return NULL; +} + static inline struct mem_cgroup *page_memcg(struct page *page) { return NULL; @@ -1354,11 +1360,11 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg) { } -static inline void lock_page_memcg(struct page *page) +static inline void lock_page_objcg(struct page *page) { } -static inline void unlock_page_memcg(struct page *page) +static inline void unlock_page_objcg(struct page *page) { } diff --git a/mm/filemap.c b/mm/filemap.c index dae481293b5d..2df4187cbff7 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -112,7 +112,7 @@ * ->i_pages lock (page_remove_rmap->set_page_dirty) * bdi.wb->list_lock (page_remove_rmap->set_page_dirty) * ->inode->i_lock (page_remove_rmap->set_page_dirty) - * ->memcg->move_lock (page_remove_rmap->lock_page_memcg) + * ->memcg->move_lock (page_remove_rmap->lock_page_objcg) * bdi.wb->list_lock (zap_pte_range->set_page_dirty) * ->inode->i_lock (zap_pte_range->set_page_dirty) * ->private_lock (zap_pte_range->__set_page_dirty_buffers) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d6738637feae..74bdc0ebf642 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2227,7 +2227,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, atomic_inc(&page[i]._mapcount); } - lock_page_memcg(page); + lock_page_objcg(page); if (atomic_add_negative(-1, compound_mapcount_ptr(page))) { /* Last compound_mapcount is gone. */ __mod_lruvec_page_state(page, NR_ANON_THPS, @@ -2238,7 +2238,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, atomic_dec(&page[i]._mapcount); } } - unlock_page_memcg(page); + unlock_page_objcg(page); } smp_wmb(); /* make pte visible before pmd */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3688651d85c2..e46fc01c6164 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1286,7 +1286,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg, * These functions are safe to use under any of the following conditions: * - page locked * - PageLRU cleared - * - lock_page_memcg() + * - lock_page_objcg() * - page->_refcount is zero */ struct lruvec *lock_page_lruvec(struct page *page) @@ -2097,16 +2097,16 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg) } /** - * lock_page_memcg - lock a page and memcg binding + * lock_page_objcg - lock a page and objcg binding * @page: the page * * This function protects unlocked LRU pages from being moved to * another cgroup. * - * It ensures lifetime of the locked memcg. Caller is responsible + * It ensures lifetime of the locked objcg. 
Caller is responsible * for the lifetime of the page. */ -void lock_page_memcg(struct page *page) +void lock_page_objcg(struct page *page) { struct page *head = compound_head(page); /* rmap on tail pages */ struct mem_cgroup *memcg; @@ -2144,18 +2144,27 @@ void lock_page_memcg(struct page *page) } /* + * The cgroup migration and memory cgroup offlining are serialized by + * cgroup_mutex. If we reach here, it means that we are race with cgroup + * migration (or we are cgroup migration) and the @page cannot be + * reparented to its parent memory cgroup. So during the whole process + * from lock_page_objcg(page) to unlock_page_objcg(page), page_memcg(page) + * and obj_cgroup_memcg(objcg) are stable. + * * When charge migration first begins, we can have multiple * critical sections holding the fast-path RCU lock and one * holding the slowpath move_lock. Track the task who has the - * move_lock for unlock_page_memcg(). + * move_lock for unlock_page_objcg(). */ memcg->move_lock_task = current; memcg->move_lock_flags = flags; } -EXPORT_SYMBOL(lock_page_memcg); +EXPORT_SYMBOL(lock_page_objcg); -static void __unlock_page_memcg(struct mem_cgroup *memcg) +static void __unlock_page_objcg(struct obj_cgroup *objcg) { + struct mem_cgroup *memcg = objcg ? obj_cgroup_memcg(objcg) : NULL; + if (memcg && memcg->move_lock_task == current) { unsigned long flags = memcg->move_lock_flags; @@ -2169,16 +2178,16 @@ static void __unlock_page_memcg(struct mem_cgroup *memcg) } /** - * unlock_page_memcg - unlock a page and memcg binding + * unlock_page_objcg - unlock a page and memcg binding * @page: the page */ -void unlock_page_memcg(struct page *page) +void unlock_page_objcg(struct page *page) { struct page *head = compound_head(page); - __unlock_page_memcg(page_memcg(head)); + __unlock_page_objcg(page_objcg(head)); } -EXPORT_SYMBOL(unlock_page_memcg); +EXPORT_SYMBOL(unlock_page_objcg); struct obj_stock { #ifdef CONFIG_MEMCG_KMEM @@ -2885,7 +2894,7 @@ static void commit_charge(struct page *page, struct obj_cgroup *objcg) * * - the page lock * - LRU isolation - * - lock_page_memcg() + * - lock_page_objcg() * - exclusive reference */ page->memcg_data = (unsigned long)objcg; @@ -5770,7 +5779,7 @@ static int mem_cgroup_move_account(struct page *page, from_vec = mem_cgroup_lruvec(from, pgdat); to_vec = mem_cgroup_lruvec(to, pgdat); - lock_page_memcg(page); + lock_page_objcg(page); if (PageAnon(page)) { if (page_mapped(page)) { @@ -5822,7 +5831,7 @@ static int mem_cgroup_move_account(struct page *page, * with (un)charging, migration, LRU putback, or anything else * that would rely on a stable page's memory cgroup. * - * Note that lock_page_memcg is a memcg lock, not a page lock, + * Note that lock_page_objcg is a memcg lock, not a page lock, * to save space. As soon as we switch page's memory cgroup to a * new memcg that isn't locked, the above state can change * concurrently again. Make sure we're truly done with it. @@ -5834,7 +5843,7 @@ static int mem_cgroup_move_account(struct page *page, page->memcg_data = (unsigned long)to->objcg; - __unlock_page_memcg(from); + __unlock_page_objcg(from->objcg); ret = 0; @@ -6276,7 +6285,7 @@ static void mem_cgroup_move_charge(void) { lru_add_drain_all(); /* - * Signal lock_page_memcg() to take the memcg's move_lock + * Signal lock_page_objcg() to take the memcg's move_lock * while we're moving its pages to another memcg. Then wait * for already started RCU-only updates to finish. 
*/ @@ -6308,14 +6317,14 @@ static void mem_cgroup_move_charge(void) /* * Moving its pages to another memcg is finished. Wait for already * started RCU-only updates to finish to make sure that the caller - * of lock_page_memcg() can unlock the correct move_lock. The + * of lock_page_objcg() can unlock the correct move_lock. The * possible bad scenario would like: * * CPU0: CPU1: * mem_cgroup_move_charge() * walk_page_range() * - * lock_page_memcg(page) + * lock_page_objcg(page) * memcg = page_memcg(page) * spin_lock_irqsave(&memcg->move_lock) * memcg->move_lock_task = current @@ -6326,14 +6335,14 @@ static void mem_cgroup_move_charge(void) * memcg_offline_kmem() * memcg_reparent_objcgs() <== reparented * - * unlock_page_memcg(page) + * unlock_page_objcg(page) * memcg = page_memcg(page) <== memcg has been changed * if (memcg->move_lock_task == current) <== false * spin_unlock_irqrestore(&memcg->move_lock) * * Once mem_cgroup_move_charge() returns (it means that the cgroup_mutex * would be released soon), the page can be reparented to its parent - * memcg. When the unlock_page_memcg() is called for the page, we will + * memcg. When the unlock_page_objcg() is called for the page, we will * miss unlock the move_lock. So using synchronize_rcu to wait for * already started RCU-only updates to finish before this function * returns (mem_cgroup_move_charge() and mem_cgroup_css_offline() are diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 4812a17b288c..68f9619b8d75 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2434,7 +2434,7 @@ EXPORT_SYMBOL(__set_page_dirty_no_writeback); /* * Helper function for set_page_dirty family. * - * Caller must hold lock_page_memcg(). + * Caller must hold lock_page_objcg(). * * NOTE: This relies on being atomic wrt interrupts. */ @@ -2467,7 +2467,7 @@ static void account_page_dirtied(struct page *page, /* * Helper function for deaccounting dirty page without writeback. * - * Caller must hold lock_page_memcg(). + * Caller must hold lock_page_objcg(). */ void account_page_cleaned(struct page *page, struct address_space *mapping, struct bdi_writeback *wb) @@ -2487,7 +2487,7 @@ void account_page_cleaned(struct page *page, struct address_space *mapping, * If warn is true, then emit a warning if the page is not uptodate and has * not been truncated. * - * The caller must hold lock_page_memcg(). + * The caller must hold lock_page_objcg(). 
*/ void __set_page_dirty(struct page *page, struct address_space *mapping, int warn) @@ -2518,16 +2518,16 @@ void __set_page_dirty(struct page *page, struct address_space *mapping, */ int __set_page_dirty_nobuffers(struct page *page) { - lock_page_memcg(page); + lock_page_objcg(page); if (!TestSetPageDirty(page)) { struct address_space *mapping = page_mapping(page); if (!mapping) { - unlock_page_memcg(page); + unlock_page_objcg(page); return 1; } __set_page_dirty(page, mapping, !PagePrivate(page)); - unlock_page_memcg(page); + unlock_page_objcg(page); if (mapping->host) { /* !PageAnon && !swapper_space */ @@ -2535,7 +2535,7 @@ int __set_page_dirty_nobuffers(struct page *page) } return 1; } - unlock_page_memcg(page); + unlock_page_objcg(page); return 0; } EXPORT_SYMBOL(__set_page_dirty_nobuffers); @@ -2659,14 +2659,14 @@ void __cancel_dirty_page(struct page *page) struct bdi_writeback *wb; struct wb_lock_cookie cookie = {}; - lock_page_memcg(page); + lock_page_objcg(page); wb = unlocked_inode_to_wb_begin(inode, &cookie); if (TestClearPageDirty(page)) account_page_cleaned(page, mapping, wb); unlocked_inode_to_wb_end(inode, &cookie); - unlock_page_memcg(page); + unlock_page_objcg(page); } else { ClearPageDirty(page); } @@ -2771,7 +2771,7 @@ int test_clear_page_writeback(struct page *page) struct address_space *mapping = page_mapping(page); int ret; - lock_page_memcg(page); + lock_page_objcg(page); if (mapping && mapping_use_writeback_tags(mapping)) { struct inode *inode = mapping->host; struct backing_dev_info *bdi = inode_to_bdi(inode); @@ -2806,7 +2806,7 @@ int test_clear_page_writeback(struct page *page) dec_zone_page_state(page, NR_ZONE_WRITE_PENDING); inc_node_page_state(page, NR_WRITTEN); } - unlock_page_memcg(page); + unlock_page_objcg(page); return ret; } @@ -2815,7 +2815,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write) struct address_space *mapping = page_mapping(page); int ret, access_ret; - lock_page_memcg(page); + lock_page_objcg(page); if (mapping && mapping_use_writeback_tags(mapping)) { XA_STATE(xas, &mapping->i_pages, page_index(page)); struct inode *inode = mapping->host; @@ -2860,7 +2860,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write) inc_lruvec_page_state(page, NR_WRITEBACK); inc_zone_page_state(page, NR_ZONE_WRITE_PENDING); } - unlock_page_memcg(page); + unlock_page_objcg(page); access_ret = arch_make_page_accessible(page); /* * If writeback has been triggered on a page that cannot be made diff --git a/mm/rmap.c b/mm/rmap.c index 6aebd1747251..d97aeea017ac 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -32,7 +32,7 @@ * swap_lock (in swap_duplicate, swap_info_get) * mmlist_lock (in mmput, drain_mmlist and others) * mapping->private_lock (in __set_page_dirty_buffers) - * lock_page_memcg move_lock (in __set_page_dirty_buffers) + * lock_page_objcg move_lock (in __set_page_dirty_buffers) * i_pages lock (widely used) * lruvec->lru_lock (in lock_page_lruvec_irq) * inode->i_lock (in set_page_dirty's __mark_inode_dirty) @@ -1125,7 +1125,7 @@ void do_page_add_anon_rmap(struct page *page, bool first; if (unlikely(PageKsm(page))) - lock_page_memcg(page); + lock_page_objcg(page); else VM_BUG_ON_PAGE(!PageLocked(page), page); @@ -1153,7 +1153,7 @@ void do_page_add_anon_rmap(struct page *page, } if (unlikely(PageKsm(page))) { - unlock_page_memcg(page); + unlock_page_objcg(page); return; } @@ -1213,7 +1213,7 @@ void page_add_file_rmap(struct page *page, bool compound) int i, nr = 1; VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); - 
lock_page_memcg(page); + lock_page_objcg(page); if (compound && PageTransHuge(page)) { int nr_pages = thp_nr_pages(page); @@ -1244,7 +1244,7 @@ void page_add_file_rmap(struct page *page, bool compound) } __mod_lruvec_page_state(page, NR_FILE_MAPPED, nr); out: - unlock_page_memcg(page); + unlock_page_objcg(page); } static void page_remove_file_rmap(struct page *page, bool compound) @@ -1345,7 +1345,7 @@ static void page_remove_anon_compound_rmap(struct page *page) */ void page_remove_rmap(struct page *page, bool compound) { - lock_page_memcg(page); + lock_page_objcg(page); if (!PageAnon(page)) { page_remove_file_rmap(page, compound); @@ -1384,7 +1384,7 @@ void page_remove_rmap(struct page *page, bool compound) * faster for those pages still in swapcache. */ out: - unlock_page_memcg(page); + unlock_page_objcg(page); } /*

From patchwork Thu Sep 16 13:47:47 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12499141
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 12/13] mm: lru: add VM_BUG_ON_PAGE to lru maintenance function
Date: Thu, 16 Sep 2021 21:47:47 +0800
Message-Id: <20210916134748.67712-13-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

We need to make sure that a page is deleted from or added to the correct lruvec list. So add a VM_BUG_ON_PAGE() to the LRU maintenance helpers to catch invalid users.
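For context, the new assertion relies on page_matches_lruvec(), which at this point in the tree boils down to checking that the lruvec belongs to the same node and memcg as the page (shown roughly; the helper itself is not touched by this patch):

	static inline bool page_matches_lruvec(struct page *page, struct lruvec *lruvec)
	{
		return lruvec_pgdat(lruvec) == page_pgdat(page) &&
		       lruvec_memcg(lruvec) == page_memcg(page);
	}

So the VM_BUG_ON_PAGE() added below fires whenever a page is added to or removed from an lruvec of a different node or memcg than the one the page is currently bound to.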
Signed-off-by: Muchun Song --- include/linux/mm_inline.h | 6 ++++++ mm/vmscan.c | 1 - 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 355ea1ee32bd..1ca1e2ab8565 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -84,6 +84,8 @@ static __always_inline void add_page_to_lru_list(struct page *page, { enum lru_list lru = page_lru(page); + VM_BUG_ON_PAGE(!page_matches_lruvec(page, lruvec), page); + update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page)); list_add(&page->lru, &lruvec->lists[lru]); } @@ -93,6 +95,8 @@ static __always_inline void add_page_to_lru_list_tail(struct page *page, { enum lru_list lru = page_lru(page); + VM_BUG_ON_PAGE(!page_matches_lruvec(page, lruvec), page); + update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page)); list_add_tail(&page->lru, &lruvec->lists[lru]); } @@ -100,6 +104,8 @@ static __always_inline void add_page_to_lru_list_tail(struct page *page, static __always_inline void del_page_from_lru_list(struct page *page, struct lruvec *lruvec) { + VM_BUG_ON_PAGE(!page_matches_lruvec(page, lruvec), page); + list_del(&page->lru); update_lru_size(lruvec, page_lru(page), page_zonenum(page), -thp_nr_pages(page)); diff --git a/mm/vmscan.c b/mm/vmscan.c index 6878a6bff2f8..f38ec21babf3 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2199,7 +2199,6 @@ static unsigned int move_pages_to_lru(struct list_head *list) continue; } - VM_BUG_ON_PAGE(!page_matches_lruvec(page, lruvec), page); add_page_to_lru_list(page, lruvec); nr_pages = thp_nr_pages(page); nr_moved += nr_pages;

From patchwork Thu Sep 16 13:47:48 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12499143
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org, smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v2 13/13] mm: lru: use lruvec lock to serialize memcg changes
Date: Thu, 16 Sep 2021 21:47:48 +0800
Message-Id: <20210916134748.67712-14-songmuchun@bytedance.com>
In-Reply-To: <20210916134748.67712-1-songmuchun@bytedance.com>
References: <20210916134748.67712-1-songmuchun@bytedance.com>

As
described by commit fc574c23558c ("mm/swap.c: serialize memcg changes in pagevec_lru_move_fn"), TestClearPageLRU() aims to serialize mem_cgroup_move_account() during pagevec_lru_move_fn(). Now lock_page_lruvec*() can detect whether the memcg of a page has been changed, so we can use the lruvec lock to serialize mem_cgroup_move_account() during pagevec_lru_move_fn(). This change is a partial revert of commit fc574c23558c ("mm/swap.c: serialize memcg changes in pagevec_lru_move_fn"). Since pagevec_lru_move_fn() is hotter than mem_cgroup_move_account(), removing an atomic operation from it is an optimization. This change also avoids dirtying the cacheline of a page that is not on the LRU. Signed-off-by: Muchun Song --- mm/compaction.c | 1 + mm/memcontrol.c | 31 +++++++++++++++++++++++++++++++ mm/swap.c | 41 +++++++++++------------------------------ mm/vmscan.c | 9 ++++----- 4 files changed, 47 insertions(+), 35 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index c4ba41de8591..9e74f96f9879 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -529,6 +529,7 @@ static struct lruvec *compact_lock_page_irqsave(struct page *page, spin_lock_irqsave(&lruvec->lru_lock, *flags); out: + /* See the comments in lock_page_lruvec(). */ if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { spin_unlock_irqrestore(&lruvec->lru_lock, *flags); goto retry; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e46fc01c6164..ab65d1c975cc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1298,12 +1298,38 @@ struct lruvec *lock_page_lruvec(struct page *page) lruvec = mem_cgroup_page_lruvec(page); spin_lock(&lruvec->lru_lock); + /* + * The memcg of the page can be changed by any of the following routines: + * + * 1) mem_cgroup_move_account() or + * 2) memcg_reparent_objcgs() + * + * The possible bad scenario would look like: + * + * CPU0: CPU1: CPU2: + * lruvec = mem_cgroup_page_lruvec() + * + * if (!isolate_lru_page()) + * mem_cgroup_move_account() + * + * memcg_reparent_objcgs() + * + * spin_lock(&lruvec->lru_lock) + * ^^^^^^ + * wrong lock + * + * Either CPU1 or CPU2 can change the page's memcg, so we need to check + * whether the page's memcg has changed and, if so, reacquire the new + * lruvec lock. + */ if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { spin_unlock(&lruvec->lru_lock); goto retry; } /* + * When we reach here, it means that page_memcg(page) is stable. + * * Preemption is disabled in the internal of spin_lock, which can serve * as RCU read-side critical sections. */ @@ -1321,6 +1347,7 @@ struct lruvec *lock_page_lruvec_irq(struct page *page) lruvec = mem_cgroup_page_lruvec(page); spin_lock_irq(&lruvec->lru_lock); + /* See the comments in lock_page_lruvec(). */ if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { spin_unlock_irq(&lruvec->lru_lock); goto retry; @@ -1341,6 +1368,7 @@ struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags) lruvec = mem_cgroup_page_lruvec(page); spin_lock_irqsave(&lruvec->lru_lock, *flags); + /* See the comments in lock_page_lruvec(). */ if (unlikely(lruvec_memcg(lruvec) != page_memcg(page))) { spin_unlock_irqrestore(&lruvec->lru_lock, *flags); goto retry; @@ -5841,7 +5869,10 @@ static int mem_cgroup_move_account(struct page *page, obj_cgroup_get(to->objcg); obj_cgroup_put(from->objcg); + /* See the comments in lock_page_lruvec().
*/ + spin_lock(&from_vec->lru_lock); page->memcg_data = (unsigned long)to->objcg; + spin_unlock(&from_vec->lru_lock); __unlock_page_objcg(from->objcg); diff --git a/mm/swap.c b/mm/swap.c index 18d44f978b2e..fa2352f0f9d3 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -189,14 +189,8 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - /* block memcg migration during page moving between lru */ - if (!TestClearPageLRU(page)) - continue; - lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); (*move_fn)(page, lruvec); - - SetPageLRU(page); } if (lruvec) unlock_page_lruvec_irqrestore(lruvec, flags); @@ -206,7 +200,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec) { - if (!PageUnevictable(page)) { + if (PageLRU(page) && !PageUnevictable(page)) { del_page_from_lru_list(page, lruvec); ClearPageActive(page); add_page_to_lru_list_tail(page, lruvec); @@ -302,7 +296,7 @@ void lru_note_cost_page(struct page *page) static void __activate_page(struct page *page, struct lruvec *lruvec) { - if (!PageActive(page) && !PageUnevictable(page)) { + if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { int nr_pages = thp_nr_pages(page); del_page_from_lru_list(page, lruvec); @@ -355,12 +349,9 @@ static void activate_page(struct page *page) struct lruvec *lruvec; page = compound_head(page); - if (TestClearPageLRU(page)) { - lruvec = lock_page_lruvec_irq(page); - __activate_page(page, lruvec); - unlock_page_lruvec_irq(lruvec); - SetPageLRU(page); - } + lruvec = lock_page_lruvec_irq(page); + __activate_page(page, lruvec); + unlock_page_lruvec_irq(lruvec); } #endif @@ -515,6 +506,9 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) bool active = PageActive(page); int nr_pages = thp_nr_pages(page); + if (!PageLRU(page)) + return; + if (PageUnevictable(page)) return; @@ -552,7 +546,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) { - if (PageActive(page) && !PageUnevictable(page)) { + if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { int nr_pages = thp_nr_pages(page); del_page_from_lru_list(page, lruvec); @@ -568,7 +562,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec) { - if (PageAnon(page) && PageSwapBacked(page) && + if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page) && !PageUnevictable(page)) { int nr_pages = thp_nr_pages(page); @@ -1033,20 +1027,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec) */ void __pagevec_lru_add(struct pagevec *pvec) { - int i; - struct lruvec *lruvec = NULL; - unsigned long flags = 0; - - for (i = 0; i < pagevec_count(pvec); i++) { - struct page *page = pvec->pages[i]; - - lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); - __pagevec_lru_add_fn(page, lruvec); - } - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); - release_pages(pvec->pages, pvec->nr); - pagevec_reinit(pvec); + pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn); } /** diff --git a/mm/vmscan.c b/mm/vmscan.c index f38ec21babf3..e9f4a2360465 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4672,18 +4672,17 @@ void check_move_unevictable_pages(struct pagevec *pvec) nr_pages = thp_nr_pages(page); pgscanned += nr_pages; - /* 
block memcg migration during page moving between lru */ - if (!TestClearPageLRU(page)) + lruvec = relock_page_lruvec_irq(page, lruvec); + + if (!PageLRU(page) || !PageUnevictable(page)) continue; - lruvec = relock_page_lruvec_irq(page, lruvec); - if (page_evictable(page) && PageUnevictable(page)) { + if (page_evictable(page)) { del_page_from_lru_list(page, lruvec); ClearPageUnevictable(page); add_page_to_lru_list(page, lruvec); pgrescued += nr_pages; } - SetPageLRU(page); } if (lruvec) {
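Putting the mm/swap.c pieces of this patch together, the resulting hot path can be sketched as follows (reconstructed from the hunks above, simplified rather than quoted verbatim): the pagevec walker now always takes the lruvec lock via relock_page_lruvec_irqsave(), and each move function performs its own PageLRU() check under that lock instead of relying on TestClearPageLRU():

	static void pagevec_lru_move_fn(struct pagevec *pvec,
					void (*move_fn)(struct page *page, struct lruvec *lruvec))
	{
		int i;
		struct lruvec *lruvec = NULL;
		unsigned long flags = 0;

		for (i = 0; i < pagevec_count(pvec); i++) {
			struct page *page = pvec->pages[i];

			/*
			 * Lock the lruvec the page currently belongs to; with the
			 * lru_lock held, mem_cgroup_move_account() cannot switch
			 * page->memcg_data underneath us, so the PageLRU() test in
			 * *move_fn is stable.
			 */
			lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
			(*move_fn)(page, lruvec);
		}
		if (lruvec)
			unlock_page_lruvec_irqrestore(lruvec, flags);
		release_pages(pvec->pages, pvec->nr);
		pagevec_reinit(pvec);
	}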