From patchwork Wed Nov 23 09:21:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13053287 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05AB3C433FE for ; Wed, 23 Nov 2022 09:21:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 968B06B0074; Wed, 23 Nov 2022 04:21:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F20C8E0001; Wed, 23 Nov 2022 04:21:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 76BC76B0078; Wed, 23 Nov 2022 04:21:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 651CC6B0074 for ; Wed, 23 Nov 2022 04:21:43 -0500 (EST) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 30551120461 for ; Wed, 23 Nov 2022 09:21:43 +0000 (UTC) X-FDA: 80164164486.30.FD33709 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) by imf22.hostedemail.com (Postfix) with ESMTP id CC26CC0010 for ; Wed, 23 Nov 2022 09:21:42 +0000 (UTC) Received: by mail-pj1-f73.google.com with SMTP id n9-20020a17090a2bc900b0021010dca313so8655915pje.7 for ; Wed, 23 Nov 2022 01:21:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=aeM6HrPa45A9LrLgfXlJ7cGalj3225I7akGQB6U5iS8=; b=VtKptZQ3mFwcqrOi51ztxB+DNgvHVrFNViBXIaddfpS5U7tazXMXjhQi+D0Kb5BOYB kWAX/pxmE4/9q3DRcQTU6r5l/tPiAevFUilKdbrsysVT3IDE0PmkaN5ndghki6ufiH4P N1WA3y4o1ra1Xwt6XDHrQJx8JwWvpvj4ygpSBlCYlACuCqEt2bkTACMt/BaZpCjLomq0 T7rBx6TAjHgH302COt8/S0+/TBk9ZGp+uqj6IW5nN0j9hKQPSLdmZwQT79Gs+kwyrL0G UVfodtsTsXP5K4JI6FPWiS0q3sTEIxzK9lwT2qMViOzun5UhzjHa5z8rp0lumY6eouqQ h2FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=aeM6HrPa45A9LrLgfXlJ7cGalj3225I7akGQB6U5iS8=; b=kHTD8gWbx7pxZ7DNlrINqpM3koxtbO4Ux+l5Sr5z/l4zEPdJeVljp3K3qCYFolcTB8 ecE01FoghhiSbpPTmWR85Be0xMniZhpa56EBJpipZLS5gEwqGSADJwGdY606E4ziDp6R PUg3oSsMKN0GyazT8bKMK3rApN3tQeY5TeLY+2hFGgwZ1x8OAz7fNYH29Xuxka6IOXRN ux4HT5Y8iW0zxq/gzp+Zxr3SY6p0pSrobsIbDvW/IMfrYttS8+om4oVLuAo9rBGk8KF/ qIBgPeTjRB9sYfPFMXD0Wzacyf9XVgEiFjsDUwXBs8xzaz8bjuUg+NebfIK3XLB1jnHa w87Q== X-Gm-Message-State: ANoB5pmnnn1LPOW9UNY/OcivqfurQu7ruKcAfC5jmM6QFaGq8Vlty72F 2f1ZQLsb19Vx26smHViBOM/7Vi1mr3hg+9Mb X-Google-Smtp-Source: AA0mqf7fGcAgpowKlMMo+r73S5L06hv/b6o0wgKy1fa3xGSp5eKRBsQn/AzZFtFKxD7Bs7qipf2cPe0Zcmk2hL46 X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a62:4e81:0:b0:56c:12c0:a89b with SMTP id c123-20020a624e81000000b0056c12c0a89bmr28895540pfb.40.1669195301704; Wed, 23 Nov 2022 01:21:41 -0800 (PST) Date: Wed, 23 Nov 2022 09:21:30 +0000 In-Reply-To: <20221123092132.2521764-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20221123092132.2521764-1-yosryahmed@google.com> X-Mailer: git-send-email 2.38.1.584.g0f3c55d4c2-goog Message-ID: <20221123092132.2521764-2-yosryahmed@google.com> Subject: [PATCH v2 1/3] mm: memcg: fix stale protection of reclaim target memcg From: Yosry Ahmed To: Shakeel Butt , Roman Gushchin , Johannes Weiner , Michal Hocko , Yu Zhao , Muchun Song Cc: "Matthew Wilcox (Oracle)" , Vasily Averin , Vlastimil Babka , Chris Down , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yosry Ahmed ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1669195302; a=rsa-sha256; cv=none; b=T0lMLP0ZYg2tvDNIjKs7xAESjwRMT0XWANWxlAa/Jo8HQ2mr9CI4dpKOzkPK/tQjcR4QZj TSgbUWwRCVdGrvZowC9T42RyqGwyAqHxlwA3GDTbvspdmKiflk0F+SvPDj37K6xYXlu5oE FDK5ZtTBMyxPXOT8ULfngj/vc9e+vJg= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=VtKptZQ3; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf22.hostedemail.com: domain of 3JeZ9YwoKCAwA043Amtyqps00sxq.o0yxuz69-yyw7mow.03s@flex--yosryahmed.bounces.google.com designates 209.85.216.73 as permitted sender) smtp.mailfrom=3JeZ9YwoKCAwA043Amtyqps00sxq.o0yxuz69-yyw7mow.03s@flex--yosryahmed.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1669195302; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=aeM6HrPa45A9LrLgfXlJ7cGalj3225I7akGQB6U5iS8=; b=kwXR91T6DVTzTQLPh7RHMAuTuO+QzO0gFu/1kpF7X7lNk0zEVSYmlKCZQgZbAxcp5z2FJG N2odzZHok72waDXykLwaMr02juNNoJJjBph/p62C37zcBcoC5A2SI2EO2IA+4VpugFKwrU AbKnlwKckgL7Yv2BLL2PvAaJ5zkjR2Y= X-Stat-Signature: syffhouu3p4tkjfdkf5crg4fhqczstr4 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: CC26CC0010 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=VtKptZQ3; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf22.hostedemail.com: domain of 3JeZ9YwoKCAwA043Amtyqps00sxq.o0yxuz69-yyw7mow.03s@flex--yosryahmed.bounces.google.com designates 209.85.216.73 as permitted sender) smtp.mailfrom=3JeZ9YwoKCAwA043Amtyqps00sxq.o0yxuz69-yyw7mow.03s@flex--yosryahmed.bounces.google.com X-Rspam-User: X-HE-Tag: 1669195302-933657 X-Bogosity: Ham, tests=bogofilter, spamicity=0.002264, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: During reclaim, mem_cgroup_calculate_protection() is used to determine the effective protection (emin and elow) values of a memcg. The protection of the reclaim target is ignored, but we cannot set their effective protection to 0 due to a limitation of the current implementation (see comment in mem_cgroup_protection()). Instead, we leave their effective protection values unchaged, and later ignore it in mem_cgroup_protection(). However, mem_cgroup_protection() is called later in shrink_lruvec()->get_scan_count(), which is after the mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a result, the stale effective protection values of the target memcg may lead us to skip reclaiming from the target memcg entirely, before calling shrink_lruvec(). This can be even worse with recursive protection, where the stale target memcg protection can be higher than its standalone protection. See two examples below (a similar version of example (a) is added to test_memcontrol in a later patch). (a) A simple example with proactive reclaim is as follows. Consider the following hierarchy: ROOT | A | B (memory.min = 10M) Consider the following scenario: - B has memory.current = 10M. - The system undergoes global reclaim (or memcg reclaim in A). - In shrink_node_memcgs(): - mem_cgroup_calculate_protection() calculates the effective min (emin) of B as 10M. - mem_cgroup_below_min() returns true for B, we do not reclaim from B. - Now if we want to reclaim 5M from B using proactive reclaim (memory.reclaim), we should be able to, as the protection of the target memcg should be ignored. - In shrink_node_memcgs(): - mem_cgroup_calculate_protection() immediately returns for B without doing anything, as B is the target memcg, relying on mem_cgroup_protection() to ignore B's stale effective min (still 10M). - mem_cgroup_below_min() reads the stale effective min for B and we skip it instead of ignoring its protection as intended, as we never reach mem_cgroup_protection(). (b) An more complex example with recursive protection is as follows. Consider the following hierarchy with memory_recursiveprot: ROOT | A (memory.min = 50M) | B (memory.min = 10M, memory.high = 40M) Consider the following scenario: - B has memory.current = 35M. - The system undergoes global reclaim (target memcg is NULL). - B will have an effective min of 50M (all of A's unclaimed protection). - B will not be reclaimed from. - Now allocate 10M more memory in B, pushing it above it's high limit. - The system undergoes memcg reclaim from B (target memcg is B). - Like example (a), we do nothing in mem_cgroup_calculate_protection(), then call mem_cgroup_below_min(), which will read the stale effective min for B (50M) and skip it. In this case, it's even worse because we are not just considering B's standalone protection (10M), but we are reading a much higher stale protection (50M) which will cause us to not reclaim from B at all. This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks") which made mem_cgroup_calculate_protection() only change the state without returning any value. Before that commit, we used to return MEMCG_PROT_NONE for the target memcg, which would cause us to skip the mem_cgroup_below_{min/low}() checks. After that commit we do not return anything and we end up checking the min & low effective protections for the target memcg, which are stale. Update mem_cgroup_supports_protection() to also check if we are reclaiming from the target, and rename it to mem_cgroup_unprotected() (now returns true if we should not protect the memcg, much simpler logic). Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks") Signed-off-by: Yosry Ahmed Reviewed-by: Roman Gushchin --- include/linux/memcontrol.h | 31 +++++++++++++++++++++---------- mm/vmscan.c | 11 ++++++----- 2 files changed, 27 insertions(+), 15 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e1644a24009c..d3c8203cab6c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -615,28 +615,32 @@ static inline void mem_cgroup_protection(struct mem_cgroup *root, void mem_cgroup_calculate_protection(struct mem_cgroup *root, struct mem_cgroup *memcg); -static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg) +static inline bool mem_cgroup_unprotected(struct mem_cgroup *target, + struct mem_cgroup *memcg) { /* * The root memcg doesn't account charges, and doesn't support - * protection. + * protection. The target memcg's protection is ignored, see + * mem_cgroup_calculate_protection() and mem_cgroup_protection() */ - return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg); - + return mem_cgroup_disabled() || mem_cgroup_is_root(memcg) || + memcg == target; } -static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg) +static inline bool mem_cgroup_below_low(struct mem_cgroup *target, + struct mem_cgroup *memcg) { - if (!mem_cgroup_supports_protection(memcg)) + if (mem_cgroup_unprotected(target, memcg)) return false; return READ_ONCE(memcg->memory.elow) >= page_counter_read(&memcg->memory); } -static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg) +static inline bool mem_cgroup_below_min(struct mem_cgroup *target, + struct mem_cgroup *memcg) { - if (!mem_cgroup_supports_protection(memcg)) + if (mem_cgroup_unprotected(target, memcg)) return false; return READ_ONCE(memcg->memory.emin) >= @@ -1209,12 +1213,19 @@ static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root, { } -static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg) +static inline bool mem_cgroup_unprotected(struct mem_cgroup *target, + struct mem_cgroup *memcg) +{ + return true; +} +static inline bool mem_cgroup_below_low(struct mem_cgroup *target, + struct mem_cgroup *memcg) { return false; } -static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg) +static inline bool mem_cgroup_below_min(struct mem_cgroup *target, + struct mem_cgroup *memcg) { return false; } diff --git a/mm/vmscan.c b/mm/vmscan.c index 04d8b88e5216..79ef0fe67518 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4486,7 +4486,7 @@ static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned mem_cgroup_calculate_protection(NULL, memcg); - if (mem_cgroup_below_min(memcg)) + if (mem_cgroup_below_min(NULL, memcg)) return false; need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, swappiness, &nr_to_scan); @@ -5047,8 +5047,9 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control * DEFINE_MAX_SEQ(lruvec); DEFINE_MIN_SEQ(lruvec); - if (mem_cgroup_below_min(memcg) || - (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim)) + if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg) || + (mem_cgroup_below_low(sc->target_mem_cgroup, memcg) && + !sc->memcg_low_reclaim)) return 0; *need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, &nr_to_scan); @@ -6048,13 +6049,13 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc) mem_cgroup_calculate_protection(target_memcg, memcg); - if (mem_cgroup_below_min(memcg)) { + if (mem_cgroup_below_min(target_memcg, memcg)) { /* * Hard protection. * If there is no reclaimable memory, OOM. */ continue; - } else if (mem_cgroup_below_low(memcg)) { + } else if (mem_cgroup_below_low(target_memcg, memcg)) { /* * Soft protection. * Respect the protection only as long as