From patchwork Thu Feb 10 08:14:34 2022
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 12741535
Date: Thu, 10 Feb 2022 00:14:34 -0800
In-Reply-To: <20220210081437.1884008-1-shakeelb@google.com>
Message-Id: <20220210081437.1884008-2-shakeelb@google.com>
References: <20220210081437.1884008-1-shakeelb@google.com>
Subject: [PATCH 1/4] memcg: refactor mem_cgroup_oom
From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner, Michal Hocko, Roman Gushchin
Cc: Chris Down, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Shakeel Butt

The function mem_cgroup_oom returns an enum with four possible values, but
the caller does not care about the individual values and only checks whether
the return value is OOM_SUCCESS. So, remove the enum altogether and make
mem_cgroup_oom return a simple bool.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin
---
 mm/memcontrol.c | 40 +++++++++++++---------------------------
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a0e9d9f12cf5..c40c27822802 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1795,20 +1795,12 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
                 __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
 }
 
-enum oom_status {
-        OOM_SUCCESS,
-        OOM_FAILED,
-        OOM_ASYNC,
-        OOM_SKIPPED
-};
-
-static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
+static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 {
-        enum oom_status ret;
-        bool locked;
+        bool locked, ret = false;
 
         if (order > PAGE_ALLOC_COSTLY_ORDER)
-                return OOM_SKIPPED;
+                return ret;
 
         memcg_memory_event(memcg, MEMCG_OOM);
 
@@ -1831,14 +1823,13 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
          * victim and then we have to bail out from the charge path.
          */
         if (memcg->oom_kill_disable) {
-                if (!current->in_user_fault)
-                        return OOM_SKIPPED;
-                css_get(&memcg->css);
-                current->memcg_in_oom = memcg;
-                current->memcg_oom_gfp_mask = mask;
-                current->memcg_oom_order = order;
-
-                return OOM_ASYNC;
+                if (current->in_user_fault) {
+                        css_get(&memcg->css);
+                        current->memcg_in_oom = memcg;
+                        current->memcg_oom_gfp_mask = mask;
+                        current->memcg_oom_order = order;
+                }
+                return ret;
         }
 
         mem_cgroup_mark_under_oom(memcg);
@@ -1849,10 +1840,7 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
         mem_cgroup_oom_notify(memcg);
 
         mem_cgroup_unmark_under_oom(memcg);
-        if (mem_cgroup_out_of_memory(memcg, mask, order))
-                ret = OOM_SUCCESS;
-        else
-                ret = OOM_FAILED;
+        ret = mem_cgroup_out_of_memory(memcg, mask, order);
 
         if (locked)
                 mem_cgroup_oom_unlock(memcg);
@@ -2545,7 +2533,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
         int nr_retries = MAX_RECLAIM_RETRIES;
         struct mem_cgroup *mem_over_limit;
         struct page_counter *counter;
-        enum oom_status oom_status;
         unsigned long nr_reclaimed;
         bool passed_oom = false;
         bool may_swap = true;
@@ -2648,9 +2635,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
          * a forward progress or bypass the charge if the oom killer
          * couldn't make any progress.
          */
-        oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
-                       get_order(nr_pages * PAGE_SIZE));
-        if (oom_status == OOM_SUCCESS) {
+        if (mem_cgroup_oom(mem_over_limit, gfp_mask,
+                           get_order(nr_pages * PAGE_SIZE))) {
                 passed_oom = true;
                 nr_retries = MAX_RECLAIM_RETRIES;
                 goto retry;
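For readers less familiar with the pattern, the standalone C sketch below (not
kernel code; all names such as do_oom and oom_ctx are hypothetical) illustrates
the refactoring this patch performs: a multi-value enum return collapses into a
bool once the only caller just tests for success.

/* Standalone sketch of the enum-to-bool refactoring pattern. */
#include <stdbool.h>
#include <stdio.h>

struct oom_ctx {
        int order;
        int max_order;
        bool kill_disabled;
};

/*
 * Before the refactoring this would have returned one of
 * OOM_SUCCESS/OOM_FAILED/OOM_ASYNC/OOM_SKIPPED; every non-success case
 * now simply returns false.
 */
static bool do_oom(struct oom_ctx *ctx)
{
        bool ret = false;

        if (ctx->order > ctx->max_order)        /* was OOM_SKIPPED */
                return ret;
        if (ctx->kill_disabled)                 /* was OOM_ASYNC or OOM_SKIPPED */
                return ret;

        ret = true;                             /* was OOM_SUCCESS vs OOM_FAILED */
        return ret;
}

int main(void)
{
        struct oom_ctx ctx = { .order = 2, .max_order = 3, .kill_disabled = false };

        /* The caller only ever asks "did it succeed?", like try_charge_memcg(). */
        if (do_oom(&ctx))
                puts("charge path may retry");
        else
                puts("charge path bails out");
        return 0;
}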
From patchwork Thu Feb 10 08:14:35 2022
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 12741536
Date: Thu, 10 Feb 2022 00:14:35 -0800
In-Reply-To: <20220210081437.1884008-1-shakeelb@google.com>
Message-Id: <20220210081437.1884008-3-shakeelb@google.com>
References: <20220210081437.1884008-1-shakeelb@google.com>
Subject: [PATCH 2/4] memcg: unify force charging conditions
From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner, Michal Hocko, Roman Gushchin
Cc: Chris Down, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Shakeel Butt

Currently the kernel force-charges allocations that carry the __GFP_HIGH flag
without triggering memory reclaim. __GFP_HIGH indicates that the caller is
high priority, and since commit 869712fd3de5 ("mm: memcontrol: fix network
errors from failing __GFP_ATOMIC charges") the kernel lets such allocations
force the charge. Please note that __GFP_ATOMIC has since been replaced by
__GFP_HIGH.

__GFP_HIGH does not tell whether the caller can block or trigger reclaim;
there are separate checks for that, so there is no need to skip reclaim for
__GFP_HIGH allocations. Therefore, handle __GFP_HIGH together with
__GFP_NOFAIL, which also force-charges.

Please note that this is a no-op change, as there are currently no __GFP_HIGH
allocators in the kernel that also use __GFP_ACCOUNT (or SLAB_ACCOUNT) and
disallow reclaim. The purpose of this patch is to simplify the reasoning in
the following patches.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin
---
 mm/memcontrol.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c40c27822802..ae73a40818b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2560,15 +2560,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
                 goto retry;
         }
 
-        /*
-         * Memcg doesn't have a dedicated reserve for atomic
-         * allocations. But like the global atomic pool, we need to
-         * put the burden of reclaim on regular allocation requests
-         * and let these go through as privileged allocations.
-         */
-        if (gfp_mask & __GFP_HIGH)
-                goto force;
-
         /*
          * Prevent unbounded recursion when reclaim operations need to
          * allocate memory. This might exceed the limits temporarily,
@@ -2642,7 +2633,13 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
                 goto retry;
         }
 nomem:
-        if (!(gfp_mask & __GFP_NOFAIL))
+        /*
+         * Memcg doesn't have a dedicated reserve for atomic
+         * allocations. But like the global atomic pool, we need to
+         * put the burden of reclaim on regular allocation requests
+         * and let these go through as privileged allocations.
+         */
+        if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)))
                 return -ENOMEM;
 force:
         /*
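As background for the next patches, here is a simplified user-space model (a
sketch under assumptions, not the kernel implementation; the flag values and
helper names are made up) of where the force-charging check sits after this
change: __GFP_HIGH no longer skips reclaim up front, it only lets a failed
charge go through at the nomem path, just like __GFP_NOFAIL.

/* Toy model of the charge flow after this patch. */
#include <stdbool.h>
#include <stdio.h>

#define GFP_HIGH   (1u << 0)
#define GFP_NOFAIL (1u << 1)

static bool try_charge_counter(long *usage, long limit, long nr)
{
        if (*usage + nr > limit)
                return false;
        *usage += nr;
        return true;
}

static int model_try_charge(long *usage, long limit, long nr, unsigned int gfp)
{
        int retries = 2;

        while (!try_charge_counter(usage, limit, nr)) {
                if (retries-- > 0) {
                        /* reclaim runs here; in this toy model it makes no progress */
                        continue;
                }
                /* nomem: only __GFP_NOFAIL and __GFP_HIGH force the charge */
                if (!(gfp & (GFP_NOFAIL | GFP_HIGH)))
                        return -1;      /* -ENOMEM */
                *usage += nr;           /* force: go over the limit */
                return 0;
        }
        return 0;
}

int main(void)
{
        long usage = 9, limit = 10;

        printf("plain: %d (usage %ld)\n", model_try_charge(&usage, limit, 4, 0), usage);
        printf("high:  %d (usage %ld)\n", model_try_charge(&usage, limit, 4, GFP_HIGH), usage);
        return 0;
}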
From patchwork Thu Feb 10 08:14:36 2022
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 12741537
Date: Thu, 10 Feb 2022 00:14:36 -0800
In-Reply-To: <20220210081437.1884008-1-shakeelb@google.com>
Message-Id: <20220210081437.1884008-4-shakeelb@google.com>
References: <20220210081437.1884008-1-shakeelb@google.com>
Subject: [PATCH 3/4] selftests: memcg: test high limit for single entry allocation
From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner, Michal Hocko, Roman Gushchin
Cc: Chris Down, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Shakeel Butt

Test the enforcement of the memory.high limit for a large amount of memory
allocated within a single kernel entry. There are valid use-cases where an
application can trigger a large amount of memory allocation within a single
syscall, e.g. mlock() or mmap(MAP_POPULATE). Make sure memory.high limit
enforcement works for such use-cases.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 tools/testing/selftests/cgroup/cgroup_util.c | 15 +++-
 tools/testing/selftests/cgroup/cgroup_util.h |  1 +
 .../selftests/cgroup/test_memcontrol.c       | 78 +++++++++++++++++++
 3 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index 0cf7e90c0052..dbaa7aabbb4a 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -583,7 +583,7 @@ int clone_into_cgroup_run_wait(const char *cgroup)
         return 0;
 }
 
-int cg_prepare_for_wait(const char *cgroup)
+static int __prepare_for_wait(const char *cgroup, const char *filename)
 {
         int fd, ret = -1;
 
@@ -591,8 +591,7 @@ int cg_prepare_for_wait(const char *cgroup)
         if (fd == -1)
                 return fd;
 
-        ret = inotify_add_watch(fd, cg_control(cgroup, "cgroup.events"),
-                                IN_MODIFY);
+        ret = inotify_add_watch(fd, cg_control(cgroup, filename), IN_MODIFY);
         if (ret == -1) {
                 close(fd);
                 fd = -1;
@@ -601,6 +600,16 @@ int cg_prepare_for_wait(const char *cgroup)
         return fd;
 }
 
+int cg_prepare_for_wait(const char *cgroup)
+{
+        return __prepare_for_wait(cgroup, "cgroup.events");
+}
+
+int memcg_prepare_for_wait(const char *cgroup)
+{
+        return __prepare_for_wait(cgroup, "memory.events");
+}
+
 int cg_wait_for(int fd)
 {
         int ret = -1;
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
index 4f66d10626d2..628738532ac9 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -55,4 +55,5 @@ extern int clone_reap(pid_t pid, int options);
 extern int clone_into_cgroup_run_wait(const char *cgroup);
 extern int dirfd_open_opath(const char *dir);
 extern int cg_prepare_for_wait(const char *cgroup);
+extern int memcg_prepare_for_wait(const char *cgroup);
 extern int cg_wait_for(int fd);
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index c19a97dd02d4..36ccf2322e21 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../kselftest.h"
 #include "cgroup_util.h"
@@ -628,6 +629,82 @@ static int test_memcg_high(const char *root)
         return ret;
 }
 
+static int alloc_anon_mlock(const char *cgroup, void *arg)
+{
+        size_t size = (size_t)arg;
+        void *buf;
+
+        buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
+                   0, 0);
+        if (buf == MAP_FAILED)
+                return -1;
+
+        mlock(buf, size);
+        munmap(buf, size);
+        return 0;
+}
+
+/*
+ * This test checks that memory.high is able to throttle big single shot
+ * allocation i.e. large allocation within one kernel entry.
+ */
+static int test_memcg_high_sync(const char *root)
+{
+        int ret = KSFT_FAIL, pid, fd = -1;
+        char *memcg;
+        long pre_high, pre_max;
+        long post_high, post_max;
+
+        memcg = cg_name(root, "memcg_test");
+        if (!memcg)
+                goto cleanup;
+
+        if (cg_create(memcg))
+                goto cleanup;
+
+        pre_high = cg_read_key_long(memcg, "memory.events", "high ");
+        pre_max = cg_read_key_long(memcg, "memory.events", "max ");
+        if (pre_high < 0 || pre_max < 0)
+                goto cleanup;
+
+        if (cg_write(memcg, "memory.swap.max", "0"))
+                goto cleanup;
+
+        if (cg_write(memcg, "memory.high", "30M"))
+                goto cleanup;
+
+        if (cg_write(memcg, "memory.max", "140M"))
+                goto cleanup;
+
+        fd = memcg_prepare_for_wait(memcg);
+        if (fd < 0)
+                goto cleanup;
+
+        pid = cg_run_nowait(memcg, alloc_anon_mlock, (void *)MB(200));
+        if (pid < 0)
+                goto cleanup;
+
+        cg_wait_for(fd);
+
+        post_high = cg_read_key_long(memcg, "memory.events", "high ");
+        post_max = cg_read_key_long(memcg, "memory.events", "max ");
+        if (post_high < 0 || post_max < 0)
+                goto cleanup;
+
+        if (pre_high == post_high || pre_max != post_max)
+                goto cleanup;
+
+        ret = KSFT_PASS;
+
+cleanup:
+        if (fd >= 0)
+                close(fd);
+        cg_destroy(memcg);
+        free(memcg);
+
+        return ret;
+}
+
 /*
  * This test checks that memory.max limits the amount of
  * memory which can be consumed by either anonymous memory
@@ -1180,6 +1257,7 @@ struct memcg_test {
         T(test_memcg_min),
         T(test_memcg_low),
         T(test_memcg_high),
+        T(test_memcg_high_sync),
         T(test_memcg_max),
         T(test_memcg_oom_events),
         T(test_memcg_swap_max),
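The new memcg_prepare_for_wait()/cg_wait_for() pair is built on inotify. Below
is a minimal sketch of that mechanism (the cgroup path is only an example and
error handling is abbreviated): the test blocks until the kernel modifies
memory.events, which is how it detects that the "high" event fired.

/* Minimal sketch of waiting on a memcg event via inotify. */
#include <limits.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
        char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
        int fd = inotify_init1(0);

        if (fd < 0)
                return 1;
        /* Example path; the selftest resolves it via cg_control(). */
        if (inotify_add_watch(fd, "/sys/fs/cgroup/memcg_test/memory.events",
                              IN_MODIFY) < 0)
                return 1;

        /* Blocks until memory.events is modified, i.e. some memcg event fired. */
        if (read(fd, buf, sizeof(buf)) > 0)
                puts("memory.events changed");
        close(fd);
        return 0;
}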
From patchwork Thu Feb 10 08:14:37 2022
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 12741538
Date: Thu, 10 Feb 2022 00:14:37 -0800
In-Reply-To: <20220210081437.1884008-1-shakeelb@google.com>
Message-Id: <20220210081437.1884008-5-shakeelb@google.com>
References: <20220210081437.1884008-1-shakeelb@google.com>
Subject: [PATCH 4/4] memcg: synchronously enforce memory.high
From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner, Michal Hocko, Roman Gushchin
Cc: Chris Down, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Shakeel Butt

The high limit is used to throttle a workload without invoking the
oom-killer. Recently we tried to use the high limit to right-size our
internal workloads, more specifically to dynamically adjust the limits of a
workload without letting it get oom-killed. However, due to a limitation in
the implementation of high limit enforcement, we observed the mechanism fail
for some real workloads.

The high limit is enforced on return-to-userspace, i.e. the kernel lets the
usage go over the limit, and when execution returns to userspace the high
reclaim is triggered and the process may also get throttled. However, this
mechanism fails for workloads that do large allocations in a single kernel
entry, e.g. applications that mlock() a large chunk of memory in a single
syscall. Such applications bypass the high limit and can trigger the
oom-killer.

To make high limit enforcement more robust, this patch makes the limit
enforcement synchronous. However, there are a couple of open questions about
enforcing the high limit synchronously. What should be the behavior of a
__GFP_NORETRY allocation on hitting the high limit? A similar question arises
for allocations that do not allow blocking. This patch keeps the previous
behavior, i.e. such allocations are not throttled synchronously but rely on
the return-to-userspace mechanism to throttle the processes that trigger
them.

For that reason, this patch does not remove the return-to-userspace high
limit enforcement. Also, allocations where the memory usage is below the high
limit but the swap usage is above swap's high limit are still throttled on
return-to-userspace.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 include/linux/page_counter.h |  10 +++
 mm/memcontrol.c              | 124 ++++++++++++++++++++++-------------
 mm/page_counter.c            |  59 +++++++++++++----
 3 files changed, 132 insertions(+), 61 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 679591301994..08413a5c73f9 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -60,6 +60,16 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages);
 bool page_counter_try_charge(struct page_counter *counter,
                              unsigned long nr_pages,
                              struct page_counter **fail);
+
+enum charge_status {
+        CHG_SUCCESS,
+        CHG_FAILED_HIGH,
+        CHG_FAILED_MAX
+};
+enum charge_status page_counter_try_charge_high(struct page_counter *counter,
+                                                unsigned long nr_pages,
+                                                struct page_counter **fail);
+
 void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_set_min(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_set_low(struct page_counter *counter, unsigned long nr_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae73a40818b0..97833cade59e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1290,18 +1290,20 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 /**
  * mem_cgroup_margin - calculate chargeable space of a memory cgroup
  * @memcg: the memory cgroup
+ * @high: check high limit instead of max
  *
  * Returns the maximum amount of memory @mem can be charged with, in
  * pages.
  */
-static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
+static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg, bool high)
 {
         unsigned long margin = 0;
         unsigned long count;
         unsigned long limit;
 
         count = page_counter_read(&memcg->memory);
-        limit = READ_ONCE(memcg->memory.max);
+        limit = high ? READ_ONCE(memcg->memory.high) :
+                       READ_ONCE(memcg->memory.max);
         if (count < limit)
                 margin = limit - count;
 
@@ -1607,7 +1609,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
         if (mutex_lock_killable(&oom_lock))
                 return true;
 
-        if (mem_cgroup_margin(memcg) >= (1 << order))
+        if (mem_cgroup_margin(memcg, false) >= (1 << order))
                 goto unlock;
 
         /*
@@ -2443,6 +2445,39 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
         return penalty_jiffies * nr_pages / MEMCG_CHARGE_BATCH;
 }
 
+static unsigned long calculate_penalty_jiffies(struct mem_cgroup *memcg,
+                                               unsigned long nr_pages)
+{
+        unsigned long penalty_jiffies;
+
+        /*
+         * memory.high is breached and reclaim is unable to keep up. Throttle
+         * allocators proactively to slow down excessive growth.
+         */
+        penalty_jiffies = calculate_high_delay(memcg, nr_pages,
+                                               mem_find_max_overage(memcg));
+
+        penalty_jiffies += calculate_high_delay(memcg, nr_pages,
+                                                swap_find_max_overage(memcg));
+
+        /*
+         * Clamp the max delay per usermode return so as to still keep the
+         * application moving forwards and also permit diagnostics, albeit
+         * extremely slowly.
+         */
+        penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+
+        /*
+         * Don't sleep if the amount of jiffies this memcg owes us is so low
+         * that it's not even worth doing, in an attempt to be nice to those who
+         * go only a small amount over their memory.high value and maybe haven't
+         * been aggressively reclaimed enough yet.
+         */
+        if (penalty_jiffies <= HZ / 100)
+                return 0;
+        return penalty_jiffies;
+}
+
 /*
  * Scheduled by try_charge() to be executed from the userland return path
  * and reclaims memory over the high limit.
@@ -2477,30 +2512,8 @@ void mem_cgroup_handle_over_high(void)
                                     in_retry ? SWAP_CLUSTER_MAX : nr_pages,
                                     GFP_KERNEL);
 
-        /*
-         * memory.high is breached and reclaim is unable to keep up. Throttle
-         * allocators proactively to slow down excessive growth.
-         */
-        penalty_jiffies = calculate_high_delay(memcg, nr_pages,
-                                               mem_find_max_overage(memcg));
-
-        penalty_jiffies += calculate_high_delay(memcg, nr_pages,
-                                                swap_find_max_overage(memcg));
-
-        /*
-         * Clamp the max delay per usermode return so as to still keep the
-         * application moving forwards and also permit diagnostics, albeit
-         * extremely slowly.
-         */
-        penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
-
-        /*
-         * Don't sleep if the amount of jiffies this memcg owes us is so low
-         * that it's not even worth doing, in an attempt to be nice to those who
-         * go only a small amount over their memory.high value and maybe haven't
-         * been aggressively reclaimed enough yet.
-         */
-        if (penalty_jiffies <= HZ / 100)
+        penalty_jiffies = calculate_penalty_jiffies(memcg, nr_pages);
+        if (!penalty_jiffies)
                 goto out;
 
         /*
@@ -2534,6 +2547,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
         struct mem_cgroup *mem_over_limit;
         struct page_counter *counter;
         unsigned long nr_reclaimed;
+        enum charge_status status;
         bool passed_oom = false;
         bool may_swap = true;
         bool drained = false;
@@ -2545,7 +2559,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
         if (!do_memsw_account() ||
             page_counter_try_charge(&memcg->memsw, batch, &counter)) {
-                if (page_counter_try_charge(&memcg->memory, batch, &counter))
+                status = page_counter_try_charge_high(&memcg->memory, batch,
+                                                      &counter);
+                if (status == CHG_SUCCESS)
                         goto done_restock;
                 if (do_memsw_account())
                         page_counter_uncharge(&memcg->memsw, batch);
@@ -2553,6 +2569,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
         } else {
                 mem_over_limit = mem_cgroup_from_counter(counter, memsw);
                 may_swap = false;
+                status = CHG_FAILED_MAX;
         }
 
         if (batch > nr_pages) {
@@ -2575,14 +2592,15 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
         if (!gfpflags_allow_blocking(gfp_mask))
                 goto nomem;
 
-        memcg_memory_event(mem_over_limit, MEMCG_MAX);
+        memcg_memory_event(mem_over_limit,
+                           status == CHG_FAILED_MAX ? MEMCG_MAX : MEMCG_HIGH);
 
         psi_memstall_enter(&pflags);
         nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
                                                     gfp_mask, may_swap);
         psi_memstall_leave(&pflags);
 
-        if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
+        if (mem_cgroup_margin(mem_over_limit, status == CHG_FAILED_HIGH) >= nr_pages)
                 goto retry;
 
         if (!drained) {
@@ -2614,23 +2632,34 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
         if (nr_retries--)
                 goto retry;
 
-        if (gfp_mask & __GFP_RETRY_MAYFAIL)
-                goto nomem;
+        if (status == CHG_FAILED_MAX) {
+                if (gfp_mask & __GFP_RETRY_MAYFAIL)
+                        goto nomem;
 
-        /* Avoid endless loop for tasks bypassed by the oom killer */
-        if (passed_oom && task_is_dying())
-                goto nomem;
+                /* Avoid endless loop for tasks bypassed by the oom killer */
+                if (passed_oom && task_is_dying())
+                        goto nomem;
 
-        /*
-         * keep retrying as long as the memcg oom killer is able to make
-         * a forward progress or bypass the charge if the oom killer
-         * couldn't make any progress.
-         */
-        if (mem_cgroup_oom(mem_over_limit, gfp_mask,
-                           get_order(nr_pages * PAGE_SIZE))) {
-                passed_oom = true;
-                nr_retries = MAX_RECLAIM_RETRIES;
-                goto retry;
+                /*
+                 * keep retrying as long as the memcg oom killer is able to make
+                 * a forward progress or bypass the charge if the oom killer
+                 * couldn't make any progress.
+                 */
+                if (mem_cgroup_oom(mem_over_limit, gfp_mask,
+                                   get_order(nr_pages * PAGE_SIZE))) {
+                        passed_oom = true;
+                        nr_retries = MAX_RECLAIM_RETRIES;
+                        goto retry;
+                }
+        } else {
+                unsigned long penalty_jiffies = calculate_penalty_jiffies(memcg,
+                                                                          nr_pages);
+
+                if (penalty_jiffies) {
+                        psi_memstall_enter(&pflags);
+                        schedule_timeout_killable(penalty_jiffies);
+                        psi_memstall_leave(&pflags);
+                }
         }
 nomem:
         /*
@@ -2639,7 +2668,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
          * put the burden of reclaim on regular allocation requests
          * and let these go through as privileged allocations.
          */
-        if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)))
+        if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)) && status != CHG_FAILED_HIGH)
                 return -ENOMEM;
 force:
         /*
@@ -2651,7 +2680,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
         if (do_memsw_account())
                 page_counter_charge(&memcg->memsw, nr_pages);
 
-        return 0;
+        if (status != CHG_FAILED_HIGH)
+                return 0;
 
 done_restock:
         if (batch > nr_pages)
diff --git a/mm/page_counter.c b/mm/page_counter.c
index eb156ff5d603..35c9360af334 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -86,19 +86,11 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
         }
 }
 
-/**
- * page_counter_try_charge - try to hierarchically charge pages
- * @counter: counter
- * @nr_pages: number of pages to charge
- * @fail: points first counter to hit its limit, if any
- *
- * Returns %true on success, or %false and @fail if the counter or one
- * of its ancestors has hit its configured limit.
- */
-bool page_counter_try_charge(struct page_counter *counter,
-                             unsigned long nr_pages,
-                             struct page_counter **fail)
+static enum charge_status __page_counter_try_charge(
+                struct page_counter *counter, unsigned long nr_pages,
+                struct page_counter **fail, bool check_high)
 {
+        enum charge_status status = CHG_SUCCESS;
         struct page_counter *c;
 
         for (c = counter; c; c = c->parent) {
@@ -127,6 +119,12 @@ bool page_counter_try_charge(struct page_counter *counter,
                          */
                         data_race(c->failcnt++);
                         *fail = c;
+                        status = CHG_FAILED_MAX;
+                        goto failed;
+                } else if (check_high && new > c->high) {
+                        atomic_long_sub(nr_pages, &c->usage);
+                        *fail = c;
+                        status = CHG_FAILED_HIGH;
                         goto failed;
                 }
                 propagate_protected_usage(c, new);
@@ -137,13 +135,46 @@ bool page_counter_try_charge(struct page_counter *counter,
                 if (new > READ_ONCE(c->watermark))
                         WRITE_ONCE(c->watermark, new);
         }
-        return true;
+        return status;
 
 failed:
         for (c = counter; c != *fail; c = c->parent)
                 page_counter_cancel(c, nr_pages);
 
-        return false;
+        return status;
+}
+
+/**
+ * page_counter_try_charge - try to hierarchically charge pages
+ * @counter: counter
+ * @nr_pages: number of pages to charge
+ * @fail: points first counter to hit its limit, if any
+ *
+ * Returns %true on success, or %false and @fail if the counter or one
+ * of its ancestors has hit its configured limit.
+ */
+bool page_counter_try_charge(struct page_counter *counter,
+                             unsigned long nr_pages,
+                             struct page_counter **fail)
+{
+        return __page_counter_try_charge(counter, nr_pages, fail, false) ==
+                CHG_SUCCESS;
+}
+
+/**
+ * page_counter_try_charge_high - try to hierarchically charge pages
+ * @counter: counter
+ * @nr_pages: number of pages to charge
+ * @fail: points first counter to hit its limit, if any
+ *
+ * Returns CHG_SUCCESS on success, and if the counter or one of its ancestors
+ * has hit its configured max or high limit, return corresponding failure.
+ */
+enum charge_status page_counter_try_charge_high(struct page_counter *counter,
+                                                unsigned long nr_pages,
+                                                struct page_counter **fail)
+{
+        return __page_counter_try_charge(counter, nr_pages, fail, true);
 }
 
 /**
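To summarize the new control flow, here is a small user-space model (a sketch
under assumptions, not the kernel code; the numbers and helpers are made up)
of the three-way charge outcome this patch introduces: CHG_FAILED_MAX can
still end in an OOM kill or -ENOMEM, while CHG_FAILED_HIGH throttles the task
and then lets the charge proceed over memory.high.

/* Toy model of the three-way charge status handling. */
#include <stdio.h>

enum charge_status { CHG_SUCCESS, CHG_FAILED_HIGH, CHG_FAILED_MAX };

struct counter { long usage, high, max; };

static enum charge_status try_charge_high(struct counter *c, long nr)
{
        if (c->usage + nr > c->max)
                return CHG_FAILED_MAX;
        if (c->usage + nr > c->high)
                return CHG_FAILED_HIGH;
        c->usage += nr;
        return CHG_SUCCESS;
}

static int model_charge(struct counter *c, long nr)
{
        enum charge_status status = try_charge_high(c, nr);

        if (status == CHG_SUCCESS)
                return 0;

        /* (reclaim attempts would happen here in the real code) */

        if (status == CHG_FAILED_MAX)
                return -1;      /* may OOM-kill or return -ENOMEM */

        /* CHG_FAILED_HIGH: throttle the task, then force the charge anyway. */
        printf("throttling for a penalty period\n");
        c->usage += nr;
        return 0;
}

int main(void)
{
        struct counter c = { .usage = 25, .high = 30, .max = 140 };

        printf("within high: %d (usage %ld)\n", model_charge(&c, 4), c.usage);
        printf("over high:   %d (usage %ld)\n", model_charge(&c, 4), c.usage);
        printf("over max:    %d (usage %ld)\n", model_charge(&c, 200), c.usage);
        return 0;
}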