From patchwork Fri Apr 17 01:06:15 2020
X-Patchwork-Submitter: Jakub Kicinski
X-Patchwork-Id: 11493987
From: Jakub Kicinski
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, Jakub Kicinski
Subject: [PATCH 1/3] mm: prepare for swap over-high accounting and penalty calculation
Date: Thu, 16 Apr 2020 18:06:15 -0700
Message-Id: <20200417010617.927266-2-kuba@kernel.org>
X-Mailer: git-send-email 2.25.2
In-Reply-To: <20200417010617.927266-1-kuba@kernel.org>
References: <20200417010617.927266-1-kuba@kernel.org>

Slice the memory overage calculation logic a little bit so that we can
reuse it to apply a similar penalty to swap. The logic which accesses
the memory-specific fields (usage and high values) has to be taken out
of calculate_high_delay().

Signed-off-by: Jakub Kicinski
---
Change log, including private exchanges with Johannes:

v0.4:
 - calculate max outside of calculate_overage()
---
 mm/memcontrol.c | 62 ++++++++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 75a978307863..9bfb7dcb3489 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2319,41 +2319,48 @@ static void high_work_func(struct work_struct *work)
 #define MEMCG_DELAY_PRECISION_SHIFT 20
 #define MEMCG_DELAY_SCALING_SHIFT 14
 
-/*
- * Get the number of jiffies that we should penalise a mischievous cgroup which
- * is exceeding its memory.high by checking both it and its ancestors.
- */
-static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
-                                          unsigned int nr_pages)
+static u64 calculate_overage(unsigned long usage, unsigned long high)
 {
-        unsigned long penalty_jiffies;
-        u64 max_overage = 0;
-
-        do {
-                unsigned long usage, high;
-                u64 overage;
+        u64 overage;
 
-                usage = page_counter_read(&memcg->memory);
-                high = READ_ONCE(memcg->high);
+        if (usage <= high)
+                return 0;
 
-                if (usage <= high)
-                        continue;
+        /*
+         * Prevent division by 0 in overage calculation by acting as if
+         * it was a threshold of 1 page
+         */
+        high = max(high, 1UL);
 
-                /*
-                 * Prevent division by 0 in overage calculation by acting as if
-                 * it was a threshold of 1 page
-                 */
-                high = max(high, 1UL);
+        overage = usage - high;
+        overage <<= MEMCG_DELAY_PRECISION_SHIFT;
+        return div64_u64(overage, high);
+}
 
-                overage = usage - high;
-                overage <<= MEMCG_DELAY_PRECISION_SHIFT;
-                overage = div64_u64(overage, high);
+static u64 mem_find_max_overage(struct mem_cgroup *memcg)
+{
+        u64 overage, max_overage = 0;
 
-                if (overage > max_overage)
-                        max_overage = overage;
+        do {
+                overage = calculate_overage(page_counter_read(&memcg->memory),
+                                            READ_ONCE(memcg->high));
+                max_overage = max(overage, max_overage);
         } while ((memcg = parent_mem_cgroup(memcg)) &&
                  !mem_cgroup_is_root(memcg));
 
+        return max_overage;
+}
+
+/*
+ * Get the number of jiffies that we should penalise a mischievous cgroup which
+ * is exceeding its memory.high by checking both it and its ancestors.
+ */
+static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
+                                          unsigned int nr_pages,
+                                          u64 max_overage)
+{
+        unsigned long penalty_jiffies;
+
         if (!max_overage)
                 return 0;
 
@@ -2409,7 +2416,8 @@ void mem_cgroup_handle_over_high(void)
          * memory.high is breached and reclaim is unable to keep up. Throttle
          * allocators proactively to slow down excessive growth.
          */
-        penalty_jiffies = calculate_high_delay(memcg, nr_pages);
+        penalty_jiffies = calculate_high_delay(memcg, nr_pages,
+                                               mem_find_max_overage(memcg));
 
         /*
          * Don't sleep if the amount of jiffies this memcg owes us is so low
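
For readers following the arithmetic: calculate_overage() returns the
relative overage as a 20-bit fixed-point fraction. Below is a minimal
userspace sketch of the same calculation (an illustration only, not the
kernel code itself: div64_u64() is replaced by plain 64-bit division and
the page counts are hypothetical):

    /*
     * Userspace sketch of the kernel's calculate_overage() above.
     * Values are made up; div64_u64() is stood in for by plain division.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define MEMCG_DELAY_PRECISION_SHIFT 20

    static uint64_t calculate_overage(unsigned long usage, unsigned long high)
    {
            uint64_t overage;

            if (usage <= high)
                    return 0;

            /* act as if high were 1 page to avoid division by zero */
            if (high < 1)
                    high = 1;

            overage = (uint64_t)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT;
            return overage / high;  /* stands in for div64_u64() */
    }

    int main(void)
    {
            /* 120 pages used against a high threshold of 100 pages */
            uint64_t o = calculate_overage(120, 100);

            /* prints ~0.20: the overage as a 20-bit fixed-point fraction */
            printf("overage = %.2f\n",
                   (double)o / (1 << MEMCG_DELAY_PRECISION_SHIFT));
            return 0;
    }

Here 120 pages of usage against a high of 100 yields 0.20, i.e. 20%
overage; calculate_high_delay() then turns the worst such overage in the
hierarchy into a sleep time in jiffies.
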
From patchwork Fri Apr 17 01:06:16 2020
X-Patchwork-Submitter: Jakub Kicinski
X-Patchwork-Id: 11493989
From: Jakub Kicinski
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, Jakub Kicinski
Subject: [PATCH 2/3] mm: move penalty delay clamping out of calculate_high_delay()
Date: Thu, 16 Apr 2020 18:06:16 -0700
Message-Id: <20200417010617.927266-3-kuba@kernel.org>
X-Mailer: git-send-email 2.25.2
In-Reply-To: <20200417010617.927266-1-kuba@kernel.org>
References: <20200417010617.927266-1-kuba@kernel.org>

We will want to call calculate_high_delay() twice - once for memory and
once for swap - and we should apply the clamp to the sum of the two
penalties. Clamping therefore has to be applied outside of
calculate_high_delay().

Signed-off-by: Jakub Kicinski
---
 mm/memcontrol.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9bfb7dcb3489..90266c04fd76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2384,14 +2384,7 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
          * MEMCG_CHARGE_BATCH pages is nominal, so work out how much smaller or
          * larger the current charge patch is than that.
          */
-        penalty_jiffies = penalty_jiffies * nr_pages / MEMCG_CHARGE_BATCH;
-
-        /*
-         * Clamp the max delay per usermode return so as to still keep the
-         * application moving forwards and also permit diagnostics, albeit
-         * extremely slowly.
-         */
-        return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+        return penalty_jiffies * nr_pages / MEMCG_CHARGE_BATCH;
 }
 
 /*
@@ -2419,6 +2412,13 @@ void mem_cgroup_handle_over_high(void)
         penalty_jiffies = calculate_high_delay(memcg, nr_pages,
                                                mem_find_max_overage(memcg));
 
+        /*
+         * Clamp the max delay per usermode return so as to still keep the
+         * application moving forwards and also permit diagnostics, albeit
+         * extremely slowly.
+         */
+        penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+
         /*
          * Don't sleep if the amount of jiffies this memcg owes us is so low
          * that it's not even worth doing, in an attempt to be nice to those who
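
The reason the clamp cannot stay inside calculate_high_delay(): min()
applied per call distributes badly over a sum, so two individually
clamped penalties could together exceed the intended cap. A toy C
demonstration (the 500-jiffy cap and the penalty values are hypothetical
stand-ins for MEMCG_MAX_HIGH_DELAY_JIFFIES and the real computed
penalties):

    /* Toy demonstration of why the clamp must be applied to the sum. */
    #include <stdio.h>

    #define CAP 500 /* stand-in for MEMCG_MAX_HIGH_DELAY_JIFFIES */

    static unsigned long min_ul(unsigned long a, unsigned long b)
    {
            return a < b ? a : b;
    }

    int main(void)
    {
            unsigned long mem_penalty = 400, swap_penalty = 300;

            /* clamping per call: 400 + 300 -> 700, over the 500 cap */
            printf("per-call clamp: %lu\n",
                   min_ul(mem_penalty, CAP) + min_ul(swap_penalty, CAP));

            /* clamping the sum, as this patch enables: capped at 500 */
            printf("summed clamp:   %lu\n",
                   min_ul(mem_penalty + swap_penalty, CAP));
            return 0;
    }

Clamping the sum, which the next patch relies on, keeps the total delay
per usermode return bounded by a single cap.
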
From patchwork Fri Apr 17 01:06:17 2020
X-Patchwork-Submitter: Jakub Kicinski
X-Patchwork-Id: 11493991
From: Jakub Kicinski
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, Jakub Kicinski
Subject: [PATCH 3/3] mm: automatically penalize tasks with high swap use
Date: Thu, 16 Apr 2020 18:06:17 -0700
Message-Id: <20200417010617.927266-4-kuba@kernel.org>
X-Mailer: git-send-email 2.25.2
In-Reply-To: <20200417010617.927266-1-kuba@kernel.org>
References: <20200417010617.927266-1-kuba@kernel.org>

Add a memory.swap.high knob, which functions in a similar way to
memory.high, but with a less steep slope. Use one counter for the number
of pages allocated under pressure to save struct task space and to avoid
two separate hierarchy walks on the hot path. Use swap.high when
deciding if swap is full. Perform reclaim and count memory over-high
events.

Signed-off-by: Jakub Kicinski
---
Change log, including private exchanges with Johannes:

v0.4:
 - count the events only for the highest offender
 - don't schedule_work for swap
 - take high into account in mem_cgroup_swap_full()
v0.3:
 - rebase
 - use one nr_pages
 - s/overage_shf/cost_shift/
v0.2:
 - move to counting overage in try_charge()
 - add comment about slope being more gradual
---
 include/linux/memcontrol.h |  4 ++
 mm/memcontrol.c            | 94 +++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d630af1a4e17..b52f7dcddfa1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -45,6 +45,7 @@ enum memcg_memory_event {
         MEMCG_MAX,
         MEMCG_OOM,
         MEMCG_OOM_KILL,
+        MEMCG_SWAP_HIGH,
         MEMCG_SWAP_MAX,
         MEMCG_SWAP_FAIL,
         MEMCG_NR_MEMORY_EVENTS,
@@ -218,6 +219,9 @@ struct mem_cgroup {
         /* Upper bound of normal memory consumption range */
         unsigned long high;
 
+        /* Upper bound of swap consumption range */
+        unsigned long swap_high;
+
         /* Range enforcement for interrupt charges */
         struct work_struct high_work;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90266c04fd76..7d0895c0d903 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2351,12 +2351,34 @@ static u64 mem_find_max_overage(struct mem_cgroup *memcg)
         return max_overage;
 }
 
+static u64 swap_find_max_overage(struct mem_cgroup *memcg)
+{
+        u64 overage, max_overage = 0;
+        struct mem_cgroup *max_cg;
+
+        do {
+                overage = calculate_overage(page_counter_read(&memcg->swap),
+                                            READ_ONCE(memcg->swap_high));
+                if (overage > max_overage) {
+                        max_overage = overage;
+                        max_cg = memcg;
+                }
+        } while ((memcg = parent_mem_cgroup(memcg)) &&
+                 !mem_cgroup_is_root(memcg));
+
+        if (max_overage)
+                memcg_memory_event(max_cg, MEMCG_SWAP_HIGH);
+
+        return max_overage;
+}
+
 /*
  * Get the number of jiffies that we should penalise a mischievous cgroup which
  * is exceeding its memory.high by checking both it and its ancestors.
  */
 static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
                                           unsigned int nr_pages,
+                                          unsigned char cost_shift,
                                           u64 max_overage)
 {
         unsigned long penalty_jiffies;
@@ -2364,6 +2386,9 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
         if (!max_overage)
                 return 0;
 
+        if (cost_shift)
+                max_overage >>= cost_shift;
+
         /*
          * We use overage compared to memory.high to calculate the number of
          * jiffies to sleep (penalty_jiffies). Ideally this value should be
@@ -2409,9 +2434,16 @@ void mem_cgroup_handle_over_high(void)
          * memory.high is breached and reclaim is unable to keep up. Throttle
          * allocators proactively to slow down excessive growth.
          */
-        penalty_jiffies = calculate_high_delay(memcg, nr_pages,
+        penalty_jiffies = calculate_high_delay(memcg, nr_pages, 0,
                                                mem_find_max_overage(memcg));
 
+        /*
+         * Make the swap curve more gradual, swap can be considered "cheaper",
+         * and is allocated in larger chunks. We want the delays to be gradual.
+         */
+        penalty_jiffies += calculate_high_delay(memcg, nr_pages, 2,
+                                                swap_find_max_overage(memcg));
+
         /*
          * Clamp the max delay per usermode return so as to still keep the
          * application moving forwards and also permit diagnostics, albeit
@@ -2602,12 +2634,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
          * reclaim, the cost of mismatch is negligible.
          */
         do {
-                if (page_counter_read(&memcg->memory) > READ_ONCE(memcg->high)) {
-                        /* Don't bother a random interrupted task */
-                        if (in_interrupt()) {
+                bool mem_high, swap_high;
+
+                mem_high = page_counter_read(&memcg->memory) >
+                        READ_ONCE(memcg->high);
+                swap_high = page_counter_read(&memcg->swap) >
+                        READ_ONCE(memcg->swap_high);
+
+                /* Don't bother a random interrupted task */
+                if (in_interrupt()) {
+                        if (mem_high) {
                                 schedule_work(&memcg->high_work);
                                 break;
                         }
+                        continue;
+                }
+
+                if (mem_high || swap_high) {
                         current->memcg_nr_pages_over_high += batch;
                         set_notify_resume(current);
                         break;
@@ -5071,6 +5114,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
         WRITE_ONCE(memcg->high, PAGE_COUNTER_MAX);
         memcg->soft_limit = PAGE_COUNTER_MAX;
+        WRITE_ONCE(memcg->swap_high, PAGE_COUNTER_MAX);
         if (parent) {
                 memcg->swappiness = mem_cgroup_swappiness(parent);
                 memcg->oom_kill_disable = parent->oom_kill_disable;
@@ -5224,6 +5268,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
         page_counter_set_low(&memcg->memory, 0);
         WRITE_ONCE(memcg->high, PAGE_COUNTER_MAX);
         memcg->soft_limit = PAGE_COUNTER_MAX;
+        WRITE_ONCE(memcg->swap_high, PAGE_COUNTER_MAX);
         memcg_wb_domain_size_changed(memcg);
 }
 
@@ -7137,10 +7182,13 @@ bool mem_cgroup_swap_full(struct page *page)
         if (!memcg)
                 return false;
 
-        for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg))
-                if (page_counter_read(&memcg->swap) * 2 >=
-                    READ_ONCE(memcg->swap.max))
+        for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
+                unsigned long usage = page_counter_read(&memcg->swap);
+
+                if (usage * 2 >= READ_ONCE(memcg->swap_high) ||
+                    usage * 2 >= READ_ONCE(memcg->swap.max))
                         return true;
+        }
 
         return false;
 }
@@ -7170,6 +7218,30 @@ static u64 swap_current_read(struct cgroup_subsys_state *css,
         return (u64)page_counter_read(&memcg->swap) * PAGE_SIZE;
 }
 
+static int swap_high_show(struct seq_file *m, void *v)
+{
+        unsigned long high = READ_ONCE(mem_cgroup_from_seq(m)->swap_high);
+
+        return seq_puts_memcg_tunable(m, high);
+}
+
+static ssize_t swap_high_write(struct kernfs_open_file *of,
+                               char *buf, size_t nbytes, loff_t off)
+{
+        struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+        unsigned long high;
+        int err;
+
+        buf = strstrip(buf);
+        err = page_counter_memparse(buf, "max", &high);
+        if (err)
+                return err;
+
+        WRITE_ONCE(memcg->swap_high, high);
+
+        return nbytes;
+}
+
 static int swap_max_show(struct seq_file *m, void *v)
 {
         return seq_puts_memcg_tunable(m,
@@ -7197,6 +7269,8 @@ static int swap_events_show(struct seq_file *m, void *v)
 {
         struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
+        seq_printf(m, "high %lu\n",
+                   atomic_long_read(&memcg->memory_events[MEMCG_SWAP_HIGH]));
         seq_printf(m, "max %lu\n",
                    atomic_long_read(&memcg->memory_events[MEMCG_SWAP_MAX]));
         seq_printf(m, "fail %lu\n",
@@ -7211,6 +7285,12 @@ static struct cftype swap_files[] = {
                 .flags = CFTYPE_NOT_ON_ROOT,
                 .read_u64 = swap_current_read,
         },
+        {
+                .name = "swap.high",
+                .flags = CFTYPE_NOT_ON_ROOT,
+                .seq_show = swap_high_show,
+                .write = swap_high_write,
+        },
         {
                 .name = "swap.max",
                 .flags = CFTYPE_NOT_ON_ROOT,
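
For completeness, a hypothetical userspace sketch of driving the new
knob once the series is applied; the cgroup path /sys/fs/cgroup/workload
is an example and error handling is minimal:

    /*
     * Cap a cgroup's swap at 1 GiB of gradual throttling pressure and
     * read back the new "high" event counter. Path is hypothetical.
     */
    #include <stdio.h>

    int main(void)
    {
            const char *dir = "/sys/fs/cgroup/workload"; /* example cgroup */
            char path[256];
            FILE *f;

            snprintf(path, sizeof(path), "%s/memory.swap.high", dir);
            f = fopen(path, "w");
            if (!f) {
                    perror("memory.swap.high");
                    return 1;
            }
            fprintf(f, "%llu\n", 1ULL << 30); /* 1 GiB */
            fclose(f);

            /* swap.events now carries a "high" line alongside "max"/"fail" */
            snprintf(path, sizeof(path), "%s/memory.swap.events", dir);
            f = fopen(path, "r");
            if (f) {
                    char line[128];

                    while (fgets(line, sizeof(line), f))
                            fputs(line, stdout);
                    fclose(f);
            }
            return 0;
    }

memory.swap.high accepts the same syntax as memory.high ("max" or a byte
count, parsed by page_counter_memparse()), and memory.swap.events grows
a "high" line counting swap over-high events.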