From patchwork Thu Mar 12 18:02:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Down
X-Patchwork-Id: 11435161
Date: Thu, 12 Mar 2020 18:02:54 +0000
From: Chris Down
To: Andrew Morton
Cc: Johannes Weiner, Tejun Heo, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in
 memory.high throttling
Message-ID: <80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name>
Content-Disposition: inline

0e4b01df8659 had a bunch of fixups to use the right division method.
However, it seems that after all that it still wasn't right -- div_u64
takes a 32-bit divisor.

The headroom is still large (2^32 pages), so on mundane systems you
won't hit this, but this should definitely be fixed.

Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner
Signed-off-by: Chris Down
Cc: Andrew Morton
Cc: Tejun Heo
Cc: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org # 5.4.x
Acked-by: Johannes Weiner
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..a70206e516fe 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2339,7 +2339,7 @@ void mem_cgroup_handle_over_high(void)
 	 */
 	clamped_high = max(high, 1UL);
 
-	overage = div_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
+	overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
 			  clamped_high);
 
 	penalty_jiffies = ((u64)overage * overage * HZ)

From patchwork Thu Mar 12 18:03:04 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Down
X-Patchwork-Id: 11435163
Date: Thu, 12 Mar 2020 18:03:04 +0000
From: Chris Down
To: Andrew Morton
Cc: Johannes Weiner, Tejun Heo, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral
 memory.high
Message-ID: <8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name>
References: <80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name>
In-Reply-To: <80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name>
Content-Disposition: inline

Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor's memory.high and should be
penalised based on that instead.

This patch changes memory.high overage throttling to use the largest
overage found among the cgroup and its ancestors when considering how
many penalty jiffies to charge. This makes sure that we penalise poorly
behaving cgroups in the same way regardless of the level of the
hierarchy at which memory.high was breached.

Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner
Signed-off-by: Chris Down
Cc: Andrew Morton
Cc: Tejun Heo
Cc: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org # 5.4.x
Acked-by: Johannes Weiner
---
 mm/memcontrol.c | 93 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 35 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a70206e516fe..46d649241a21 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2297,28 +2297,41 @@ static void high_work_func(struct work_struct *work)
 #define MEMCG_DELAY_SCALING_SHIFT 14
 
 /*
- * Scheduled by try_charge() to be executed from the userland return path
- * and reclaims memory over the high limit.
+ * Get the number of jiffies that we should penalise a mischievous cgroup which
+ * is exceeding its memory.high by checking both it and its ancestors.
  */
-void mem_cgroup_handle_over_high(void)
+static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
+					  unsigned int nr_pages)
 {
-	unsigned long usage, high, clamped_high;
-	unsigned long pflags;
-	unsigned long penalty_jiffies, overage;
-	unsigned int nr_pages = current->memcg_nr_pages_over_high;
-	struct mem_cgroup *memcg;
+	unsigned long penalty_jiffies;
+	u64 max_overage = 0;
 
-	if (likely(!nr_pages))
-		return;
+	do {
+		unsigned long usage, high;
+		u64 overage;
 
-	memcg = get_mem_cgroup_from_mm(current->mm);
-	reclaim_high(memcg, nr_pages, GFP_KERNEL);
-	current->memcg_nr_pages_over_high = 0;
+		usage = page_counter_read(&memcg->memory);
+		high = READ_ONCE(memcg->high);
+
+		/*
+		 * Prevent division by 0 in overage calculation by acting as if
+		 * it was a threshold of 1 page
+		 */
+		high = max(high, 1UL);
+
+		overage = usage - high;
+		overage <<= MEMCG_DELAY_PRECISION_SHIFT;
+		overage = div64_u64(overage, high);
+
+		if (overage > max_overage)
+			max_overage = overage;
+	} while ((memcg = parent_mem_cgroup(memcg)) &&
+		 !mem_cgroup_is_root(memcg));
+
+	if (!max_overage)
+		return 0;
 
 	/*
-	 * memory.high is breached and reclaim is unable to keep up. Throttle
-	 * allocators proactively to slow down excessive growth.
-	 *
 	 * We use overage compared to memory.high to calculate the number of
 	 * jiffies to sleep (penalty_jiffies). Ideally this value should be
 	 * fairly lenient on small overages, and increasingly harsh when the
@@ -2326,24 +2339,9 @@ void mem_cgroup_handle_over_high(void)
 	 * its crazy behaviour, so we exponentially increase the delay based on
 	 * overage amount.
 	 */
-
-	usage = page_counter_read(&memcg->memory);
-	high = READ_ONCE(memcg->high);
-
-	if (usage <= high)
-		goto out;
-
-	/*
-	 * Prevent division by 0 in overage calculation by acting as if it was a
-	 * threshold of 1 page
-	 */
-	clamped_high = max(high, 1UL);
-
-	overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
-			  clamped_high);
-
-	penalty_jiffies = ((u64)overage * overage * HZ)
-		>> (MEMCG_DELAY_PRECISION_SHIFT + MEMCG_DELAY_SCALING_SHIFT);
+	penalty_jiffies = max_overage * max_overage * HZ;
+	penalty_jiffies >>= MEMCG_DELAY_PRECISION_SHIFT;
+	penalty_jiffies >>= MEMCG_DELAY_SCALING_SHIFT;
 
 	/*
 	 * Factor in the task's own contribution to the overage, such that four
@@ -2360,7 +2358,32 @@ void mem_cgroup_handle_over_high(void)
 	 * application moving forwards and also permit diagnostics, albeit
 	 * extremely slowly.
 	 */
-	penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+	return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+}
+
+/*
+ * Scheduled by try_charge() to be executed from the userland return path
+ * and reclaims memory over the high limit.
+ */
+void mem_cgroup_handle_over_high(void)
+{
+	unsigned long penalty_jiffies;
+	unsigned long pflags;
+	unsigned int nr_pages = current->memcg_nr_pages_over_high;
+	struct mem_cgroup *memcg;
+
+	if (likely(!nr_pages))
+		return;
+
+	memcg = get_mem_cgroup_from_mm(current->mm);
+	reclaim_high(memcg, nr_pages, GFP_KERNEL);
+	current->memcg_nr_pages_over_high = 0;
+
+	/*
+	 * memory.high is breached and reclaim is unable to keep up. Throttle
+	 * allocators proactively to slow down excessive growth.
+	 */
+	penalty_jiffies = calculate_high_delay(memcg, nr_pages);
 
 	/*
 	 * Don't sleep if the amount of jiffies this memcg owes us is so low