X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 11409579
From: Johannes Weiner
To: Andrew Morton
Cc: Roman Gushchin, Michal Hocko, Tejun Heo, Chris Down, Michal Koutný,
    linux-mm@kvack.org, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 0/3] mm: memcontrol: recursive memory.low protection
Date: Thu, 27 Feb 2020 14:56:03 -0500
Message-Id: <20200227195606.46212-1-hannes@cmpxchg.org>

Changes since v2:
- Changelog & documentation updates (Michal Hocko, Michal Koutny)

Changes since v1:
- improved Changelogs based on the discussion with Roman. Thanks!
- fix div0 when recursive & fixed protection is combined
- fix an unused compiler warning

The current memory.low (and memory.min) semantics require protection to
be assigned to a cgroup in an uninterrupted chain from the top-level
cgroup all the way to the leaf.

In practice, we want to protect entire cgroup subtrees from each other
(system management software vs. workload), but we would like the VM to
balance memory optimally *within* each subtree, without having to make
explicit weight allocations among individual components. The current
semantics make that impossible.

They also introduce unmanageable complexity into more advanced
resource trees. For example:

	host root
	`- system.slice
	   `- rpm upgrades
	   `- logging
	`- workload.slice
	   `- a container
	      `- system.slice
	      `- workload.slice
	         `- job A
	            `- component 1
	            `- component 2
	         `- job B

From a host-level perspective, we would like to protect the outer
workload.slice subtree as a whole from rpm upgrades, logging etc. But
for that to be effective, right now we'd have to propagate it down
through the container, the inner workload.slice, into the job cgroup
and ultimately the component cgroups where memory is actually,
physically allocated. This may cross several tree delegation points
and namespace boundaries, which makes such a setup nearly impossible.

CPU and IO on the other hand are already distributed recursively. The
user would simply configure allowances at the host level, and they
would apply to the entire subtree without any downward propagation.

To enable the above-mentioned use cases and bring memory in line with
other resource controllers, this patch series extends memory.low/min
such that settings apply recursively to the entire subtree. Users can
still assign explicit shares in subgroups, but if they don't, any
ancestral protection will be distributed such that children compete
freely amongst each other - as if no memory control were enabled
inside the subtree - but enjoy protection from neighboring trees.

In the above example, the user would then be able to configure shares
of CPU, IO and memory at the host level to comprehensively protect and
isolate the workload.slice as a whole from system.slice activity.
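
To make the intended semantics concrete, here is a minimal userspace
sketch, not the code from this series. It assumes that any parent
protection not explicitly claimed by children is handed out in
proportion to each child's unprotected usage; the text above only says
that children "compete freely", so that rule is an assumption. The
struct, the function names and the MiB figures are hypothetical as well.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one child cgroup (units: MiB). */
struct child {
	const char *name;
	unsigned long usage;	/* memory currently charged to the child */
	unsigned long low;	/* explicit memory.low, 0 if unset */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Effective protection of child i, given the parent's effective
 * protection. Assumed model: a child first claims min(usage, low);
 * whatever the parent has left over is then split among the children
 * by their usage above that claim.
 */
unsigned long effective_low(unsigned long parent_elow,
			    const struct child *c, size_t n, size_t i)
{
	unsigned long claimed = 0, total_usage = 0;
	unsigned long own, ep;
	size_t k;

	for (k = 0; k < n; k++) {
		claimed += min_ul(c[k].usage, c[k].low);
		total_usage += c[k].usage;
	}

	own = min_ul(c[i].usage, c[i].low);
	ep = own;

	/* Explicit claims overcommit the parent: scale them down. */
	if (claimed > parent_elow)
		ep = own * parent_elow / claimed;

	/* Recursive part: share out the unclaimed parent protection. */
	if (parent_elow > claimed && total_usage > claimed &&
	    c[i].usage > own) {
		unsigned long unclaimed = parent_elow - claimed;

		ep += unclaimed * (c[i].usage - own) /
		      (total_usage - claimed);
	}

	return ep;
}
```

With a parent protection of 1000 and two children using 600 and 900
with no memory.low of their own, this model yields 400 and 600: the
subtree keeps its full protection against neighbors while the split
inside it simply follows usage. The total_usage > claimed check keeps
the divisor non-zero when every child's usage is covered by its own
explicit setting.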

Patch #1 fixes an existing bug that can give a cgroup tree more
protection than it should receive as per ancestor configuration.

Patch #2 simplifies and documents the existing code to make it easier
to reason about the changes in the next patch.

Patch #3 finally implements recursive memory protection semantics.

Because of a risk of regressing legacy setups, the new semantics are
hidden behind a cgroup2 mount option, 'memory_recursiveprot'.

More details in patch #3.

 Documentation/admin-guide/cgroup-v2.rst |  11 ++
 include/linux/cgroup-defs.h             |   5 +
 kernel/cgroup/cgroup.c                  |  17 ++-
 mm/memcontrol.c                         | 220 +++++++++++++++++-------------
 mm/page_counter.c                       |  12 +-
 5 files changed, 160 insertions(+), 105 deletions(-)