From patchwork Tue Jul 2 08:44:04 2024
X-Patchwork-Submitter: Huan Yang
X-Patchwork-Id: 13719116
From: Huan Yang <link@vivo.com>
To: Johannes Weiner,
Michal Hocko,
Roman Gushchin,
Shakeel Butt,
Muchun Song,
Andrew Morton,
"Matthew Wilcox (Oracle)",
David Hildenbrand,
Ryan Roberts,
Chris Li,
Dan Schatzberg,
Huan Yang <link@vivo.com>,
Kairui Song,
cgroups@vger.kernel.org,
linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Christian Brauner
Cc: opensource.kernel@vivo.com
Subject: [RFC PATCH 1/4] mm: memcg: pmc framework
Date: Tue, 2 Jul 2024 16:44:04 +0800
Message-ID: <20240702084423.1717904-2-link@vivo.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240702084423.1717904-1-link@vivo.com>
References: <20240702084423.1717904-1-link@vivo.com>
pmc - per memcg cache
This patch adds a PMC (per-memcg cache) to every memcg except the
root memcg. Users can enable PMC on a target memcg so that all tasks
in that memcg share a cache pool: order-0 page allocation and freeing
is served from this cache pool in preference to the shared paths.
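A minimal usage sketch (assuming a cgroup v2 hierarchy mounted at
/sys/fs/cgroup; the key follows the "enable=%s" token parsed by
mem_cgroup_cache_control() below):

  # enable the cache on a target memcg
  echo enable=y > /sys/fs/cgroup/<memcg>/memory.cache
  # show the per-node watermark/hold limit and per-zone cache/hit stats
  cat /sys/fs/cgroup/<memcg>/memory.cache
  # disable the cache and release all held pages back to the allocator
  echo enable=n > /sys/fs/cgroup/<memcg>/memory.cache

With 4KiB pages the defaults work out to hold_limit = (100 << 20) >> 12
= 25600 pages (100MB) per node and allow_watermark = 12800 pages (50MB)
of headroom above the zone high watermark.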
Signed-off-by: Huan Yang <link@vivo.com>
---
include/linux/memcontrol.h | 41 +++++++
include/linux/mmzone.h | 25 ++++
include/linux/swap.h | 1 +
mm/memcontrol.c | 237 +++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 146 +++++++++++++++++++++++
5 files changed, 450 insertions(+)
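
For reviewers, a condensed map of the structures and fast paths added
below (an informal sketch of this patch's hunks, not additional code):

  mem_cgroup
    -> mem_cgroup_per_node.cachep (struct mem_cgroup_per_node_cache)
         -> zone_cachep[MAX_NR_ZONES] (struct mem_cgroup_zone_cache:
            page list + nr_pages/nr_alloced counters)

  free path:  free_unref_page()
                -> free_unref_page_to_pmc()    park order-0 page in PMC
                -> otherwise pcp/buddy as before
  alloc path: rmqueue()
                -> rmqueue_mem_cgroup_cache()  try the private pool first
                -> otherwise pcplist/buddy as before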
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8f332b4ae84c..5ec4c64bc515 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -130,6 +130,7 @@ struct mem_cgroup_per_node {
bool on_tree;
struct mem_cgroup *memcg; /* Back pointer, we cannot */
/* use container_of */
+ struct mem_cgroup_per_node_cache *cachep;
};
struct mem_cgroup_threshold {
@@ -336,6 +337,8 @@ struct mem_cgroup {
struct lru_gen_mm_list mm_list;
#endif
+ bool cache_enabled;
+
struct mem_cgroup_per_node *nodeinfo[];
};
@@ -557,6 +560,8 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob
return memcg;
}
+extern struct static_key_true pmc_key;
+
#ifdef CONFIG_MEMCG_KMEM
/*
* folio_memcg_kmem - Check if the folio has the memcg_kmem flag set.
@@ -1185,6 +1190,25 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
unsigned long *total_scanned);
+static inline bool pmc_disabled(void)
+{
+ return static_branch_likely(&pmc_key);
+}
+
+static inline bool mem_cgroup_cache_disabled(struct mem_cgroup *memcg)
+{
+ return !READ_ONCE(memcg->cache_enabled);
+}
+
+static inline struct mem_cgroup_per_node_cache *
+mem_cgroup_get_node_cachep(struct mem_cgroup *memcg, int nid)
+{
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+
+ return nodeinfo->cachep;
+}
+
#else /* CONFIG_MEMCG */
#define MEM_CGROUP_ID_SHIFT 0
@@ -1648,6 +1672,23 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
{
return 0;
}
+
+static inline bool pmc_disabled(void)
+{
+ return true;
+}
+
+static inline bool mem_cgroup_cache_disabled(struct mem_cgroup *memcg)
+{
+ return true;
+}
+
+static inline struct mem_cgroup_per_node_cache *
+mem_cgroup_get_node_cachep(struct mem_cgroup *memcg, int nid)
+{
+ return NULL;
+}
#endif /* CONFIG_MEMCG */
/*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c11b7cde81ef..773b89e214c9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -603,6 +603,31 @@ static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
#endif /* CONFIG_LRU_GEN */
+struct mem_cgroup_zone_cache {
+ /* cached pages; currently only order-0 pages are held */
+ struct list_head pages;
+ spinlock_t pages_lock;
+ atomic_t nr_pages;
+ atomic_t nr_alloced;
+};
+
+struct mem_cgroup_per_node_cache {
+ /* per zone cache */
+ struct mem_cgroup_zone_cache zone_cachep[MAX_NR_ZONES];
+ struct mem_cgroup *memcg;
+
+ /* max number of pages to hold, in pages; default 100MB */
+#define DEFAULT_PMC_HOLD_LIMIT ((100 << 20) >> PAGE_SHIFT)
+ unsigned int hold_limit;
+
+#define DEFAULT_PMC_GAP_WATERMARK ((50 << 20) >> PAGE_SHIFT)
+ /*
+ * Pages are cached only while the zone's free pages stay above the
+ * high watermark plus this gap; in pages, default 50MB.
+ */
+ unsigned int allow_watermark;
+};
+
struct lruvec {
struct list_head lists[NR_LRU_LISTS];
/* per lruvec lru_lock for memcg */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 11c53692f65f..d7b5e0a8317c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -420,6 +420,7 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
long remove_mapping(struct address_space *mapping, struct folio *folio);
+extern int mem_cgroup_release_cache(struct mem_cgroup_per_node_cache *fc);
#ifdef CONFIG_NUMA
extern int node_reclaim_mode;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1b3c3394a2ba..404fcb96bf68 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -95,6 +95,15 @@ static bool cgroup_memory_nokmem __ro_after_init;
/* BPF memory accounting disabled? */
static bool cgroup_memory_nobpf __ro_after_init;
+/*
+ * How many memcgs have the cache enabled? While none do, the static
+ * branch stays enabled so no task's alloc/free path enters the PMC
+ * code. Once a memcg enables its cache, the static branch is disabled.
+ */
+static atomic_t pmc_nr_enabled;
+DEFINE_STATIC_KEY_TRUE(pmc_key);
+
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
#endif
@@ -5738,6 +5747,8 @@ static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
lru_gen_release_memcg(memcg);
}
+static int __disable_mem_cgroup_cache(struct mem_cgroup *memcg);
+
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
@@ -5762,6 +5773,8 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
cancel_work_sync(&memcg->high_work);
mem_cgroup_remove_from_trees(memcg);
free_shrinker_info(memcg);
+ if (READ_ONCE(memcg->cache_enabled))
+ __disable_mem_cgroup_cache(memcg);
mem_cgroup_free(memcg);
}
@@ -7088,6 +7101,223 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
return nbytes;
}
+static int __enable_mem_cgroup_cache(struct mem_cgroup *memcg)
+{
+ int nid, idx;
+
+ if (!mem_cgroup_cache_disabled(memcg))
+ return -EINVAL;
+
+ for_each_node(nid) {
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+ struct mem_cgroup_per_node_cache *p = kvzalloc_node(
+ sizeof(struct mem_cgroup_per_node_cache),
+ GFP_KERNEL, nid);
+
+ if (unlikely(!p))
+ goto fail;
+
+ nodeinfo->cachep = p;
+ }
+
+ for_each_node(nid) {
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+ pg_data_t *pgdat = NODE_DATA(nid);
+ struct mem_cgroup_per_node_cache *p = nodeinfo->cachep;
+
+ for (idx = 0; idx < MAX_NR_ZONES; idx++) {
+ struct zone *z = &pgdat->node_zones[idx];
+ struct mem_cgroup_zone_cache *zc;
+
+ if (!populated_zone(z))
+ continue;
+
+ zc = &p->zone_cachep[idx];
+
+ INIT_LIST_HEAD(&zc->pages);
+ spin_lock_init(&zc->pages_lock);
+ }
+
+ p->memcg = memcg;
+ p->hold_limit = DEFAULT_PMC_HOLD_LIMIT;
+ p->allow_watermark = DEFAULT_PMC_GAP_WATERMARK;
+ }
+
+ if (static_branch_likely(&pmc_key))
+ static_branch_disable(&pmc_key);
+
+ /* online: publish cache_enabled only after the caches are set up */
+ smp_wmb();
+ WRITE_ONCE(memcg->cache_enabled, true);
+ atomic_inc(&pmc_nr_enabled);
+
+ return 0;
+
+fail:
+ for_each_node(nid) {
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+
+ if (nodeinfo->cachep) {
+ kvfree(nodeinfo->cachep);
+ nodeinfo->cachep = NULL;
+ }
+ }
+
+ return -ENOMEM;
+}
+
+static int __disable_mem_cgroup_cache(struct mem_cgroup *memcg)
+{
+ int nid;
+
+ if (unlikely(mem_cgroup_cache_disabled(memcg)))
+ return -EINVAL;
+
+ /* offline: clear cache_enabled before tearing the caches down */
+ WRITE_ONCE(memcg->cache_enabled, false);
+
+ for_each_node(nid) {
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+ struct mem_cgroup_per_node_cache *p;
+
+ p = nodeinfo->cachep;
+
+ mem_cgroup_release_cache(p);
+
+ kvfree(p);
+ }
+
+ if (atomic_dec_and_test(&pmc_nr_enabled))
+ static_branch_enable(&pmc_key);
+
+ return 0;
+}
+
+static int mem_cgroup_cache_show(struct seq_file *m, void *v)
+{
+ struct mem_cgroup *memcg;
+ int nid;
+
+ if (static_branch_likely(&pmc_key))
+ return -EINVAL;
+
+ memcg = mem_cgroup_from_seq(m);
+ if (!READ_ONCE(memcg->cache_enabled))
+ return -EINVAL;
+
+ seq_printf(m, "%4s %16s %16s\n", "NODE", "WATERMARK", "HOLD_LIMIT");
+ for_each_online_node(nid) {
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+ struct mem_cgroup_per_node_cache *p;
+
+ p = nodeinfo->cachep;
+ if (!p)
+ continue;
+
+ seq_printf(m, "%4d %14uKB %14uKB\n", nid,
+ (READ_ONCE(p->allow_watermark) << (PAGE_SHIFT - 10)),
+ (READ_ONCE(p->hold_limit) << (PAGE_SHIFT - 10)));
+ }
+
+ seq_puts(m, "===========\n");
+ seq_printf(m, "%4s %16s %16s %16s\n", "NODE", "ZONE", "CACHE", "HIT");
+
+ for_each_online_node(nid) {
+ struct mem_cgroup_per_node *nodeinfo = memcg->nodeinfo[nid];
+ struct mem_cgroup_per_node_cache *p;
+ pg_data_t *pgdat = NODE_DATA(nid);
+ int idx;
+
+ p = nodeinfo->cachep;
+ if (!p)
+ continue;
+
+ for (idx = 0; idx < MAX_NR_ZONES; idx++) {
+ struct mem_cgroup_zone_cache *zc;
+ struct zone *z = &pgdat->node_zones[idx];
+
+ if (!populated_zone(z))
+ continue;
+
+ zc = &p->zone_cachep[idx];
+ seq_printf(m, "%4d %16s %14dKB %14dKB\n", nid, z->name,
+ (atomic_read(&zc->nr_pages)
+ << (PAGE_SHIFT - 10)),
+ (atomic_read(&zc->nr_alloced)
+ << (PAGE_SHIFT - 10)));
+ }
+ }
+
+ return 0;
+}
+
+enum {
+ OPT_CTRL_ENABLE,
+ OPT_CTRL_ERR,
+ OPT_CTRL_NR = OPT_CTRL_ERR,
+};
+
+static const match_table_t ctrl_tokens = {
+ { OPT_CTRL_ENABLE, "enable=%s" },
+ { OPT_CTRL_ERR, NULL } };
+
+/*
+ * Control the target memcg's cache, including enabling/disabling it
+ * and setting its keys. Enable or disable the cache with
+ * `echo enable=[y|n] > memory.cache` on the target memcg.
+ */
+static ssize_t mem_cgroup_cache_control(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ bool enable;
+ bool opt_enable_set = false;
+ int err = 0;
+ char *sub;
+ struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+ buf = strstrip(buf);
+ if (!strlen(buf))
+ return -EINVAL;
+
+ while ((sub = strsep(&buf, " ")) != NULL) {
+ int token;
+ substring_t args[MAX_OPT_ARGS];
+ char tbuf[256];
+
+ sub = strstrip(sub);
+
+ token = match_token(sub, ctrl_tokens, args);
+ switch (token) {
+ case OPT_CTRL_ENABLE:
+ if (match_strlcpy(tbuf, &args[0], sizeof(tbuf)) >=
+ sizeof(tbuf))
+ return -EINVAL;
+
+ err = kstrtobool(tbuf, &enable);
+ if (err)
+ return -EINVAL;
+ opt_enable_set = true;
+ break;
+ case OPT_CTRL_ERR:
+ default:
+ return -EINVAL;
+ }
+ }
+
+ if (opt_enable_set) {
+ if (enable)
+ err = __enable_mem_cgroup_cache(memcg);
+ else
+ err = __disable_mem_cgroup_cache(memcg);
+ }
+
+ return err ? err : nbytes;
+}
+
static struct cftype memory_files[] = {
{
.name = "current",
@@ -7156,6 +7386,13 @@ static struct cftype memory_files[] = {
.flags = CFTYPE_NS_DELEGATABLE,
.write = memory_reclaim,
},
+ /* per-memcg page cache control file */
+ {
+ .name = "cache",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .write = mem_cgroup_cache_control,
+ .seq_show = mem_cgroup_cache_show,
+ },
{ } /* terminate */
};
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1beb56f75319..54c4d00c2506 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -530,6 +530,14 @@ static inline int pindex_to_order(unsigned int pindex)
return order;
}
+/*
+ * The per-memcg cache currently only allows order-0 pages.
+ */
+static inline bool pmc_allow_order(unsigned int order)
+{
+ return !order;
+}
+
static inline bool pcp_allowed_order(unsigned int order)
{
if (order <= PAGE_ALLOC_COSTLY_ORDER)
@@ -1271,6 +1279,43 @@ void __free_pages_core(struct page *page, unsigned int order)
__free_pages_ok(page, order, FPI_TO_TAIL);
}
+int mem_cgroup_release_cache(struct mem_cgroup_per_node_cache *nodep)
+{
+ LIST_HEAD(temp_list);
+ int zid, num = 0;
+
+ for (zid = 0; zid < MAX_NR_ZONES; ++zid) {
+ struct mem_cgroup_zone_cache *zc = &nodep->zone_cachep[zid];
+ int i = 0;
+
+ if (!atomic_read(&zc->nr_pages))
+ continue;
+
+ spin_lock(&zc->pages_lock);
+ list_splice_init(&zc->pages, &temp_list);
+ spin_unlock(&zc->pages_lock);
+
+ while (!list_empty(&temp_list)) {
+ struct page *page =
+ list_first_entry(&temp_list, struct page, lru);
+ struct zone *zone = page_zone(page);
+ unsigned long pfn = page_to_pfn(page);
+
+ list_del(&page->lru);
+
+ /* TODO: would it be better to free these into the pcp? */
+ free_one_page(zone, page, pfn, 0, FPI_NONE);
+ ++i;
+ }
+
+ num += i;
+ atomic_sub(i, &zc->nr_pages);
+ }
+
+ return num;
+}
+
/*
* Check that the whole (or subset of) a pageblock given by the interval of
* [start_pfn, end_pfn) is valid and within the same zone, before scanning it
@@ -2603,6 +2648,41 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
}
}
+static bool free_unref_page_to_pmc(struct page *page, struct zone *zone,
+ int order)
+{
+ struct mem_cgroup *memcg;
+ struct mem_cgroup_per_node_cache *cachp;
+ struct mem_cgroup_zone_cache *zc;
+ unsigned long flags;
+ bool ret = false;
+
+ if (pmc_disabled())
+ return false;
+
+ memcg = get_mem_cgroup_from_current();
+ if (!memcg || mem_cgroup_is_root(memcg) ||
+ mem_cgroup_cache_disabled(memcg))
+ goto out;
+
+ cachp = mem_cgroup_get_node_cachep(memcg, page_to_nid(page));
+ zc = &cachp->zone_cachep[page_zonenum(page)];
+
+ if (high_wmark_pages(zone) + READ_ONCE(cachp->allow_watermark) >=
+ zone_page_state(zone, NR_FREE_PAGES))
+ goto out;
+
+ spin_lock_irqsave(&zc->pages_lock, flags);
+ list_add(&page->lru, &zc->pages);
+ spin_unlock_irqrestore(&zc->pages_lock, flags);
+ atomic_inc(&zc->nr_pages);
+
+ ret = true;
+out:
+ mem_cgroup_put(memcg);
+ return ret;
+}
+
/*
* Free a pcp page
*/
@@ -2634,6 +2714,17 @@ void free_unref_page(struct page *page, unsigned int order)
}
zone = page_zone(page);
+
+ /*
+ * If the current memcg has the cache feature enabled, try to park
+ * the page in its PMC before freeing it into the pcp. Unlike the
+ * pcp, the PMC is private: only tasks in the owning memcg can
+ * allocate from it, so when the conditions are met the page goes to
+ * the PMC first rather than to the shared per-cpu cache.
+ */
+ if (pmc_allow_order(order) && free_unref_page_to_pmc(page, zone, order))
+ return;
+
pcp_trylock_prepare(UP_flags);
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (pcp) {
@@ -3012,6 +3103,49 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
return page;
}
+static struct page *rmqueue_mem_cgroup_cache(struct zone *preferred_zone,
+ struct zone *zone,
+ unsigned int order,
+ int migratetype)
+{
+ struct mem_cgroup *memcg;
+ struct mem_cgroup_per_node_cache *cachp;
+ struct mem_cgroup_zone_cache *zc;
+ unsigned long flags;
+ int nid = zone->zone_pgdat->node_id;
+ struct page *page = NULL;
+
+ if (pmc_disabled())
+ return NULL;
+
+ memcg = get_mem_cgroup_from_current();
+ if (!memcg || mem_cgroup_is_root(memcg) ||
+ mem_cgroup_cache_disabled(memcg))
+ goto out;
+
+ cachp = mem_cgroup_get_node_cachep(memcg, nid);
+
+ zc = &cachp->zone_cachep[zone_idx(zone)];
+ if (!atomic_read(&zc->nr_pages))
+ goto out;
+
+ spin_lock_irqsave(&zc->pages_lock, flags);
+ if (list_empty(&zc->pages)) {
+ spin_unlock_irqrestore(&zc->pages_lock, flags);
+ goto out;
+ }
+ page = list_first_entry(&zc->pages, struct page, lru);
+ list_del(&page->lru);
+ spin_unlock_irqrestore(&zc->pages_lock, flags);
+
+ atomic_dec(&zc->nr_pages);
+ atomic_inc(&zc->nr_alloced);
+
+out:
+ mem_cgroup_put(memcg);
+ return page;
+}
+
/*
* Allocate a page from the given zone.
* Use pcplists for THP or "cheap" high-order allocations.
@@ -3038,6 +3172,18 @@ struct page *rmqueue(struct zone *preferred_zone,
*/
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
+ /*
+ * The current task may be in a memcg that has the cache feature
+ * enabled. If so, taking a page from that private pool first, before
+ * touching the shared pcp or buddy lists, speeds up the allocation.
+ */
+ if (pmc_allow_order(order)) {
+ page = rmqueue_mem_cgroup_cache(preferred_zone, zone, order,
+ migratetype);
+ if (page)
+ goto out;
+ }
+
if (likely(pcp_allowed_order(order))) {
page = rmqueue_pcplist(preferred_zone, zone, order,
migratetype, alloc_flags);