From patchwork Tue Mar 25 17:19:15 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nikhil Dhama
X-Patchwork-Id: 14029291
From: Nikhil Dhama
CC: Nikhil Dhama, Ying Huang, Bharata B Rao, Raghavendra
Subject: [PATCH] mm: pcp: scale batch to reduce number of high order pcp
 flushes on deallocation
Date: Tue, 25 Mar 2025 22:49:15 +0530
Message-ID: <20250325171915.14384-1-nikhil.dhama@amd.com>
X-Mailer: git-send-email 2.25.1

In the old pcp design, pcp->free_factor is incremented in nr_pcp_free(),
which is invoked when free_pcppages_bulk() is called. So free_factor
increased by 1 only when we tried to reduce the size of the pcp list or
flush for high order. free_high triggered only for order > 0,
order <= PAGE_ALLOC_COSTLY_ORDER and free_factor > 0, and free_factor
was halved on every successful allocation.

For iperf3, I noticed that with the older design in kernel v6.6, the pcp
list was drained mostly when pcp->count > high (more often when count
went above 530), and most of the time free_factor was 0, triggering
very few high order flushes.

In the current design, free_factor has been replaced by free_count,
which keeps track of the number of pages freed contiguously. With this
design, for iperf3, the pcp list is flushed more frequently because the
free_high heuristic is triggered more often. In the current design,
free_count is incremented on every deallocation, irrespective of
whether the pcp list was reduced or not.
The free_high trigger fires when free_count reaches batch (which is 63)
and there are two consecutive page frees without any allocation in
between (subject to the cache slice optimisation).

With this design, I observed that the high order pcp list is drained as
soon as both count and free_count go above 63.

Due to this more aggressive high order flushing, applications doing
contiguous high order allocations have to go to the global list more
frequently.

On a 2-node AMD machine with 384 vCPUs on each node, connected via a
Mellanox ConnectX-7, I am seeing a ~30% performance reduction when the
number of iperf3 client/server pairs is scaled from 32 to 64.

So, although the new design reduced the time to detect high order
flushes, for applications that allocate high order pages more
frequently it may be flushing the high order list prematurely. This
motivates tuning how late or early we should flush high order lists.

For the free_high heuristic, I tried scaling batch and tuning the
factor, which delays the free_high flushes.

                    score   # free_high
-----------         -----   -----------
v6.6 (base)         100     4
v6.12 (batch*1)     69      170
batch*2             69      150
batch*4             74      101
batch*5             100     53
batch*6             100     36
batch*8             100     3

Scaling batch for the free_high heuristic by a factor of 5 or above
restores the performance, as it reduces the number of high order
flushes.

On a 2-node AMD server with 384 vCPUs each, scores for other benchmarks
with patch v2, along with iperf3, are as follows:

                    iperf3    lmbench3     netperf             kbuild
                              (AF_UNIX)    (SCTP_STREAM_MANY)
                    -------   ---------    -----------------   ------
v6.6 (base)         100       100          100                 100
v6.12               69        113          98.5                98.8
v6.12 with patch    100       112.5        100.1               99.6

For network workloads, clients and server run on different machines
connected via a Mellanox ConnectX-7 NIC.

Number of free_high:

                    iperf3    lmbench3     netperf             kbuild
                              (AF_UNIX)    (SCTP_STREAM_MANY)
                    -------   ---------    -----------------   ------
v6.6 (base)         5         12           6                   2
v6.12               170       11           92                  2
v6.12 with patch    58        11           34                  2

Signed-off-by: Nikhil Dhama
Cc: Andrew Morton
Cc: Ying Huang
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Bharata B Rao
Cc: Raghavendra
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6958333054d..326d5fbae353 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 * stops will be drained from vmstat refresh context.
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
-		free_high = (pcp->free_count >= batch &&
+		free_high = (pcp->free_count >= (batch*5) &&
 			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
 			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
 			      pcp->count >= READ_ONCE(batch)));
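
For readers who want to experiment with the threshold outside the
kernel, the changed condition can be modelled in user space. The
following is only a rough sketch, not kernel code: struct pcp_sketch,
the SK_* flag values, free_high_sketch() and the scale parameter are
hypothetical names introduced for illustration, and the model covers
only the case 0 < order <= PAGE_ALLOC_COSTLY_ORDER. With scale == 1 it
mirrors the condition removed by the hunk, with scale == 5 the one
added.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pcp->flags bits; values are illustrative. */
#define SK_PREV_FREE_HIGH_ORDER	(1u << 0)
#define SK_FREE_HIGH_BATCH	(1u << 1)

/* Minimal user-space model of the per-CPU list state the heuristic reads. */
struct pcp_sketch {
	int count;		/* pages currently on the pcp list */
	int free_count;		/* pages freed contiguously */
	unsigned int flags;
};

/*
 * Model of the free_high condition for a high order free.
 * scale is an assumption of this sketch: 1 models the old condition,
 * 5 models the condition after this patch.
 */
static inline bool free_high_sketch(const struct pcp_sketch *pcp,
				    int batch, int scale)
{
	assert(batch > 0 && scale > 0);

	return pcp->free_count >= batch * scale &&
	       (pcp->flags & SK_PREV_FREE_HIGH_ORDER) &&
	       (!(pcp->flags & SK_FREE_HIGH_BATCH) || pcp->count >= batch);
}

/* Smallest free_count that satisfies the free_count term of the condition. */
static inline int free_high_threshold(int batch, int scale)
{
	return batch * scale;
}
```

With batch = 63, as quoted in the description, the free_count threshold
moves from 63 (scale 1) to 315 (scale 5), so a pcp list with
free_count = 100 would be flushed under the old condition but not under
the patched one.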