From patchwork Wed Mar 19 08:14:32 2025
From: Nikhil Dhama
Cc: Nikhil Dhama, Ying Huang, Bharata B Rao, Raghavendra
Subject: [PATCH -V2] mm: pcp: scale batch to reduce number of high order pcp flushes on deallocation
Date: Wed, 19 Mar 2025 13:44:32 +0530
Message-ID: <20250319081432.18130-1-nikhil.dhama@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <871pw33695.fsf@DESKTOP-5N7EMDA>
References: <871pw33695.fsf@DESKTOP-5N7EMDA>
MIME-Version: 1.0

On 2/12/2025 2:10 PM, Huang, Ying wrote:
>
> Nikhil Dhama writes:
>
>> On 1/29/2025 10:01 AM, Andrew Morton wrote:
>>> On Wed, 15 Jan 2025 19:19:02 +0800 "Huang, Ying" wrote:
>>>
>>>> Andrew Morton writes:
>>>>
>>>>> On Tue, 7 Jan 2025 14:47:24 +0530 Nikhil Dhama wrote:
>>>>>
>>>>>> In the current PCP auto-tuning design, free_count was introduced to
>>>>>> track consecutive page freeing with a counter. The counter is
>>>>>> incremented by the exact number of pages that are freed, but halved
>>>>>> on allocation. This causes the network bandwidth of a 2-node iperf3
>>>>>> client-to-server run to drop by 30% when the number of client-server
>>>>>> pairs is scaled from 32 (where we achieved peak network bandwidth)
>>>>>> to 64.
>>>>>>
>>>>>> To fix this issue, on allocation, reduce free_count by the exact
>>>>>> number of pages that are allocated instead of halving it.
>>>>>
>>>>> The present division by two appears to be somewhat randomly chosen.
>>>>> And as far as I can tell, this patch proposes replacing that with
>>>>> another somewhat random adjustment.
>>>>>
>>>>> What's the actual design here? What are we attempting to do and why,
>>>>> and why is the proposed design superior to the present one?
>>>>
>>>> Cc Mel for the original design.
>>>>
>>>> IIUC, pcp->free_count is used to identify a consecutive, pure, large
>>>> page freeing pattern. For that pattern, a larger batch will be used
>>>> to free pages from PCP to buddy to improve the performance.
>>>> A mixed free/allocation pattern should not make pcp->free_count
>>>> large, even if the number of pages freed is much larger than the
>>>> number of pages allocated in the long run. So, pcp->free_count
>>>> decreases rapidly on page allocation.
>>>>
>>>> Hi, Mel, please correct me if my understanding isn't correct.
>>>
>>> hm, no Mel.
>>>
>>> Nikhil, please do continue to work on this - it seems that there will
>>> be a significant benefit to retuning this.
>>
>> Hi Andrew,
>>
>> I have analyzed the performance of different memory-sensitive workloads
>> under these two ways of decrementing pcp->free_count. I compared the
>> scores among v6.6 mainline, v6.7 mainline and v6.7 with our patch.
>>
>> For all the benchmarks, I used a 2-socket AMD server with 382 logical
>> CPUs.
>>
>> The results are as follows. All scores are normalized with respect to
>> v6.6 (base).
>>
>> For all the benchmarks below (iperf3, lmbench3 unix, netperf, redis,
>> gups, xsbench), a higher score is better.
>>
>>                     iperf3  lmbench3 Unix  1-node netperf      2-node netperf
>>                             (AF_UNIX)      (SCTP_STREAM_MANY)  (SCTP_STREAM_MANY)
>>                     ------  -------------  ------------------  ------------------
>> v6.6 (base)         100     100            100                 100
>> v6.7                69      113.2          99                  98.59
>> v6.7 with my patch  100     112.1          100.3               101.16
>>
>>                     redis standard  redis core  redis L3 Heavy  Gups  xsbench
>>                     --------------  ----------  --------------  ----  -------
>> v6.6 (base)         100             100         100             100   100
>> v6.7                99.45           101.66      99.47           100   98.14
>> v6.7 with my patch  99.76           101.12      99.75           100   99.56
>>
>> For graph500, hashjoin, pagerank and Kbuild, a lower score is better.
>>
>>                     graph500  hashjoin      hashjoin     pagerank  Kbuild
>>                               (THP always)  (THP never)
>>                     --------  ------------  -----------  --------  ------
>> v6.6 (base)         100       100           100          100       100
>> v6.7                101.08    101.3         101.9        100       98.8
>> v6.7 with my patch  99.73     100           101.66       100       99.6
>>
>> From these results I conclude that this patch performs better than, or
>> as well as, base v6.7 on almost all of these workloads.
>
> Sorry, this change doesn't make sense to me.
>
> For example, if a large process exits on a CPU, pcp->free_count will
> increase on this CPU. This is good, because the process can free pages
> more quickly during exit with the larger batching. However, after
> that, pcp->free_count may stay large for a long duration unless a
> large number of page allocations (without a large number of page
> frees) are done on the CPU. So, the page freeing parameter may be
> influenced by some unrelated workload for a long time. That doesn't
> sound good.
>
> In effect, the larger pcp->free_count will increase the page freeing
> batch size. That will improve page freeing throughput but hurt page
> freeing latency. Please check the page freeing latency too. If a
> larger batch number helps performance without regressions, just
> increase the batch number directly instead of playing with
> pcp->free_count.
>
> And, do you run network related workloads on one machine? If so,
> please try to run them on two machines instead, with clients and
> servers running on different machines. At least, please use different
> sockets for clients and servers, because a larger pcp->free_count
> will make it easier to trigger the free_high heuristics. If that is
> the case, please try to optimize the free_high heuristics directly
> too.

I agree with Ying Huang that the change above is not the best possible
fix for the issue. On further analysis I found that the root cause of
the issue is the frequent pcp high order flushes.
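
For reference, the free_high heuristics in question is the following
condition in free_unref_page_commit(), quoted here from v6.7 (the same
lines the diff at the end of this mail touches); the comments are my
own reading of the code:

	/* only high-order, but not costly-order, frees are considered */
	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
		/*
		 * When free_high evaluates true, the pcp list is flushed
		 * to the buddy allocator -- these are the "pcp high order
		 * flushes" counted below.
		 */
		free_high = (pcp->free_count >= batch &&   /* threshold patch v2 scales */
			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
			      pcp->count >= READ_ONCE(batch)));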

During a 20 sec iperf3 run I observed on average 5 pcp high order
flushes in kernel v6.6, whereas in v6.7 I observed about 170 pcp high
order flushes. Tracing pcp->free_count, I found that with patch v1 (the
patch I suggested earlier) free_count goes negative, which reduces the
number of times the free_high heuristics trigger and hence reduces the
high order flushes.

As Ying Huang suggested, increasing the batch size for the free_high
heuristics helps the performance. I tried different scaling factors to
find the batch value best suited for the free_high heuristics:

                 score  # free_high
                 -----  -----------
v6.6 (base)      100    4
v6.12 (batch*1)  69     170
batch*2          69     150
batch*4          74     101
batch*5          100    53
batch*6          100    36
batch*8          100    3

Scaling the batch for the free_high heuristics by a factor of 5
restores the performance.

On the AMD 2-node machine, scores for other benchmarks with patch v2
are as follows:

                     iperf3  lmbench3   netperf             kbuild
                             (AF_UNIX)  (SCTP_STREAM_MANY)
                     ------  ---------  ------------------  ------
v6.6 (base)          100     100        100                 100
v6.12                69      113        98.5                98.8
v6.12 with patch v2  100     112.5      100.1               99.6

For the network workloads, clients and server run on different machines
connected via a Mellanox Connect-7 NIC.

Number of free_high events:

                     iperf3  lmbench3   netperf             kbuild
                             (AF_UNIX)  (SCTP_STREAM_MANY)
                     ------  ---------  ------------------  ------
v6.6 (base)          5       12         6                   2
v6.12                170     11         92                  2
v6.12 with patch v2  58      11         34                  2

Signed-off-by: Nikhil Dhama
Cc: Andrew Morton
Cc: Ying Huang
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Bharata B Rao
Cc: Raghavendra
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6958333054d..326d5fbae353 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 * stops will be drained from vmstat refresh context.
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
-		free_high = (pcp->free_count >= batch &&
+		free_high = (pcp->free_count >= (batch*5) &&
 			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
 			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
 			      pcp->count >= READ_ONCE(batch)));
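
As an illustration of why scaling the threshold helps, below is a small
stand-alone toy model (not the kernel code; the batch value, the
traffic pattern and the flush accounting are simplifying assumptions of
mine) contrasting the three free_count policies discussed in this
thread. Under a mixed free/allocation pattern, the v6.7 policy keeps
the counter hovering around the batch threshold and crosses it almost
every round, patch v1 drives the counter negative and almost never
crosses it, and patch v2 keeps the halving but crosses the scaled
threshold only rarely:

	#include <stdio.h>

	#define BATCH 63	/* assumed pcp batch value */

	/*
	 * Each round frees nfree pages, checks the free_high threshold,
	 * then allocates nalloc pages. Returns how often free_high
	 * would have triggered.
	 */
	static long run(int rounds, long nfree, long nalloc,
			int halve_on_alloc, long threshold)
	{
		long free_count = 0, flushes = 0;

		for (int i = 0; i < rounds; i++) {
			free_count += nfree;		/* freeing side */
			if (free_count >= threshold)
				flushes++;		/* free_high would trigger */
			if (halve_on_alloc)
				free_count /= 2;	/* v6.7: halve on allocation */
			else
				free_count -= nalloc;	/* patch v1: exact decrement */
		}
		return flushes;
	}

	int main(void)
	{
		/* mixed traffic: allocations slightly outpace frees */
		printf("v6.7 (halve, batch*1): %ld\n", run(1000, 64, 72, 1, BATCH));
		printf("v1   (exact, batch*1): %ld\n", run(1000, 64, 72, 0, BATCH));
		printf("v2   (halve, batch*5): %ld\n", run(1000, 64, 72, 1, BATCH * 5));
		return 0;
	}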