From patchwork Tue Feb 20 11:36:00 2024
X-Patchwork-Submitter: Rongwei Wang <rongwei.wang@linux.alibaba.com>
X-Patchwork-Id: 13563894
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Cc: akpm@linux-foundation.org, gregkh@linuxfoundation.org, rafael@kernel.org,
 pierre.gondois@arm.com, mingo@redhat.com, dave.hansen@linux.intel.com,
 luto@kernel.org, teng.ma@linux.alibaba.com
Subject: [PATCH v1 0/2] support NUMA emulation for generic arch
Date: Tue, 20 Feb 2024 19:36:00 +0800
Message-Id: <20240220113602.6943-1-rongwei.wang@linux.alibaba.com>
In-Reply-To: <20231012024842.99703-1-rongwei.wang@linux.alibaba.com>
References: <20231012024842.99703-1-rongwei.wang@linux.alibaba.com>

A brief introduction
====================

NUMA emulation can fake additional nodes on top of a single-node system, e.g.
a one-node system:

[root@localhost ~]# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 31788 MB
node 0 free: 31446 MB
node distances:
node   0
  0:  10

After adding numa=fake=2 (fake 2 nodes on each original node):

[root@localhost ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 15806 MB
node 0 free: 15451 MB
node 1 cpus: 0 1 2 3 4 5 6 7
node 1 size: 16029 MB
node 1 free: 15989 MB
node distances:
node   0   1
  0:  10  10
  1:  10  10

As shown above, a new node has been faked. For CPUs, the behavior of the
x86 NUMA emulation is kept: every fake node lists all CPUs of the original
node. Giving each node 4 cores might be better (not sure; a possible next
step if so).

Why do this
===========

There seem to be the following reasons:

(1) On x86 hosts, NUMA emulation can fake a multi-node environment to
    test or verify performance-related behavior, but on arm64 the only
    way to do this is to modify the ACPI tables, which is more or less
    troublesome.

(2) Reduce contention on some locks. Here is an example we found:
    will-it-scale/tlb_flush1_processes -t 96 -s 10 shows an obvious
    hotspot on lruvec->lock when tested in a single-node environment.
    What's more, performance improves greatly when tested on a system
    with two or more nodes. The data is shown below (higher is better):

---------------------------------------------------------------------
threads/process |   1     |   12     |   24     |   48     |   96
---------------------------------------------------------------------
one node        | 14 1122 | 110 5372 | 111 2615 | 79 7084  | 72 4516
---------------------------------------------------------------------
numa=fake=2     | 14 1168 | 144 4848 | 215 9070 | 157 0412 | 142 3968
---------------------------------------------------------------------
                | For concurrency 12, no lruvec->lock hotspot. For 24,
hotspot         | one node has a 24% hotspot on lruvec->lock, but the
                | two-node env has none.
---------------------------------------------------------------------

As for risks (e.g. NUMA balancing...), they still need to be discussed here.
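As a side note on the numa=fake=2 output in the introduction: the idea of
carving one physical node into several fake nodes can be illustrated with
a small user-space sketch. This is NOT the code from this series. The
struct fake_node type, the split_node_evenly() helper and the align
parameter are hypothetical names made up for illustration, and the sketch
assumes one contiguous memory range without holes. The real numactl
output above is only roughly, not exactly, an even split, so treat this
as an approximation of the concept.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one emulated node: [start, end) in bytes. */
struct fake_node {
	uint64_t start;
	uint64_t end;
};

/*
 * Split the range [start, end) into nr_fake contiguous pieces.
 * The size of every piece except the last is (size / nr_fake) rounded
 * down to a multiple of align; the last piece absorbs the remainder.
 * Returns 0 on success, -1 on invalid arguments or if a piece would
 * end up empty.
 */
int split_node_evenly(uint64_t start, uint64_t end, unsigned int nr_fake,
		      uint64_t align, struct fake_node *out)
{
	uint64_t size, chunk, cur;
	unsigned int i;

	if (!out || nr_fake == 0 || align == 0 || end <= start)
		return -1;

	size = end - start;
	chunk = (size / nr_fake) / align * align;
	if (chunk == 0)
		return -1;

	cur = start;
	for (i = 0; i < nr_fake; i++) {
		out[i].start = cur;
		/* The last fake node takes whatever is left. */
		out[i].end = (i == nr_fake - 1) ? end : cur + chunk;
		cur = out[i].end;
	}
	return 0;
}
```

For example, calling split_node_evenly(0, 32768ULL << 20, 2, 1ULL << 20, nodes)
fills nodes[0] with the first 16384 MB and nodes[1] with the second
16384 MB. A real implementation also has to deal with memory holes and
reserved regions, which this sketch ignores.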
Lastly, implementing NUMA emulation for x86 and for other generic archs
separately does not seem like a good choice. But it does avoid some
architecture-related API adjustments and eases future maintenance. For
the previous RFC, see [1].

Any advice is welcome. Thanks!

Change log
==========

RFC v1 -> v1
* add new CONFIG_NUMA_FAKE for generic archs.
* keep x86 implementation, realize numa emulation in drivers/base/ for
  generic archs, e.g. arm64.

[1] RFC v1: https://patchwork.kernel.org/project/linux-arm-kernel/cover/20231012024842.99703-1-rongwei.wang@linux.alibaba.com/

Rongwei Wang (2):
  arch_numa: remove __init for early_cpu_to_node
  numa: introduce numa emulation for generic arch

 drivers/base/Kconfig          |   9 +
 drivers/base/Makefile         |   1 +
 drivers/base/arch_numa.c      |  32 +-
 drivers/base/numa_emulation.c | 909 ++++++++++++++++++++++++++++++++++
 drivers/base/numa_emulation.h |  41 ++
 include/asm-generic/numa.h    |   2 +-
 6 files changed, 992 insertions(+), 2 deletions(-)
 create mode 100644 drivers/base/numa_emulation.c
 create mode 100644 drivers/base/numa_emulation.h