From patchwork Thu Oct 12 02:48:37 2023
X-Patchwork-Submitter: Rongwei Wang
X-Patchwork-Id: 13418179
From: Rongwei Wang
To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, willy@infradead.org, catalin.marinas@arm.com, dave.hansen@linux.intel.com, tj@kernel.org, mingo@redhat.com
Subject: [PATCH RFC 0/5] support NUMA emulation for arm64
Date: Thu, 12 Oct 2023 10:48:37 +0800
Message-Id: <20231012024842.99703-1-rongwei.wang@linux.alibaba.com>

A brief introduction
====================

NUMA emulation can fake additional nodes on top of a single-node system.
For example, on a one-node system:

[root@localhost ~]# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 31788 MB
node 0 free: 31446 MB
node distances:
node   0
  0:  10

After adding numa=fake=2 (fake 2 nodes on each original node):

[root@localhost ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 15806 MB
node 0 free: 15451 MB
node 1 cpus: 0 1 2 3 4 5 6 7
node 1 size: 16029 MB
node 1 free: 15989 MB
node distances:
node   0   1
  0:  10  10
  1:  10  10

As shown above, a new node has been faked. As for CPUs, the behavior of
the x86 NUMA emulation is kept, so every fake node lists all CPUs. It
might be better for each node to have 4 cores (not sure; this would be a
next step if so).
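To make the idea concrete, here is a minimal user-space sketch of what numa=fake=2 conceptually does: it cuts one node's memory range into contiguous pieces, gives every fake node the full CPU list, and reports local distance (10) between all of them, matching the numactl output above. This is not the code from this series. The names FakeNode, split_node and distance_table are hypothetical, and the even split is a simplifying assumption; the 15806 MB / 16029 MB sizes above show that the kernel does not produce exactly equal halves.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """One emulated node (hypothetical helper type, for illustration only)."""
    nid: int
    start_mb: int
    end_mb: int  # exclusive
    cpus: list

    @property
    def size_mb(self):
        return self.end_mb - self.start_mb


def split_node(start_mb, end_mb, cpus, nr_fake, first_nid=0):
    """Split [start_mb, end_mb) into nr_fake contiguous fake nodes.

    Assumption: memory is divided as evenly as possible and every fake
    node keeps the full CPU list of the original node.
    """
    if nr_fake < 1:
        raise ValueError("nr_fake must be >= 1")
    total = end_mb - start_mb
    if total < nr_fake:
        raise ValueError("range too small to split")
    base, extra = divmod(total, nr_fake)
    nodes = []
    cursor = start_mb
    for i in range(nr_fake):
        # The first `extra` nodes absorb the remainder, one MB each.
        size = base + (1 if i < extra else 0)
        nodes.append(FakeNode(first_nid + i, cursor, cursor + size, list(cpus)))
        cursor += size
    return nodes


def distance_table(nodes, local=10):
    """All fake nodes sit on the same physical node, so every entry is local."""
    return [[local for _ in nodes] for _ in nodes]


if __name__ == "__main__":
    fake = split_node(0, 31788, range(8), 2)
    for node in fake:
        print(f"node {node.nid} cpus: {' '.join(map(str, node.cpus))}")
        print(f"node {node.nid} size: {node.size_mb} MB")
    for row in distance_table(fake):
        print(" ".join(str(d) for d in row))
```

Splitting the CPU list between fake nodes, as suggested above, would only change the cpus argument passed for each node in this sketch.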

Why do this
===========

There seem to be the following reasons:
  (1) On an x86 host, NUMA emulation can fake a multi-node environment
      to test or verify performance behavior, but on arm64 the only way
      to do this is to modify the ACPI table, which is rather
      troublesome.
  (2) It reduces contention on some locks. Here is an example we found:
      will-it-scale/tlb_flush1_processes -t 96 -s 10 shows an obvious
      hotspot on lruvec->lock when tested in a single-node environment.
      What's more, the performance improves greatly when tested on a
      system with two or more nodes. The data is shown below (more is
      better):

---------------------------------------------------------------------
threads/process |    1    |    12    |    24    |    48    |    96
---------------------------------------------------------------------
one node        | 14 1122 | 110 5372 | 111 2615 |  79 7084 |  72 4516
---------------------------------------------------------------------
numa=fake=2     | 14 1168 | 144 4848 | 215 9070 | 157 0412 | 142 3968
---------------------------------------------------------------------
                | For concurrency 12, no lruvec->lock hotspot. For 24,
hotspot         | one node has a 24% hotspot on lruvec->lock, but the
                | two-node environment does not.
---------------------------------------------------------------------

As for risks (e.g. NUMA balancing...), they need to be discussed here.

Lastly, this is just a draft; I can improve it further if the approach
is acceptable.

Thanks!

Rongwei Wang (5):
  mm/numa: move numa emulation APIs into generic files
  mm: percpu: fix variable type of cpu
  arch_numa: remove __init in early_cpu_to_node()
  mm/numa: support CONFIG_NUMA_EMU for arm64
  mm/numa: migrate leftover numa emulation into mm/numa.c

 arch/x86/Kconfig                          |   8 -
 arch/x86/include/asm/numa.h               |   3 -
 arch/x86/mm/Makefile                      |   1 -
 arch/x86/mm/numa.c                        | 216 +-------------
 arch/x86/mm/numa_internal.h               |  14 +-
 drivers/base/arch_numa.c                  |   7 +-
 include/asm-generic/numa.h                |  33 +++
 include/linux/percpu.h                    |   2 +-
 mm/Kconfig                                |   8 +
 mm/Makefile                               |   1 +
 arch/x86/mm/numa_emulation.c => mm/numa.c | 333 +++++++++++++++++++++-
 11 files changed, 373 insertions(+), 253 deletions(-)
 rename arch/x86/mm/numa_emulation.c => mm/numa.c (63%)