From patchwork Thu Dec 9 13:46:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pierre Morel
X-Patchwork-Id: 12666617
From: Pierre Morel
To: qemu-s390x@nongnu.org
Subject: [PATCH v5 12/12] s390: Topology: documentation
Date: Thu, 9 Dec 2021 14:46:43 +0100
Message-Id: <20211209134643.143866-13-pmorel@linux.ibm.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20211209134643.143866-1-pmorel@linux.ibm.com>
References: <20211209134643.143866-1-pmorel@linux.ibm.com>
Cc: thuth@redhat.com, ehabkost@redhat.com, kvm@vger.kernel.org,
 david@redhat.com, eblake@redhat.com, cohuck@redhat.com,
 richard.henderson@linaro.org, qemu-devel@nongnu.org, armbru@redhat.com,
 pasic@linux.ibm.com, borntraeger@de.ibm.com, mst@redhat.com,
 pbonzini@redhat.com, philmd@redhat.com

The use of the S390x CPU topology is explained in a new documentation file.

Signed-off-by: Pierre Morel
---
 docs/system/s390x/numa-cpu-topology.rst | 273 ++++++++++++++++++++++++
 1 file changed, 273 insertions(+)
 create mode 100644 docs/system/s390x/numa-cpu-topology.rst

diff --git a/docs/system/s390x/numa-cpu-topology.rst b/docs/system/s390x/numa-cpu-topology.rst
new file mode 100644
index 0000000000..9ae15f792f
--- /dev/null
+++ b/docs/system/s390x/numa-cpu-topology.rst
@@ -0,0 +1,273 @@
+NUMA CPU Topology on S390x
+==========================
+
+IBM S390 provides a complex CPU architecture with several cache levels.
+Using NUMA with the CPU topology is a way to let the guest optimize its
+accesses to the main memory.
+
+The QEMU smp parameter for S390x allows specifying 4 NUMA levels:
+core, socket, book and drawer. These levels are also available for
+the numa parameter.
+
+
+Prerequisites
+-------------
+
+To take advantage of the CPU topology, KVM must support the
+Perform Topology Function and the Store System Information instructions,
+as indicated by the Perform CPU Topology facility (stfle bit 11).
+
+If those requirements are met, the capability ``KVM_CAP_S390_CPU_TOPOLOGY``
+will indicate that KVM can support CPU Topology on that LPAR.
+
+
+Using CPU Topology in QEMU for S390x
+------------------------------------
+
+
+QEMU -smp parameter
+~~~~~~~~~~~~~~~~~~~
+
+With -smp QEMU provides the user with the possibility to define
+a Topology based on ::
+
+    -smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books] \
+         [,sockets=sockets][,cores=cores]
+
+The topology reported to the guest in this situation will provide
+n cpus of a maximum of maxcpus cpus, filling the topology levels one by one
+starting with CPU0 being the first CPU on drawer[0] book[0] socket[0].
+
+For example ``-smp 5,books=2,sockets=2,cores=2`` will provide ::
+
+    drawer[0]--+--book[0]--+--socket[0]--+--core[0]-CPU0
+               |           |             |
+               |           |             +--core[1]-CPU1
+               |           |
+               |           +--socket[1]--+--core[0]-CPU2
+               |                         |
+               |                         +--core[1]-CPU3
+               |
+               +--book[1]--+--socket[0]--+--core[0]-CPU4
+
+
+Note that the thread parameter cannot be defined on S390 as it
+has no representation in the CPU topology.
+
+
+QEMU -numa parameter
+~~~~~~~~~~~~~~~~~~~~
+
+With -numa QEMU provides the user with the possibility to define
+the Topology in a non uniform way ::
+
+    -smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books] \
+         [,sockets=sockets][,cores=cores]
+    -numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]
+    -numa cpu,node-id=node[,drawer-id=x][,book-id=x][,socket-id=x][,core-id=y]
+
+The topology reported to the guest in this situation will provide
+n cpus of a maximum of maxcpus cpus, and the topology entries will be:
+
+- if there are fewer CPUs than specified by the -numa arguments,
+  the topology will be built by filling the numa definitions
+  starting with the lowest node.
+
+- if there are more CPUs than specified by the -numa arguments,
+  the numa specification will first be fulfilled and the remaining
+  CPUs will be assigned to unassigned slots starting with
+  core 0 on socket 0.
+
+- a CPU declared with -device is not counted in the ncpus parameter
+  of the -smp argument and will be added to the topology based on
+  its core ID.
+
+For example ::
+
+    -smp 3,drawers=8,books=2,sockets=2,cores=2,maxcpus=64
+    -object memory-backend-ram,id=mem0,size=10G
+    -numa node,nodeid=0,memdev=mem0
+    -numa node,nodeid=1
+    -numa node,nodeid=2
+    -numa cpu,node-id=0,drawer-id=0
+    -numa cpu,node-id=1,socket-id=9
+    -device host-s390x-cpu,core-id=19
+
+will provide the following topology ::
+
+    drawer[0]--+--book[0]--+--socket[0]--+--core[0]-CPU0
+                           |             |
+                           |             +--core[1]-CPU1
+                           |
+                           +--socket[1]--+--core[0]-CPU2
+
+    drawer[2]--+--book[0]--+--socket[1]--+--core[1]-CPU19
+
+
+S390 NUMA specificity
+---------------------
+
+Heterogeneous Memory Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The S390 topology implementation does not use ACPI HMAT to specify the
+cache size and bandwidth between nodes.
+
+Memory device
+~~~~~~~~~~~~~
+
+When using NUMA, S390 needs a memory device to be associated with
+the node definitions. As we do not use HMAT, it makes little sense
+to assign memory to each node; one should instead assign all memory to
+a node without CPUs and use the other nodes to define the CPU Topology.
+
+Example ::
+
+    -object memory-backend-ram,id=mem0,size=10G
+    -numa node,nodeid=0,memdev=mem0
+
+
+CPUs
+~~~~
+
+In the S390 topology we do not use threads and the first topology
+level is the core.
+The number of threads cannot be defined for S390 and is always equal to 1.
+
+When using NUMA, QEMU issues a warning for CPUs not assigned to nodes.
+The S390 topology will silently place unassigned CPUs in the topology,
+searching for a free core starting at the first core of the first socket
+in the first book.
+It is of course advisable to assign all possible CPUs to nodes to
+guarantee future compatibility.
+
+
+The topology provided to the guest
+----------------------------------
+
+The guest, when the CPU Topology is available as indicated by the
+Perform CPU Topology facility (stfle bit 11), may use two instructions
+to retrieve the CPU topology and optimize its CPU scheduling:
+
+- PTF (Perform Topology Function), which reports whether the CPU
+  Topology has changed, that is, whether the result of the
+  STSI(15,1,2) instruction has changed.
+
+- STSI (Store System Information) with parameters (15,1,2)
+  to retrieve the CPU Topology.
+
+Example ::
+
+    -smp 3,drawers=8,books=2,sockets=2,cores=2,maxcpus=64
+    -object memory-backend-ram,id=mem0,size=10G
+    -numa node,nodeid=0,memdev=mem0
+    -numa node,nodeid=1
+    -numa node,nodeid=2
+    -numa cpu,node-id=1,drawer-id=0
+    -numa cpu,node-id=2,socket-id=9
+    -device host-s390x-cpu,core-id=19
+
+Formatted result for STSI(15,1,2) showing the 6 different levels
+with:
+- levels 2 (socket) and 1 (core) used.
+- 3 sockets with a CPU mask for CPU type 3, non dedicated and
+  with horizontal polarization.
+- The first socket contains 2 cores as specified by the -smp argument
+- The second socket contains the 3rd core defined by the -smp argument
+- Both of these sockets belong to drawer-id=0 and to node-1
+- The third socket holds the CPU with core-id 19 assigned to socket-id 9
+  and to node-2
+
+Here is the kernel view ::
+
+    mag[6] = 0
+    mag[5] = 0
+    mag[4] = 0
+    mag[3] = 0
+    mag[2] = 32
+    mag[1] = 2
+    MNest = 2
+    socket: 1 0
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : c000000000000000
+
+    socket: 1 1
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : 2000000000000000
+
+    socket: 1 9
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : 0000100000000000
+
+And the admin view ::
+
+    # lscpu -e
+    CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
+      0    0      0    0      0    0 0:0:0:0         yes    yes        horizontal   0
+      1    0      0    0      0    1 1:1:1:1         yes    yes        horizontal   1
+      2    0      0    0      1    2 2:2:2:2         yes    yes        horizontal   2
+      3    0      1    1      2    3 3:3:3:3         yes    yes        horizontal   19
+
+
+Hotplug with NUMA
+-----------------
+
+Using the core-id, the topology is automatically calculated to put the core
+inside the right socket.
+
+Example::
+
+    (qemu) device_add host-s390x-cpu,core-id=8
+
+    # lscpu -e
+    CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
+      0    0      0    0      0    0 0:0:0:0         yes    yes        horizontal   0
+      1    0      0    0      0    1 1:1:1:1         yes    yes        horizontal   1
+      2    0      0    0      1    2 2:2:2:2         yes    yes        horizontal   2
+      3    0      1    1      2    3 3:3:3:3         yes    yes        horizontal   19
+      4    -      -    -      -    - :::             no     yes        horizontal   8
+
+    # chcpu -e 4
+    CPU 4 enabled
+    # lscpu -e
+    CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
+      0    0      0    0      0    0 0:0:0:0         yes    yes        horizontal   0
+      1    0      0    0      0    1 1:1:1:1         yes    yes        horizontal   1
+      2    0      0    0      1    2 2:2:2:2         yes    yes        horizontal   2
+      3    0      1    1      2    3 3:3:3:3         yes    yes        horizontal   19
+      4    0      2    2      3    4 4:4:4:4         yes    yes        horizontal   8
+
+One can see that the userland tool reports serial IDs which do not correspond
+to the firmware IDs but does however report the new CPU on its own socket.
+
+The result seen by the kernel looks like ::
+
+    mag[6] = 0
+    mag[5] = 0
+    mag[4] = 0
+    mag[3] = 0
+    mag[2] = 32
+    mag[1] = 2
+    MNest = 2
+    00 - socket: 1 0
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : c000000000000000
+
+    socket: 1 1
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : 2000000000000000
+
+    socket: 1 9
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : 0000100000000000
+
+    socket: 1 4
+    cpu type 03 d: 0 pp: 0
+    origin : 0000
+    mask : 0080000000000000
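
Note for readers of the examples above: the placement of a core in the
topology and the CPU masks shown in the kernel view can be cross-checked with
a few lines of arithmetic. The sketch below is not taken from the QEMU
sources. It assumes that a flat core-id is split level by level using the
-smp cores, sockets and books counts, and that bit 0 of a mask is the most
significant bit and stands for CPU address "origin + 0". Both assumptions
are inferred from the examples in this document, and the helper names
(core_position, global_socket_id, cpus_in_mask) are hypothetical.

```python
from typing import List, NamedTuple


class Position(NamedTuple):
    """Per-level indexes of a core, as drawn in the topology diagrams."""
    drawer: int
    book: int
    socket: int
    core: int


def core_position(core_id: int, books: int, sockets: int, cores: int) -> Position:
    """Split a flat core-id into drawer/book/socket/core indexes.

    Assumption: levels are filled one by one, innermost (core) first,
    using the per-level counts given to -smp.
    """
    socket_global, core = divmod(core_id, cores)
    book_global, socket = divmod(socket_global, sockets)
    drawer, book = divmod(book_global, books)
    return Position(drawer, book, socket, core)


def global_socket_id(core_id: int, cores: int) -> int:
    """Flat socket number, as printed in the 'socket: 1 N' lines (assumption)."""
    return core_id // cores


def cpus_in_mask(mask_hex: str, origin: int = 0) -> List[int]:
    """List the CPU addresses set in a hexadecimal CPU mask.

    Assumption: bit 0 is the most significant bit of the mask and
    corresponds to CPU address 'origin'.
    """
    width = len(mask_hex) * 4
    mask = int(mask_hex, 16)
    return [origin + bit for bit in range(width)
            if mask & (1 << (width - 1 - bit))]


if __name__ == "__main__":
    # -smp ...,books=2,sockets=2,cores=2 with -device ...,core-id=19
    print(core_position(19, books=2, sockets=2, cores=2))
    # -> Position(drawer=2, book=0, socket=1, core=1)
    print(global_socket_id(19, cores=2))     # -> 9
    print(cpus_in_mask("0000100000000000"))  # -> [19]
    print(cpus_in_mask("c000000000000000"))  # -> [0, 1]
```

With these assumptions, core-id 19 lands on drawer[2] book[0] socket[1]
core[1] and on flat socket 9, and the hotplugged core-id 8 lands on flat
socket 4 with mask 0080000000000000, which agrees with the diagrams and
kernel views in the documentation.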