From patchwork Thu Aug 29 13:06:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adrian Huang
X-Patchwork-Id: 13783176
From: Adrian Huang
To: Andrew Morton, Uladzislau Rezki, Christoph Hellwig
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Adrian Huang
Subject: [PATCH 1/1] mm: vmalloc: Optimize vmap_lazy_nr arithmetic when purging each vmap_area
Date: Thu, 29 Aug 2024 21:06:33 +0800
Message-Id: <20240829130633.2184-1-ahuang12@lenovo.com>
X-Mailer: git-send-email 2.25.1

From: Adrian Huang

When running the vmalloc stress test on a 448-core system, the average
latency of purge_vmap_node() is about 2 seconds, as measured with the
eBPF/bcc 'funclatency.py' tool [1].

  # /your-git-repo/bcc/tools/funclatency.py -u purge_vmap_node & pid1=$! \
    && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1

    usecs                : count distribution
           0 -> 1        : 0     |                                        |
           2 -> 3        : 29    |                                        |
           4 -> 7        : 19    |                                        |
           8 -> 15       : 56    |                                        |
          16 -> 31       : 483   |****                                    |
          32 -> 63       : 1548  |************                            |
          64 -> 127      : 2634  |*********************                   |
         128 -> 255      : 2535  |*********************                   |
         256 -> 511      : 1776  |**************                          |
         512 -> 1023     : 1015  |********                                |
        1024 -> 2047     : 573   |****                                    |
        2048 -> 4095     : 488   |****                                    |
        4096 -> 8191     : 1091  |*********                               |
        8192 -> 16383    : 3078  |*************************               |
       16384 -> 32767    : 4821  |****************************************|
       32768 -> 65535    : 3318  |***************************             |
       65536 -> 131071   : 1718  |**************                          |
      131072 -> 262143   : 2220  |******************                      |
      262144 -> 524287   : 1147  |*********                               |
      524288 -> 1048575  : 1179  |*********                               |
     1048576 -> 2097151  : 822   |******                                  |
     2097152 -> 4194303  : 906   |*******                                 |
     4194304 -> 8388607  : 2148  |*****************                       |
     8388608 -> 16777215 : 4497  |*************************************   |
    16777216 -> 33554431 : 289   |**                                      |

  avg = 2041714 usecs, total: 78381401772 usecs, count: 38390

In the worst case the latency falls in the 16-33 second bucket, so a
soft lockup is triggered [2].

[Root Cause]
1) Each purge_list is very long. The following shows the number of
   vmap_area entries purged per node:

  crash> p vmap_nodes
  vmap_nodes = $27 = (struct vmap_node *) 0xff2de5a900100000
  crash> vmap_node 0xff2de5a900100000 128 | grep nr_purged
    nr_purged = 663070
    ...
    nr_purged = 821670
    nr_purged = 692214
    nr_purged = 726808
    ...

2) atomic_long_sub() uses the 'lock' prefix to make the update atomic,
   and it is executed once for each purged vmap_area. However, the loop
   iterates over more than 600000 vmap_area entries (see 'nr_purged'
   above).

   Here is the objdump output:

  $ objdump -D vmlinux
  ffffffff813e8c80 :
    ...
  ffffffff813e8d70:  f0 48 29 2d 68 0c bb   lock sub %rbp,0x2bb0c68(%rip)
    ...

   Quote from the "Instruction tables" pdf file [3]:
     Instructions with a LOCK prefix have a long latency that depends on
     cache organization and possibly RAM speed.
     If there are multiple processors or cores or direct memory access
     (DMA) devices, then all locked instructions will lock a cache line
     for exclusive access, which may involve RAM access. A LOCK prefix
     typically costs more than a hundred clock cycles, even on
     single-processor systems.

   That is why the latency of purge_vmap_node() increases dramatically
   on a many-core system: one core is busy purging each vmap_area of the
   *long* purge_list and executing atomic_long_sub() for each vmap_area,
   while the other cores free vmalloc allocations and execute
   atomic_long_add_return() in free_vmap_area_noflush(), so they all
   contend for the same vmap_lazy_nr counter.

[Solution]
Use a local variable to accumulate the total number of purged pages, and
execute atomic_long_sub() once after the traversal of the purge_list is
done. The experiment results show a latency improvement of more than 99%.

[Experiment Result]
1) System Configuration: Three servers (with HT enabled) were tested.
     * 72-core server: 3rd Gen Intel Xeon Scalable Processor*1
     * 192-core server: 5th Gen Intel Xeon Scalable Processor*2
     * 448-core server: AMD Zen 4 Processor*2

2) Kernel Config
     * CONFIG_KASAN is disabled

3) The data in columns "w/o patch" and "w/ patch"
     * Unit: microseconds (us)
     * Each value is the average of 3 measurements

   System           w/o patch (us)   w/ patch (us)   Improvement (%)
   ---------------  --------------   -------------   ---------------
   72-core server             2194              14            99.36%
   192-core server          143799            1139            99.21%
   448-core server         1992122            6883            99.65%

[1] https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
[2] https://gist.github.com/AdrianHuang/37c15f67b45407b83c2d32f918656c12
[3] https://www.agner.org/optimize/instruction_tables.pdf

Signed-off-by: Adrian Huang
Reviewed-by: Uladzislau Rezki (Sony)
---
 mm/vmalloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3f9b6bd707d2..607697c81e60 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2210,6 +2210,7 @@ static void purge_vmap_node(struct work_struct *work)
 {
 	struct vmap_node *vn = container_of(work,
 		struct vmap_node, purge_work);
+	unsigned long nr_purged_pages = 0;
 	struct vmap_area *va, *n_va;
 	LIST_HEAD(local_list);
 
@@ -2224,7 +2225,7 @@ static void purge_vmap_node(struct work_struct *work)
 
 		list_del_init(&va->list);
 
-		atomic_long_sub(nr, &vmap_lazy_nr);
+		nr_purged_pages += nr;
 		vn->nr_purged++;
 
 		if (is_vn_id_valid(vn_id) && !vn->skip_populate)
@@ -2235,6 +2236,8 @@ static void purge_vmap_node(struct work_struct *work)
 		list_add(&va->list, &local_list);
 	}
 
+	atomic_long_sub(nr_purged_pages, &vmap_lazy_nr);
+
 	reclaim_list_global(&local_list);
 }
 
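
For readers without a kernel tree at hand, the batching idea in this patch
can be sketched in user-space C11. This is only an illustration under
stated assumptions, not the kernel code: lazy_nr is a hypothetical stand-in
for vmap_lazy_nr, atomic_fetch_sub() from <stdatomic.h> stands in for
atomic_long_sub(), and both function names are invented for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's global vmap_lazy_nr counter. */
static atomic_long lazy_nr;

/*
 * Per-item variant (the shape of the code before the patch): one atomic
 * read-modify-write on the shared counter for every element.
 */
static void purge_per_item(const long *nr_pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		atomic_fetch_sub(&lazy_nr, nr_pages[i]);
}

/*
 * Batched variant (the shape of the code after the patch): accumulate in
 * a plain local variable, then touch the shared counter exactly once.
 */
static void purge_batched(const long *nr_pages, size_t count)
{
	long nr_purged_pages = 0;

	for (size_t i = 0; i < count; i++)
		nr_purged_pages += nr_pages[i];

	atomic_fetch_sub(&lazy_nr, nr_purged_pages);
}
```

Both variants leave the counter at the same final value; the batched one
issues a single atomic operation per traversal instead of one per element.
One consequence of batching is that concurrent readers of the counter see
it drop only once, at the end of the traversal.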