From: Mateusz Guzik
To: linux-kernel@vger.kernel.org
Cc: dennis@kernel.org, tj@kernel.org, cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org, Mateusz Guzik
Subject: [PATCH v2 0/2] execve scalability issues, part 1
Date: Tue, 22 Aug 2023 20:41:50 +0200
Message-Id: <20230822184152.2194558-1-mjguzik@gmail.com>

To start I figured I'm going to bench about as friendly a case as it
gets -- statically linked *separate* binaries all doing execve in a loop.

I borrowed the bench from here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)

It prints a result every second (warning: first line is garbage).

My test box is temporarily only 26 cores and even at this scale I run
into massive lock contention stemming from back-to-back calls to
percpu_counter_init (and _destroy later).

While not a panacea, one simple thing to do here is to batch these ops.
Since the term "batching" is already used in the file, I decided to
refer to it as "grouping" instead.

Even if this code could be patched to dodge these counters, I would
argue a high-traffic alloc/free consumer is only a matter of time so it
makes sense to facilitate it.

With the fix I get an ok win, to quote from the commit:

> Even at a very modest scale of 26 cores (ops/s):
> before: 133543.63
> after:  186061.81 (+39%)
>
> While with the patch these allocations remain a significant problem,
> the primary bottleneck shifts to:
>
>     __pv_queued_spin_lock_slowpath+1
>     _raw_spin_lock_irqsave+57
>     folio_lruvec_lock_irqsave+91
>     release_pages+590
>     tlb_batch_pages_flush+61
>     tlb_finish_mmu+101
>     exit_mmap+327
>     __mmput+61
>     begin_new_exec+1245
>     load_elf_binary+712
>     bprm_execve+644
>     do_execveat_common.isra.0+429
>     __x64_sys_execve+50
>     do_syscall_64+46
>     entry_SYSCALL_64_after_hwframe+110

I intend to do more work on the area to mostly sort it out, but I would
not mind if someone else took the hammer to folio. :)

With this out of the way I'll be looking at some form of caching to
eliminate these allocs as a problem.

Thoughts?

v2:
- force bigger alignment on alloc
- rename "counters" to "nr_counters" and pass prior to lock key
- drop {}'s for single-statement loops

Mateusz Guzik (2):
  pcpcntr: add group allocation/free
  fork: group allocation of per-cpu counters for mm struct

 include/linux/percpu_counter.h | 20 ++++++++---
 kernel/fork.c                  | 14 ++------
 lib/percpu_counter.c           | 61 +++++++++++++++++++++++-----------
 3 files changed, 60 insertions(+), 35 deletions(-)
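
For readers who want the "grouping" idea spelled out, here is a rough
user-space sketch. It is not the code from the patches: struct counter,
group_init(), group_destroy(), NR_CPUS and the spinlock are all
hypothetical stand-ins made up for illustration. The only point is that
a group of counters gets one allocation and one trip through the shared
lock, instead of one of each per counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_CPUS 4	/* assumed CPU count for this model */

/* Hypothetical stand-in for a per-CPU counter; not the kernel's struct. */
struct counter {
	long count;	/* global part */
	long *percpu;	/* NR_CPUS slots carved out of a shared slab */
};

/* Models a single global lock that every init/destroy has to take. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;
static unsigned long lock_trips;	/* how many times the lock was taken */

static void lock_global(void)
{
	while (atomic_flag_test_and_set_explicit(&global_lock,
						 memory_order_acquire))
		;
	lock_trips++;
}

static void unlock_global(void)
{
	atomic_flag_clear_explicit(&global_lock, memory_order_release);
}

/* Set up nr counters with a single allocation and a single lock trip. */
static int group_init(struct counter *c, long amount, unsigned int nr)
{
	long *slab;
	unsigned int i;

	assert(nr > 0);
	slab = calloc((size_t)nr * NR_CPUS, sizeof(*slab));
	if (!slab)
		return -1;

	for (i = 0; i < nr; i++) {
		c[i].count = amount;
		c[i].percpu = slab + (size_t)i * NR_CPUS;
	}

	lock_global();
	/* a real implementation would register all nr counters here */
	unlock_global();
	return 0;
}

/* Tear down a group made by group_init(); c[0].percpu owns the slab. */
static void group_destroy(struct counter *c, unsigned int nr)
{
	unsigned int i;

	lock_global();
	/* a real implementation would unregister all nr counters here */
	unlock_global();

	free(c[0].percpu);
	for (i = 0; i < nr; i++)
		c[i].percpu = NULL;
}
```

In this model a group of four counters takes the lock twice over its
lifetime (once on init, once on destroy), where four separately
initialized counters would take it eight times.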