From patchwork Mon Aug 21 20:28:27 2023
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13359793
From: Mateusz Guzik
To: linux-kernel@vger.kernel.org
Cc: dennis@kernel.org, tj@kernel.org, cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org, Mateusz Guzik
Subject: [PATCH 0/2] execve scalability issues, part 1
Date: Mon, 21 Aug 2023 22:28:27 +0200
Message-Id: <20230821202829.2163744-1-mjguzik@gmail.com>

To start, I figured I would bench about as friendly a case as it gets
-- statically linked *separate* binaries all doing execve in a loop.

I borrowed the bench found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)

It prints a result every second (warning: first line is garbage).

My test box temporarily has only 26 cores, and even at this scale I run
into massive lock contention stemming from back-to-back calls to
percpu_counter_init (and _destroy later).

While not a panacea, one simple thing to do here is to batch these ops.
Since the term "batching" is already used in the file, I decided to
refer to it as "grouping" instead.

Even if this code could be patched to dodge these counters, I would
argue a high-traffic alloc/free consumer is only a matter of time, so it
makes sense to facilitate it.

With the fix I get an ok win, to quote from the commit:
> Even at a very modest scale of 26 cores (ops/s):
> before: 133543.63
> after:  186061.81 (+39%)

> While with the patch these allocations remain a significant problem,
> the primary bottleneck shifts to:
>
>     __pv_queued_spin_lock_slowpath+1
>     _raw_spin_lock_irqsave+57
>     folio_lruvec_lock_irqsave+91
>     release_pages+590
>     tlb_batch_pages_flush+61
>     tlb_finish_mmu+101
>     exit_mmap+327
>     __mmput+61
>     begin_new_exec+1245
>     load_elf_binary+712
>     bprm_execve+644
>     do_execveat_common.isra.0+429
>     __x64_sys_execve+50
>     do_syscall_64+46
>     entry_SYSCALL_64_after_hwframe+110

I intend to do more work on the area to mostly sort it out, but I would
not mind if someone else took the hammer to folio. :)

With this out of the way I'll be looking at some form of caching to
eliminate these allocs as a problem.

Thoughts?

Mateusz Guzik (2):
  pcpcntr: add group allocation/free
  fork: group allocation of per-cpu counters for mm struct

 include/linux/percpu_counter.h | 19 ++++++++---
 kernel/fork.c                  | 13 ++------
 lib/percpu_counter.c           | 61 ++++++++++++++++++++++++----------
 3 files changed, 60 insertions(+), 33 deletions(-)
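
To make the "grouping" idea from the cover letter concrete, here is a minimal userspace sketch. It is not the kernel code and not the API the patches add: struct counter, counter_init_group(), counter_destroy_group(), registry_len(), NR_SLOTS and the lock_acquisitions instrumentation are all hypothetical names, and a C11 atomic_flag spinlock stands in for whatever lock the real counters contend on. The only point is that setting up N counters as a group costs one allocation and one lock round trip instead of N of each.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct counter {
	long *slots;			/* NR_SLOTS per-"CPU" slots */
	struct counter *next, **pprev;	/* links in the global registry */
};

static atomic_flag registry_lock = ATOMIC_FLAG_INIT;
static struct counter *registry;
static unsigned long lock_acquisitions;	/* instrumentation for the demo */

static void registry_lock_take(void)
{
	while (atomic_flag_test_and_set_explicit(&registry_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	lock_acquisitions++;
}

static void registry_lock_release(void)
{
	atomic_flag_clear_explicit(&registry_lock, memory_order_release);
}

/* Set up nr counters with one allocation and one lock round trip. */
static int counter_init_group(struct counter *c, size_t nr)
{
	long *slots;
	size_t i;

	assert(nr > 0);
	slots = calloc(nr * NR_SLOTS, sizeof(*slots));
	if (!slots)
		return -1;

	registry_lock_take();
	for (i = 0; i < nr; i++) {
		c[i].slots = slots + i * NR_SLOTS;
		c[i].next = registry;
		c[i].pprev = &registry;
		if (registry)
			registry->pprev = &c[i].next;
		registry = &c[i];
	}
	registry_lock_release();
	return 0;
}

/* Tear down a group created by counter_init_group() with the same nr. */
static void counter_destroy_group(struct counter *c, size_t nr)
{
	size_t i;

	assert(nr > 0);
	registry_lock_take();
	for (i = 0; i < nr; i++) {
		*c[i].pprev = c[i].next;
		if (c[i].next)
			c[i].next->pprev = c[i].pprev;
	}
	registry_lock_release();
	free(c[0].slots);	/* the whole group shares one allocation */
}

/* Number of registered counters; takes the lock once itself. */
static size_t registry_len(void)
{
	size_t n = 0;
	struct counter *c;

	registry_lock_take();
	for (c = registry; c; c = c->next)
		n++;
	registry_lock_release();
	return n;
}
```

An object that embeds, say, four such counters would call counter_init_group(counters, 4) once on creation and counter_destroy_group(counters, 4) once on teardown, so the shared lock is taken twice per object lifetime rather than eight times.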