From patchwork Wed Mar 19 07:13:30 2025
X-Patchwork-Submitter: Greg Thelen
X-Patchwork-Id: 14022174
Date: Wed, 19 Mar 2025 00:13:30 -0700
Message-ID: <20250319071330.898763-1-gthelen@google.com>
Subject: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu)
From: Greg Thelen
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: Andrew Morton, Eric Dumazet, Yosry Ahmed, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Greg Thelen,
 Eric Dumazet

From: Eric Dumazet

cgroup_rstat_flush_locked() grabs the irq safe cgroup_rstat_lock while
iterating all possible cpus.
It only drops the lock if there is scheduler or spin lock contention. If
there is neither, interrupts stay disabled for the entire walk over all
possible cpus. On large machines this can keep interrupts disabled long
enough to drop network packets. On 400+ CPU machines I've seen
interrupts disabled for over 40 msec.

Prevent rstat from disabling interrupts while processing all possible
cpus. Instead, drop and reacquire cgroup_rstat_lock for each cpu. This
approach was previously discussed in
https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
though that was in the context of a non-irq rstat spin lock.

Benchmark this change with:
1) a single stat_reader process with 400 threads, each reading a test
   memcg's memory.stat repeatedly for 10 seconds.
2) 400 memory hog processes running in the test memcg and repeatedly
   charging memory until oom killed. Then they repeat charging and oom
   killing.

On v6.14-rc6 with CONFIG_IRQSOFF_TRACER, running stat_reader and the
hogs, the tracer finds that rstat disables interrupts for 45341 usec:

# => started at: _raw_spin_lock_irq
# => ended at:   cgroup_rstat_flush
#
#
#                    _------=> CPU#
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched
#                  || / _---=> hardirq/softirq
#                  ||| / _--=> preempt-depth
#                  |||| / _-=> migrate-disable
#                  ||||| /     delay
#  cmd     pid     |||||| time  |   caller
#     \   /        ||||||  \    |    /
stat_rea-96532    52d....     0us*: _raw_spin_lock_irq
stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
stat_rea-96532    52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
stat_rea-96532    52d.... 45343us :
 => memcg1_stat_format
 => memory_stat_format
 => memory_stat_show
 => seq_read_iter
 => vfs_read
 => ksys_read
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

With this patch, CONFIG_IRQSOFF_TRACER no longer finds rstat to be the
longest irqs-off holder. The longest holder now has irqs disabled for
4142 usec, a huge reduction (about 11x) from the previous 45341 usec
rstat finding.
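As a rough illustration of why the old loop could keep interrupts off for O(num_cpu) time, here is a user-space model, not kernel code. It assumes every possible CPU costs the same fixed amount of flush work done under the lock, and it reduces the old yield condition (need_resched() || spin_needbreak()) to a hypothetical per-CPU `contended[]` flag. `max_irqs_off()` and its parameters are invented for this sketch; it returns the longest stretch of work done between unlocks, i.e. the longest irqs-off window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the per-CPU loop in
 * cgroup_rstat_flush_locked(). Each possible CPU costs per_cpu_cost
 * units of flush work done while holding the irq-safe lock.
 * Returns the longest stretch of work between unlocks.
 *
 * always_yield == false: old behaviour; unlock only where contended[cpu]
 *                        (standing in for need_resched() ||
 *                        spin_needbreak()) is true.
 * always_yield == true:  new behaviour; unlock/relock after every CPU.
 */
static unsigned long max_irqs_off(size_t nr_cpus, unsigned long per_cpu_cost,
				  const bool *contended, bool always_yield)
{
	unsigned long cur = 0, max = 0;

	/* contended[] is only consulted by the old policy */
	assert(always_yield || contended != NULL);

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		cur += per_cpu_cost;	/* flush this CPU under the lock */

		if (always_yield || contended[cpu]) {
			if (cur > max)
				max = cur;
			cur = 0;	/* unlock, maybe reschedule, relock */
		}
	}
	/* the caller's final unlock closes the last window */
	return cur > max ? cur : max;
}
```

Under this model, with no contention the old policy folds every CPU's work into one window, while the new policy bounds each window to a single CPU's work at the cost of one extra unlock/lock pair per CPU.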
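The stat_reader program itself is not part of this patch. A minimal sketch of what one of its threads might do is below, assuming each access simply opens, fully reads, and closes the file until a deadline passes. `reread_for()` is a hypothetical name, and the memcg path passed to it (for example a memory.stat file under an assumed cgroup v2 mount at /sys/fs/cgroup) is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical per-thread body of a stat_reader-like benchmark:
 * re-read the whole file at path until duration_ms has elapsed.
 * Returns the number of complete reads, or -1 on error.
 */
static long reread_for(const char *path, long duration_ms)
{
	char buf[8192];
	struct timespec start, now;
	long count = 0;

	assert(path != NULL && duration_ms >= 0);
	clock_gettime(CLOCK_MONOTONIC, &start);

	for (;;) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
				  (now.tv_nsec - start.tv_nsec) / 1000000L;
		if (elapsed_ms >= duration_ms)
			break;

		int fd = open(path, O_RDONLY);
		if (fd < 0)
			return -1;

		ssize_t n;
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;	/* discard contents; only the access count matters */
		close(fd);
		if (n < 0)
			return -1;
		count++;
	}
	return count;
}
```

A driver would start N threads (400 in the benchmark above) that each call reread_for() with the same path and duration, then sum the returned counts to get the total number of accesses.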
Running the stat_reader memory.stat reader for 10 seconds:
- without memory hogs: 9.84M accesses => 12.7M accesses
- with memory hogs: 9.46M accesses => 11.1M accesses
The throughput of memory.stat access improves by about 29% without
hogs and about 17% with hogs.

The mode of memory.stat access latency after grouping into power-of-2
buckets:
- without memory hogs: 64 usec => 16 usec
- with memory hogs: 64 usec => 8 usec
The memory.stat latency improves: the mode drops 4x without hogs and
8x with hogs.

Signed-off-by: Eric Dumazet
Signed-off-by: Greg Thelen
Tested-by: Greg Thelen
Acked-by: Michal Koutný
Reviewed-by: Yosry Ahmed
Acked-by: Johannes Weiner
Reviewed-by: Shakeel Butt
---
 kernel/cgroup/rstat.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index aac91466279f..976c24b3671a 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -323,13 +323,11 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 			rcu_read_unlock();
 		}
 
-		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(cgrp, cpu);
-			if (!cond_resched())
-				cpu_relax();
-			__cgroup_rstat_lock(cgrp, cpu);
-		}
+		/* play nice and avoid disabling interrupts for a long time */
+		__cgroup_rstat_unlock(cgrp, cpu);
+		if (!cond_resched())
+			cpu_relax();
+		__cgroup_rstat_lock(cgrp, cpu);
 	}
 }
 
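The patch does not say exactly how latencies were bucketed. A minimal sketch, assuming each sample falls into the bucket [2^k, 2^(k+1)) and the bucket is reported by its lower bound, could compute the mode like this. `bucket_index()` and `latency_mode_pow2()` are hypothetical helpers, not part of the benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Number of significant bits: 0 for 0, k + 1 for [2^k, 2^(k+1)). */
static unsigned bucket_index(unsigned long usec)
{
	unsigned idx = 0;

	while (usec) {
		usec >>= 1;
		idx++;
	}
	return idx;
}

/*
 * Hypothetical helper: most frequent power-of-2 bucket among n latency
 * samples, reported as the bucket's lower bound. Ties go to the smaller
 * bucket. Assumes unsigned long is at most 64 bits wide.
 */
static unsigned long latency_mode_pow2(const unsigned long *lat, size_t n)
{
	size_t counts[65] = { 0 };
	unsigned best = 0;

	assert(lat != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		counts[bucket_index(lat[i])]++;

	for (unsigned k = 1; k < 65; k++)
		if (counts[k] > counts[best])
			best = k;

	return best ? 1UL << (best - 1) : 0;
}
```

Under this assumed bucketing, "64 usec => 16 usec" means the most common latency moved from the 64-127 usec bucket to the 16-31 usec bucket, so the reported figures are coarse lower bounds rather than exact latencies.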