From patchwork Thu May 16 06:28:47 2024
X-Patchwork-Submitter: Mateusz Guzik <mjguzik@gmail.com>
X-Patchwork-Id: 13665759
From: Mateusz Guzik <mjguzik@gmail.com>
To: dennis@kernel.org
Cc: tj@kernel.org, hughd@google.com, akpm@linux-foundation.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Mateusz Guzik <mjguzik@gmail.com>
Subject: [RFC PATCH] percpu_counter: reimplement _add_batch with __this_cpu_cmpxchg
Date: Thu, 16 May 2024 08:28:47 +0200
Message-ID: <20240516062847.1064901-1-mjguzik@gmail.com>

This replaces the expensive cli/sti pair with a not-lock-prefixed cmpxchg.
While it provides a win on x86-64, I have no idea about other architectures
and I don't have easy means to test there either.
If this is considered a problem then perhaps the variant below could be
ifdefed on ARCH_WANTS_CMPXCHG_PERCPU_COUNTER_ADD_BATCH or something more
concise, you get the idea.

That aside perhaps it is possible to save a branch if there is something
cheaper than a preemption counter trip -- this code needs to prevent
migration, but does not mind getting descheduled.

================ cut here ================

Interrupt disable/enable trips are quite expensive on x86-64 compared to
a mere cmpxchg (note: no lock prefix!) and percpu counters are used
quite often.

With this change I get a bump of 1% ops/s for negative path lookups,
plugged into will-it-scale:

void testcase(unsigned long long *iterations, unsigned long nr)
{
	while (1) {
		int fd = open("/tmp/nonexistent", O_RDONLY);
		assert(fd == -1);

		(*iterations)++;
	}
}

The win would be higher if it was not for other slowdowns, but one has
to start somewhere.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 lib/percpu_counter.c | 48 +++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 44dd133594d4..01f0cd9c6451 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -73,11 +73,14 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 EXPORT_SYMBOL(percpu_counter_set);
 
 /*
- * local_irq_save() is needed to make the function irq safe:
- * - The slow path would be ok as protected by an irq-safe spinlock.
- * - this_cpu_add would be ok as it is irq-safe by definition.
- * But:
- * The decision slow path/fast path and the actual update must be atomic, too.
+ * Add to a counter while respecting batch size.
+ *
+ * Safety against interrupts is achieved in 2 ways:
+ * 1. the fast path uses local cmpxchg (note: no lock prefix)
+ * 2. the slow path operates with interrupts disabled
+ *
+ * This deals with the following:
+ * The decision slow path/fast path and the actual update must be atomic.
  * Otherwise a call in process context could check the current values and
  * decide that the fast path can be used. If now an interrupt occurs before
  * the this_cpu_add(), and the interrupt updates this_cpu(*fbc->counters),
@@ -86,20 +89,33 @@ EXPORT_SYMBOL(percpu_counter_set);
  */
 void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, ocount;
 	unsigned long flags;
 
-	local_irq_save(flags);
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (abs(count) >= batch) {
-		raw_spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_sub(*fbc->counters, count - amount);
-		raw_spin_unlock(&fbc->lock);
-	} else {
-		this_cpu_add(*fbc->counters, amount);
+	preempt_disable();
+	ocount = __this_cpu_read(*fbc->counters);
+retry:
+	if (unlikely(abs(ocount + amount) >= batch)) {
+		raw_spin_lock_irqsave(&fbc->lock, flags);
+		/*
+		 * Note: the counter might have changed before we got the lock,
+		 * but is guaranteed to be stable now.
+		 */
+		ocount = __this_cpu_read(*fbc->counters);
+		fbc->count += ocount + amount;
+		__this_cpu_sub(*fbc->counters, ocount);
+		raw_spin_unlock_irqrestore(&fbc->lock, flags);
+		preempt_enable();
+		return;
 	}
-	local_irq_restore(flags);
+
+	count = __this_cpu_cmpxchg(*fbc->counters, ocount, ocount + amount);
+	if (unlikely(count != ocount)) {
+		ocount = count;
+		goto retry;
+	}
+
+	preempt_enable();
 }
 EXPORT_SYMBOL(percpu_counter_add_batch);