From patchwork Fri Apr 14 16:30:43 2023
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 13211822
Message-ID: <20230414162841.166896739@linutronix.de>
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML
Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
    Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
    Rasmus Villemoes, Ye Bin, linux-mm@kvack.org
Subject: [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling
References: <20230414162755.281993820@linutronix.de>
Date: Fri, 14 Apr 2023 18:30:43 +0200 (CEST)

Commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") tried to
address a race condition between percpu_counter_sum() and a concurrent
CPU hotplug operation. The race window lies between the point where an
unplugged CPU removes itself from cpu_online_mask and the hotplug state
callback which folds the per CPU counters of the now dead CPU into the
global count.

percpu_counter_sum() used for_each_online_cpu() to accumulate the per
CPU local counts, so during the race window it failed to account for
the not yet folded back local count of the offlined CPU.
The attempt to address this used the admittedly undocumented and
pointlessly public cpu_dying_mask by changing the loop iterator to take
both the cpu_online_mask and the cpu_dying_mask into account.

That works to some extent, but it is incorrect. The cpu_dying_mask bits
are sticky even after cpu_up()/cpu_down() completes, which means that
all offlined CPUs are always taken into account. In the case of
disabling SMT at boot time or runtime this results in evaluating _all_
offlined SMT siblings' counters forever. Depending on system size,
that's a massive amount of cache lines to be touched forever.

It might be argued that the cpu_dying_mask bit could be cleared when
cpu_down() completes, but that's not possible under all circumstances.
Especially with partial hotplug the bit must be sticky in order to keep
the initial user, i.e. the scheduler, correct.

Partial hotplug, which allows explicit state transitions, can also
recreate the race window:

   cpu_down(target = CPUHP_PERCPU_CNT_DEAD + 1)

brings a CPU down to one state before the per CPU counter folding
callback. As this did not reach CPUHP_OFFLINE state the bit stays set.
The next partial operation:

   cpu_up(target = CPUHP_PERCPU_CNT_DEAD + 2)

has to clear the bit, and the race window is open again.

There are two ways to solve this:

  1) Maintain a local CPU mask in the per CPU counter code which gets
     the bit set when a CPU comes online and cleared in the
     CPUHP_PERCPU_CNT_DEAD state after folding. This adds more code
     and complexity (see the sketch at the end of this changelog).

  2) Move the folding hotplug state into the DYING callback section,
     which runs on the outgoing CPU immediately after it cleared its
     online bit. There is no concurrency vs. percpu_counter_sum() on
     another CPU because all still online CPUs are waiting in
     stop_machine() for the outgoing CPU to complete its shutdown. The
     raw spinlock held around the CPU mask iteration prevents an
     online CPU from reaching the stop machine thread while iterating,
     which implicitly prevents the outgoing CPU from clearing its
     online bit.

     This is way simpler than #1 and makes the hotplug calls
     symmetric, for the price of a slightly longer wait time in
     stop_machine(). That is not the end of the world as CPU unplug is
     already slow, and the overall time for a cpu_down() operation
     stays exactly the same.

Implement #2 and plug the race completely.

percpu_counter_sum() is still inherently racy against a concurrent
percpu_counter_add_batch() fastpath unless externally serialized.
That's completely independent of CPU hotplug though.
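[Editor's note] For contrast, here is a rough sketch of what the
rejected option #1 would have looked like. This is not from the patch;
the mask and callback names are hypothetical and the wiring of the
callbacks into the CPUHP_ONLINE/CPUHP_PERCPU_CNT_DEAD states is
omitted:

/*
 * Hypothetical sketch of option #1: percpu_counter maintains its own
 * mask of CPUs whose local counts have not been folded back yet.
 */
static struct cpumask percpu_counter_cpus;

static int percpu_counter_cpu_online(unsigned int cpu)
{
	/* CPU comes up: its local count becomes live. */
	cpumask_set_cpu(cpu, &percpu_counter_cpus);
	return 0;
}

static int percpu_counter_cpu_dead(unsigned int cpu)
{
	/*
	 * Fold the dead CPU's counts back into fbc->count first, as
	 * the existing DEAD callback does, then drop the CPU from the
	 * local mask.
	 */
	cpumask_clear_cpu(cpu, &percpu_counter_cpus);
	return 0;
}

/*
 * __percpu_counter_sum() would then iterate percpu_counter_cpus
 * instead of cpu_online_mask, so a CPU sitting between "cleared its
 * online bit" and "DEAD callback ran" is still summed. The extra
 * mask, the second hotplug state and the ordering rules between them
 * are the "more code and complexity" the changelog refers to.
 */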
Fixes: 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dennis Zhou
Cc: Tejun Heo
Cc: Christoph Lameter
Cc: Dave Chinner
Cc: Yury Norov
Cc: Andy Shevchenko
Cc: Rasmus Villemoes
Cc: Ye Bin
Cc: linux-mm@kvack.org
Acked-by: Dennis Zhou
---
 include/linux/cpuhotplug.h |    2 -
 lib/percpu_counter.c       |   57 +++++++++++++++++++--------------------
 2 files changed, 26 insertions(+), 33 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -91,7 +91,6 @@ enum cpuhp_state {
 	CPUHP_PRINTK_DEAD,
 	CPUHP_MM_MEMCQ_DEAD,
 	CPUHP_XFS_DEAD,
-	CPUHP_PERCPU_CNT_DEAD,
 	CPUHP_RADIX_DEAD,
 	CPUHP_PAGE_ALLOC,
 	CPUHP_NET_DEV_DEAD,
@@ -196,6 +195,7 @@ enum cpuhp_state {
 	CPUHP_AP_SMPCFD_DYING,
 	CPUHP_AP_X86_TBOOT_DYING,
 	CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
+	CPUHP_AP_PERCPU_COUNTER_STARTING,
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -12,7 +12,7 @@
 
 #ifdef CONFIG_HOTPLUG_CPU
 static LIST_HEAD(percpu_counters);
-static DEFINE_SPINLOCK(percpu_counters_lock);
+static DEFINE_RAW_SPINLOCK(percpu_counters_lock);
 #endif
 
 #ifdef CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER
@@ -126,13 +126,8 @@ EXPORT_SYMBOL(percpu_counter_sync);
  * Add up all the per-cpu counts, return the result. This is a more accurate
  * but much slower version of percpu_counter_read_positive().
  *
- * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
- * from CPUs that are in the process of being taken offline. Dying cpus have
- * been removed from the online mask, but may not have had the hotplug dead
- * notifier called to fold the percpu count back into the global counter sum.
- * By including dying CPUs in the iteration mask, we avoid this race condition
- * so __percpu_counter_sum() just does the right thing when CPUs are being taken
- * offline.
+ * Note: This function is inherently racy against the lockless fastpath of
+ * percpu_counter_add_batch() unless externally serialized.
 */
s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
@@ -142,10 +137,8 @@ s64 __percpu_counter_sum(struct percpu_c
 
 	raw_spin_lock_irqsave(&fbc->lock, flags);
 	ret = fbc->count;
-	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
+	for_each_online_cpu(cpu)
+		ret += *per_cpu_ptr(fbc->counters, cpu);
 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
 	return ret;
 }
@@ -167,9 +160,9 @@ int __percpu_counter_init(struct percpu_
 
 #ifdef CONFIG_HOTPLUG_CPU
 	INIT_LIST_HEAD(&fbc->list);
-	spin_lock_irqsave(&percpu_counters_lock, flags);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_add(&fbc->list, &percpu_counters);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
@@ -185,9 +178,9 @@ void percpu_counter_destroy(struct percp
 	debug_percpu_counter_deactivate(fbc);
 
 #ifdef CONFIG_HOTPLUG_CPU
-	spin_lock_irqsave(&percpu_counters_lock, flags);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_del(&fbc->list);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	free_percpu(fbc->counters);
 	fbc->counters = NULL;
@@ -197,22 +190,29 @@ EXPORT_SYMBOL(percpu_counter_destroy);
 int percpu_counter_batch __read_mostly = 32;
 EXPORT_SYMBOL(percpu_counter_batch);
 
-static int compute_batch_value(unsigned int cpu)
+static void compute_batch_value(int offs)
 {
-	int nr = num_online_cpus();
+	int nr = num_online_cpus() + offs;
+
+	percpu_counter_batch = max(32, nr * 2);
+}
 
-	percpu_counter_batch = max(32, nr*2);
+static int percpu_counter_cpu_starting(unsigned int cpu)
+{
+	/* If invoked during hotplug @cpu is not yet marked online. */
+	compute_batch_value(cpu_online(cpu) ? 0 : 1);
 	return 0;
 }
 
-static int percpu_counter_cpu_dead(unsigned int cpu)
+static int percpu_counter_cpu_dying(unsigned int cpu)
 {
 #ifdef CONFIG_HOTPLUG_CPU
 	struct percpu_counter *fbc;
+	unsigned long flags;
 
-	compute_batch_value(cpu);
+	compute_batch_value(0);
 
-	spin_lock_irq(&percpu_counters_lock);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_for_each_entry(fbc, &percpu_counters, list) {
 		s32 *pcount;
 
@@ -222,7 +222,7 @@ static int percpu_counter_cpu_dead(unsig
 		*pcount = 0;
 		raw_spin_unlock(&fbc->lock);
 	}
-	spin_unlock_irq(&percpu_counters_lock);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
@@ -256,15 +256,8 @@ EXPORT_SYMBOL(__percpu_counter_compare);
 
 static int __init percpu_counter_startup(void)
 {
-	int ret;
-
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "lib/percpu_cnt:online",
-				compute_batch_value, NULL);
-	WARN_ON(ret < 0);
-	ret = cpuhp_setup_state_nocalls(CPUHP_PERCPU_CNT_DEAD,
-					"lib/percpu_cnt:dead", NULL,
-					percpu_counter_cpu_dead);
-	WARN_ON(ret < 0);
+	WARN_ON(cpuhp_setup_state(CPUHP_AP_PERCPU_COUNTER_STARTING, "lib/percpu_counter:starting",
+				  percpu_counter_cpu_starting, percpu_counter_cpu_dying));
	return 0;
 }
 module_init(percpu_counter_startup);
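[Editor's note] The resulting registration is the standard pattern for
paired STARTING/DYING callbacks: a single cpuhp_setup_state() call
installs both directions. As a hedged, generic sketch (the subsystem
name, the CPUHP_AP_DEMO_STARTING constant and both callbacks are
hypothetical; a real user must add its state to the STARTING section
of enum cpuhp_state, exactly as patch 1 adds
CPUHP_AP_PERCPU_COUNTER_STARTING):

#include <linux/cpuhotplug.h>

/* Runs on the incoming CPU early during bringup, interrupts disabled. */
static int demo_cpu_starting(unsigned int cpu)
{
	return 0;
}

/*
 * Runs on the outgoing CPU under stop_machine() after it cleared its
 * online bit; all other online CPUs are spinning and cannot race.
 */
static int demo_cpu_dying(unsigned int cpu)
{
	return 0;
}

static int __init demo_init(void)
{
	/* CPUHP_AP_DEMO_STARTING is hypothetical, see note above. */
	return cpuhp_setup_state(CPUHP_AP_DEMO_STARTING, "demo:starting",
				 demo_cpu_starting, demo_cpu_dying);
}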
From patchwork Fri Apr 14 16:30:45 2023
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 13211821
Message-ID: <20230414162841.229672670@linutronix.de>
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML
Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
    Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
    Rasmus Villemoes, Ye Bin, linux-mm@kvack.org
Subject: [patch 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask
References: <20230414162755.281993820@linutronix.de>
Date: Fri, 14 Apr 2023 18:30:45 +0200 (CEST)
There are no module users, and no module should ever care.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c |    2 --
 1 file changed, 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2643,10 +2643,8 @@ struct cpumask __cpu_present_mask __read
 EXPORT_SYMBOL(__cpu_present_mask);
 
 struct cpumask __cpu_active_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_active_mask);
 
 struct cpumask __cpu_dying_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_dying_mask);
 
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);
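[Editor's note] To illustrate the effect of dropping the exports, here
is a hypothetical module snippet (not part of the series). The
cpu_dying_mask macro expands to a reference to __cpu_dying_mask, so
without EXPORT_SYMBOL(__cpu_dying_mask) modpost now rejects such a
module with an "undefined symbol" error instead of letting it misuse
the mask:

#include <linux/cpumask.h>
#include <linux/module.h>

/* Hypothetical out-of-tree module; fails modpost after this patch. */
static int __init dying_demo_init(void)
{
	pr_info("dying CPUs: %*pbl\n", cpumask_pr_args(cpu_dying_mask));
	return 0;
}
module_init(dying_demo_init);

MODULE_LICENSE("GPL");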
From patchwork Fri Apr 14 16:30:46 2023
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 13211823
Message-ID: <20230414162841.292513270@linutronix.de>
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML
Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
    Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
    Rasmus Villemoes, Ye Bin, linux-mm@kvack.org
Subject: [patch 3/3] cpu/hotplug: Get rid of cpu_dying_mask
References: <20230414162755.281993820@linutronix.de>
Date: Fri, 14 Apr 2023 18:30:46 +0200 (CEST)

The cpu_dying_mask is not only undocumented but also to some extent a
misnomer.
Its purpose is to capture the last direction of a cpu_up() or
cpu_down() operation, taking eventual rollback operations into account.
The name and the lack of documentation have already lured someone into
using it in the wrong way.

The initial user is the scheduler code, which needs to keep the
decision correct whether to schedule tasks on a CPU which is between
the CPUHP_ONLINE and the CPUHP_ACTIVE state and has the balance_push()
hook installed.

The cpu_dying_mask is not really useful for general consumption. Its
bits are sticky even after cpu_up() or cpu_down() completes. It might
be argued that the cpu_dying_mask bit could be cleared when cpu_down()
completes, but that's not possible under all circumstances. Especially
not with partial hotplug operations. In that case the bit must be
sticky in order to keep the initial user, i.e. the scheduler, correct.

Replace the cpumask completely by:

  - recording the direction internally in the CPU hotplug core state

  - exposing that state via a documented function to the scheduler

After that, cpu_dying_mask is no longer in use and is removed before
the next user trips over it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpumask.h |   21 ---------------------
 kernel/cpu.c            |   43 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/core.c     |    4 ++--
 kernel/smpboot.h        |    2 ++
 4 files changed, 41 insertions(+), 29 deletions(-)

--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -126,12 +126,10 @@ extern struct cpumask __cpu_possible_mas
 extern struct cpumask __cpu_online_mask;
 extern struct cpumask __cpu_present_mask;
 extern struct cpumask __cpu_active_mask;
-extern struct cpumask __cpu_dying_mask;
 
 #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
 #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
 #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
 #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
-#define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
 
 extern atomic_t __num_online_cpus;
 
@@ -1015,15 +1013,6 @@ set_cpu_active(unsigned int cpu, bool ac
 		cpumask_clear_cpu(cpu, &__cpu_active_mask);
 }
 
-static inline void
-set_cpu_dying(unsigned int cpu, bool dying)
-{
-	if (dying)
-		cpumask_set_cpu(cpu, &__cpu_dying_mask);
-	else
-		cpumask_clear_cpu(cpu, &__cpu_dying_mask);
-}
-
 /**
  * to_cpumask - convert an NR_CPUS bitmap to a struct cpumask *
  * @bitmap: the bitmap
@@ -1097,11 +1086,6 @@ static inline bool cpu_active(unsigned i
 	return cpumask_test_cpu(cpu, cpu_active_mask);
 }
 
-static inline bool cpu_dying(unsigned int cpu)
-{
-	return cpumask_test_cpu(cpu, cpu_dying_mask);
-}
-
 #else
 
 #define num_online_cpus()	1U
@@ -1129,11 +1113,6 @@ static inline bool cpu_active(unsigned i
 	return cpu == 0;
 }
 
-static inline bool cpu_dying(unsigned int cpu)
-{
-	return false;
-}
-
 #endif /* NR_CPUS > 1 */
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -53,6 +53,9 @@
  * @rollback:	Perform a rollback
  * @single:	Single callback invocation
  * @bringup:	Single callback bringup or teardown selector
+ * @goes_down:	Indicator for direction of cpu_up()/cpu_down() operations
+ *		including eventual rollbacks. Not affected by state or
+ *		instance add/remove operations. See cpuhp_cpu_goes_down().
 * @cpu:	CPU number
 * @node:	Remote CPU node; for multi-instance, do a
 *		single entry callback for install/remove
@@ -72,6 +75,7 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
+	bool			goes_down;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -295,6 +299,37 @@ void cpu_maps_update_done(void)
 	mutex_unlock(&cpu_add_remove_lock);
 }
 
+/**
+ * cpuhp_cpu_goes_down - Query the current/last CPU hotplug direction of a CPU
+ * @cpu: The CPU to query
+ *
+ * The direction indicator is modified by the hotplug core on
+ * cpu_up()/cpu_down() operations including eventual rollback operations.
+ * The indicator is not affected by state or instance install/remove
+ * operations.
+ *
+ * The indicator is sticky after the hotplug operation completes, whether
+ * the operation was a full up/down or just a partial bringup/teardown.
+ *
+ *				goes_down
+ * cpu_up(target)	enter	-> False
+ *	rollback on fail	-> True
+ * cpu_up(target)	exit	Last state
+ *
+ * cpu_down(target)	enter	-> True
+ *	rollback on fail	-> False
+ * cpu_down(target)	exit	Last state
+ *
+ * The return value is a racy snapshot and not protected against concurrent
+ * CPU hotplug operations which modify the indicator.
+ *
+ * Returns: True if cached direction is down, false otherwise
+ */
+bool cpuhp_cpu_goes_down(unsigned int cpu)
+{
+	return data_race(per_cpu(cpuhp_state.goes_down, cpu));
+}
+
 /*
  * If set, cpu_up and cpu_down will return -EBUSY and do nothing.
  * Should always be manipulated under cpu_add_remove_lock
@@ -486,8 +521,7 @@ cpuhp_set_state(int cpu, struct cpuhp_cp
 	st->target = target;
 	st->single = false;
 	st->bringup = bringup;
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
+	st->goes_down = !bringup;
 
 	return prev_state;
 }
@@ -521,8 +555,7 @@ cpuhp_reset_state(int cpu, struct cpuhp_
 	}
 
 	st->bringup = bringup;
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
+	st->goes_down = !bringup;
 }
 
 /* Regular hotplug invocation of the AP hotplug thread */
@@ -2644,8 +2677,6 @@ EXPORT_SYMBOL(__cpu_present_mask);
 
 struct cpumask __cpu_active_mask __read_mostly;
 
-struct cpumask __cpu_dying_mask __read_mostly;
-
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2297,7 +2297,7 @@ static inline bool is_cpu_allowed(struct
 		return cpu_online(cpu);
 
 	/* Regular kernel threads don't get to stay during offline. */
-	if (cpu_dying(cpu))
+	if (cpuhp_cpu_goes_down(cpu))
 		return false;
 
 	/* But are allowed during online. */
@@ -9344,7 +9344,7 @@ static void balance_push(struct rq *rq)
 	 * Only active while going offline and when invoked on the outgoing
 	 * CPU.
	 */
-	if (!cpu_dying(rq->cpu) || rq != this_rq())
+	if (!cpuhp_cpu_goes_down(rq->cpu) || rq != this_rq())
 		return;
 
 	/*
--- a/kernel/smpboot.h
+++ b/kernel/smpboot.h
@@ -20,4 +20,6 @@ int smpboot_unpark_threads(unsigned int
 
 void __init cpuhp_threads_init(void);
 
+bool cpuhp_cpu_goes_down(unsigned int cpu);
+
 #endif
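[Editor's note] Because cpuhp_cpu_goes_down() is documented as a racy
snapshot, a caller has to either pin the hotplug state by other means
or tolerate staleness. A hedged sketch of the consumption pattern,
modeled on the is_cpu_allowed()/balance_push() hunks above (the helper
name is hypothetical, not from the patch):

#include <linux/cpumask.h>

bool cpuhp_cpu_goes_down(unsigned int cpu);	/* from kernel/smpboot.h */

/*
 * Hypothetical helper: the snapshot is only stable where the context
 * pins it, e.g. when running on @cpu itself during its own teardown,
 * as balance_push() does.
 */
static bool demo_cpu_accepts_work(unsigned int cpu)
{
	if (!cpu_online(cpu))
		return false;
	/* True from cpu_down() entry onwards, including rollbacks. */
	if (cpuhp_cpu_goes_down(cpu))
		return false;
	return true;
}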