From: Mark Rutland
To: linux-arm-kernel@lists.infradead.org
Cc: ardb@kernel.org, bertrand.marquis@arm.com, boris.ostrovsky@oracle.com, broonie@kernel.org, catalin.marinas@arm.com, daniel.lezcano@linaro.org, james.morse@arm.com, jgross@suse.com, mark.rutland@arm.com, maz@kernel.org, oliver.upton@linux.dev, pcc@google.com, sstabellini@kernel.org, suzuki.poulose@arm.com, tglx@linutronix.de, vladimir.murzin@arm.com, will@kernel.org
Subject: [PATCH 00/37] arm64: Remove cpus_have_const_cap()
Date: Tue, 19 Sep 2023 10:28:13 +0100
Message-Id: <20230919092850.1940729-1-mark.rutland@arm.com>

For historical reasons, cpus_have_const_cap() does more than its name implies, and its current behaviour is more harmful than helpful. This series removes cpus_have_const_cap(), removing some redundant code and making the kernel more robust.

Currently, cpus_have_const_cap() is implemented as:

| static __always_inline bool cpus_have_const_cap(int num)
| {
| 	if (is_hyp_code())
| 		return cpus_have_final_cap(num);
| 	else if (system_capabilities_finalized())
| 		return __cpus_have_const_cap(num);
| 	else
| 		return cpus_have_cap(num);
| }

For hyp code this is safe and practically ideal.
We finalize system cpucaps and patch the relevant alternatives before KVM is initialized, and so the alternative branch generated by cpus_have_final_cap() is guaranteed to observe the finalized value of the cpucap.

For non-hyp code this is potentially unsafe and sub-optimal:

1) System cpucaps are detected on the boot CPU while secondary CPUs are executing code. This leads to potential races around cpucaps being detected, where the cpucaps can change at arbitrary points in time, potentially in the middle of sequences which depend on them not changing, e.g.

   | CPU 0                          CPU 1
   |
   |                                // doesn't save PMR
   |                                flags = local_daif_save();
   |
   | // detects PSEUDO-NMI
   |
   |                                // attempts to restore PMR
   |                                local_daif_restore(flags);

   This can potentially lead to erratic behaviour, and for stateful sequences it would be better to use alternatives such that the entire sequence is patched atomically.

2) For several cpucaps we perform some enablement/initialization work between detecting the cpucap and patching alternatives. For some features (e.g. SVE and SME) we need to record some additional properties (e.g. vector lengths) before patching alternatives. If patched alternative sequences consume any of the recorded properties, it's possible that these race with the enablement/initialization and consume stale values, which could potentially result in erratic behaviour. It would be better to use alternatives such that the enablement/initialization is guaranteed to happen before any such usage.

3) Most code doesn't run between cpucaps being detected and their alternatives being patched, yet it still has redundant code generated: an alternative branch for system_capabilities_finalized(), and a bitmap test for cpus_have_cap(). This bloats the kernel and wastes I-cache resources, and the resulting branching structure pessimizes compiler output.
   This is especially noticeable in parts of the kernel which need to test a number of cpucaps in quick succession, such as exception handlers in entry-common.c and state save/restore in fpsimd.c. Using alternative branches directly can dramatically improve the code generated for such paths (e.g. making the entry code several KB smaller in some configurations).

This series attempts to address the above issues by removing cpus_have_const_cap() and migrating code over to alternative branches wherever possible:

* Patches 1 and 2 address a couple of bugs I spotted where cpucaps are consumed prior to being initialized.

* Patches 3 to 5 rework some low-level cpucap helpers and add new helpers which are used later in the series.

* Patches 6 to 8 rework some feature enablement code so that this can work in the window between cpucap detection and alternative patching without the need to use cpus_have_const_cap().

* Patch 9 moves KVM entirely over to cpus_have_final_cap().

* Patches 10 to 12 clean up the ARM64_HAS_NO_FPSIMD cpucap, inverting it and making it behave the same way as all other system cpucaps.

* Patches 13 to 36 migrate code away from cpus_have_const_cap().

* Patch 37 removes the now-unused cpus_have_const_cap().

The series is based on v6.6-rc2.

Mark.
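A minimal userspace sketch of the race in point 1 above. It assumes a single-threaded simulation and is not kernel code. Every name in it (model_reset(), detect_cap(), finalize_caps(), model_cpus_have_const_cap(), save_restore_agree()) is hypothetical. Plain booleans stand in for the cpucap bitmap, for system_capabilities_finalized(), and for the branch that alternative patching would fix in place. The "other CPU" detecting the cpucap is modelled as a write between the two halves of a save/restore pair.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded userspace model (not kernel code).
 * The booleans stand in for kernel state; the names only mimic the
 * helpers discussed in the cover letter.
 */
static bool cap_bitmap;    /* stands in for the bitmap read by cpus_have_cap() */
static bool finalized;     /* stands in for system_capabilities_finalized() */
static bool patched_value; /* stands in for the branch fixed by alternative patching */

static void model_reset(void)
{
	cap_bitmap = false;
	finalized = false;
	patched_value = false;
}

/* Another CPU detecting the cpucap: only the bitmap changes. */
static void detect_cap(void)
{
	cap_bitmap = true;
}

/* Finalization happens once and snapshots the bitmap into the "patched" branch. */
static void finalize_caps(void)
{
	assert(!finalized);
	patched_value = cap_bitmap;
	finalized = true;
}

/* Mirrors the non-hyp paths of cpus_have_const_cap() quoted above. */
static bool model_cpus_have_const_cap(void)
{
	if (finalized)
		return patched_value; /* no longer reads the bitmap */
	return cap_bitmap;            /* can change between two calls */
}

/*
 * Models a paired sequence such as local_daif_save()/local_daif_restore():
 * both halves must agree on whether the feature is in use. If
 * detect_in_between is true, the cpucap is detected between the halves.
 * Returns true if both halves observed the same value.
 */
static bool save_restore_agree(bool detect_in_between)
{
	bool at_save = model_cpus_have_const_cap();

	if (detect_in_between)
		detect_cap();

	bool at_restore = model_cpus_have_const_cap();

	return at_save == at_restore;
}
```

After model_reset(), save_restore_agree(true) returns false: the save half sees the cpucap clear and the restore half sees it set. After model_reset() followed by finalize_caps(), it returns true, because both halves read the fixed, patched value. The model deliberately ignores memory ordering and real concurrency. It only captures the "value changes between two reads" shape of the problem.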
Mark Rutland (37):
  clocksource/drivers/arm_arch_timer: Initialize evtstrm after finalizing cpucaps
  arm64/arm: xen: enlighten: Fix KPTI checks
  arm64: Factor out cpucap definitions
  arm64: Add cpucap_is_possible()
  arm64: Add cpus_have_final_boot_cap()
  arm64: Rework setup_cpu_features()
  arm64: Fixup user features at boot time
  arm64: Split kpti_install_ng_mappings()
  arm64: kvm: Use cpus_have_final_cap() explicitly
  arm64: Explicitly save/restore CPACR when probing SVE and SME
  arm64: Rename SVE/SME cpu_enable functions
  arm64: Use a positive cpucap for FP/SIMD
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_{ADDRESS,GENERIC}_AUTH
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_ARMv8_4_TTL
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTI
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CACHE_DIC
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNP
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
  arm64: Avoid cpus_have_const_cap() for ARM64_MTE
  arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
  arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
  arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
  arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
  arm64: Remove cpus_have_const_cap()

 arch/arm/xen/enlighten.c | 25 +--
 arch/arm64/include/asm/alternative-macros.h | 8 +-
 arch/arm64/include/asm/arch_gicv3.h | 8 +
 arch/arm64/include/asm/archrandom.h | 2 +-
 arch/arm64/include/asm/cacheflush.h | 2 +-
 arch/arm64/include/asm/cpucaps.h | 67 ++++++++
 arch/arm64/include/asm/cpufeature.h | 96 +++++------
 arch/arm64/include/asm/fpsimd.h | 35 +++-
 arch/arm64/include/asm/irqflags.h | 20 +--
 arch/arm64/include/asm/kvm_emulate.h | 4 +-
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/include/asm/kvm_mmu.h | 2 +-
 arch/arm64/include/asm/mmu.h | 2 +-
 arch/arm64/include/asm/mmu_context.h | 28 ++--
 arch/arm64/include/asm/module.h | 3 +-
 arch/arm64/include/asm/pgtable-prot.h | 6 +-
 arch/arm64/include/asm/spectre.h | 2 +-
 arch/arm64/include/asm/tlbflush.h | 7 +-
 arch/arm64/include/asm/vectors.h | 2 +-
 arch/arm64/kernel/cpu_errata.c | 17 --
 arch/arm64/kernel/cpufeature.c | 167 ++++++++++++--------
 arch/arm64/kernel/efi.c | 3 +-
 arch/arm64/kernel/fpsimd.c | 81 ++++++----
 arch/arm64/kernel/module-plts.c | 7 +-
 arch/arm64/kernel/process.c | 2 +-
 arch/arm64/kernel/proton-pack.c | 2 +-
 arch/arm64/kernel/smp.c | 3 +-
 arch/arm64/kernel/suspend.c | 13 +-
 arch/arm64/kernel/sys_compat.c | 2 +-
 arch/arm64/kernel/traps.c | 2 +-
 arch/arm64/kernel/vdso.c | 2 +-
 arch/arm64/kvm/arm.c | 10 +-
 arch/arm64/kvm/guest.c | 4 +-
 arch/arm64/kvm/hyp/pgtable.c | 4 +-
 arch/arm64/kvm/mmu.c | 2 +-
 arch/arm64/kvm/sys_regs.c | 2 +-
 arch/arm64/kvm/vgic/vgic-v3.c | 2 +-
 arch/arm64/lib/delay.c | 2 +-
 arch/arm64/mm/fault.c | 2 +-
 arch/arm64/mm/hugetlbpage.c | 3 +-
 arch/arm64/mm/mmap.c | 2 +-
 arch/arm64/mm/mmu.c | 3 +-
 arch/arm64/mm/proc.S | 3 +-
 arch/arm64/tools/Makefile | 4 +-
 arch/arm64/tools/cpucaps | 2 +-
 arch/arm64/tools/gen-cpucaps.awk | 6 +-
 drivers/clocksource/arm_arch_timer.c | 31 +++-
 drivers/irqchip/irq-gic-v3.c | 11 --
 include/linux/cpuhotplug.h | 2 +
 49 files changed, 429 insertions(+), 288 deletions(-)
 create mode 100644 arch/arm64/include/asm/cpucaps.h