From patchwork Thu Jun 13 06:17:22 2024
X-Patchwork-Submitter: Anshuman Khandual
X-Patchwork-Id: 13696176
From: Anshuman Khandual
To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 will@kernel.org, catalin.marinas@arm.com, mark.rutland@arm.com
Cc: Anshuman Khandual, Mark Brown, James Clark, Rob Herring, Marc Zyngier,
 Suzuki Poulose, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
 linux-perf-users@vger.kernel.org
Subject: [PATCH V18 0/9] arm64/perf: Enable branch stack sampling
Date: Thu, 13 Jun 2024 11:47:22 +0530
Message-Id: <20240613061731.3109448-1-anshuman.khandual@arm.com>

This series enables perf branch stack sampling support on arm64 platforms via
a new arch feature called Branch Record Buffer Extension (BRBE). All the
relevant register definitions can be found here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on 6.10-rc3.

This series is also hosted below for quick access, review and testing.

https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v18)

- Anshuman

========== Perf Branch Stack Sampling Support (arm64 platforms) ===========

Currently arm64 platforms do not support perf branch stack sampling. Hence any
event requesting branch stack records, i.e. PERF_SAMPLE_BRANCH_STACK marked in
event->attr.sample_type, will be rejected in armpmu_event_init().

static int armpmu_event_init(struct perf_event *event)
{
	........
	/* does not support taken branch sampling */
	if (has_branch_stack(event))
		return -EOPNOTSUPP;
	........
}

$perf record -j any,u,k ls
Error:
cycles:P: PMU Hardware or event type doesn't support branch stack sampling.

-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------

After this series, the perf branch stack sampling feature gets enabled on
arm64 platforms where the FEAT_BRBE HW feature is supported, and
CONFIG_ARM64_BRBE is also selected during build. Let's observe all possible
scenarios here.

1. Feature not built (!CONFIG_ARM64_BRBE):

Falls back to the current behaviour, i.e. the event gets rejected.

2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):

Falls back to the current behaviour, i.e. the event gets rejected.

3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):

Platform supports branch stack sampling requests. Let's observe through a
simple example here.

$perf record -j any_call,u,k,save_type ls

[Please refer to the perf-record man page for all possible branch filter options]

$perf report
-------------------------- Snip ----------------------
# Overhead  Command  Source Shared Object  Source Symbol                  Target Symbol                  Basic Block Cycles
# ........  .......  ....................  .............................  .............................  ..................
#
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock_noinstr        [k] arch_counter_get_cntpct    16
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock                [k] sched_clock_noinstr        9
     1.85%  ls       [kernel.kallsyms]     [k] sched_clock_cpu            [k] sched_clock                5
     1.80%  ls       [kernel.kallsyms]     [k] irqtime_account_irq        [k] sched_clock_cpu            20
     1.58%  ls       [kernel.kallsyms]     [k] gic_handle_irq             [k] generic_handle_domain_irq  19
     1.58%  ls       [kernel.kallsyms]     [k] call_on_irq_stack          [k] gic_handle_irq             9
     1.58%  ls       [kernel.kallsyms]     [k] do_interrupt_handler       [k] call_on_irq_stack          23
     1.58%  ls       [kernel.kallsyms]     [k] generic_handle_domain_irq  [k] __irq_resolve_mapping      6
     1.58%  ls       [kernel.kallsyms]     [k] __irq_resolve_mapping      [k] __rcu_read_lock            10
-------------------------- Snip ----------------------

$perf report -D | grep cycles
-------------------------- Snip ----------------------
..... 1: ffff800080dd3334 -> ffff800080dd759c 39 cycles P 0 IND_CALL
..... 2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles P 0 IND_CALL
..... 3: ffff800080139918 -> ffff800080ffae64 9 cycles P 0 CALL
..... 4: ffff800080dd3324 -> ffff8000801398f8 7 cycles P 0 CALL
..... 5: ffff8000800f8548 -> ffff800080dd330c 21 cycles P 0 IND_CALL
..... 6: ffff8000800f864c -> ffff8000800f84ec 6 cycles P 0 CALL
..... 7: ffff8000800f86dc -> ffff8000800f8638 11 cycles P 0 CALL
..... 8: ffff8000800f86d4 -> ffff800081008630 16 cycles P 0 CALL
-------------------------- Snip ----------------------

perf script and other tooling can also be applied to the captured perf.data.

Similarly, branch stack sampling records can be collected via the direct
system call method, i.e. perf_event_open(), after setting up
'struct perf_event_attr' as required.

event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK
event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_ |
				  PERF_SAMPLE_BRANCH_ |
				  PERF_SAMPLE_BRANCH_ |
				  ...............................

However, not all branch filters may be supported on the platform.
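
As a rough user-space sketch of the perf_event_open() route described above,
the attribute setup could look like the following. It assumes a cycles event
and a sample period of 100000 events, both arbitrary choices that are not
taken from this series. The helper names branch_stack_attr_init() and
perf_event_open_sketch(), as well as the BRBE_UNSUPPORTED_FILTERS_SKETCH mask,
are hypothetical; the mask only mirrors the not-supported list in the branch
filters section of this cover letter, so that such a request can be refused
before it reaches the kernel.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical mask: the branch filters this cover letter lists as
 * not supported on arm64. It is an illustration, not a kernel define.
 */
#define BRBE_UNSUPPORTED_FILTERS_SKETCH		\
	(PERF_SAMPLE_BRANCH_ABORT_TX |		\
	 PERF_SAMPLE_BRANCH_IN_TX |		\
	 PERF_SAMPLE_BRANCH_NO_TX |		\
	 PERF_SAMPLE_BRANCH_CALL_STACK)

/*
 * Hypothetical helper: fill 'attr' for a cycles event that samples the
 * branch stack with the requested filters. Returns 0 on success, or -1
 * if a filter from the not-supported list was requested.
 */
static inline int branch_stack_attr_init(struct perf_event_attr *attr,
					 uint64_t filters)
{
	if (filters & BRBE_UNSUPPORTED_FILTERS_SKETCH)
		return -1;

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary, assumed for the sketch */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = filters;
	attr->disabled = 1;		/* enable later via ioctl() */
	return 0;
}

/* glibc provides no wrapper for perf_event_open(), hence syscall(). */
static inline long perf_event_open_sketch(struct perf_event_attr *attr,
					  pid_t pid, int cpu, int group_fd,
					  unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

For instance, passing PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_USER |
PERF_SAMPLE_BRANCH_KERNEL | PERF_SAMPLE_BRANCH_TYPE_SAVE as 'filters'
corresponds to the 'perf record -j any_call,u,k,save_type' invocation shown
earlier. The kernel still has the final say and may reject the event, so the
return value of the system call needs to be checked as usual.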

----------------------- BRBE Branch Filters Support -----------------------

- Following branch filters are supported on arm64.

	PERF_SAMPLE_BRANCH_USER		/* Branch privilege filters */
	PERF_SAMPLE_BRANCH_KERNEL
	PERF_SAMPLE_BRANCH_HV

	PERF_SAMPLE_BRANCH_ANY		/* Branch type filters */
	PERF_SAMPLE_BRANCH_ANY_CALL
	PERF_SAMPLE_BRANCH_ANY_RETURN
	PERF_SAMPLE_BRANCH_IND_CALL
	PERF_SAMPLE_BRANCH_COND
	PERF_SAMPLE_BRANCH_IND_JUMP
	PERF_SAMPLE_BRANCH_CALL

	PERF_SAMPLE_BRANCH_NO_FLAGS	/* Branch record flags */
	PERF_SAMPLE_BRANCH_NO_CYCLES
	PERF_SAMPLE_BRANCH_TYPE_SAVE
	PERF_SAMPLE_BRANCH_HW_INDEX
	PERF_SAMPLE_BRANCH_PRIV_SAVE

- Following branch filters are not supported on arm64.

	PERF_SAMPLE_BRANCH_ABORT_TX
	PERF_SAMPLE_BRANCH_IN_TX
	PERF_SAMPLE_BRANCH_NO_TX
	PERF_SAMPLE_BRANCH_CALL_STACK

Events requesting above non-supported branch filters get rejected.

--------------------------- Virtualisation support ------------------------

- No guest support

-------------------------------- Testing ---------------------------------

- Cross compiled for both arm64 and arm32 platforms
- Passes all branch tests with 'perf test branch' on arm64

Changes in V18:

- Changed BRBIDR0_EL1 register fields CC and FORMAT, updated the commit message
- Replaced BRBIDR0_EL1_FORMAT_0 as BRBIDR0_EL1_FORMAT_FORMAT_0 in BRBE driver
- Dropped ifdef CONFIG_ARM64_BRBE around __init_el2_brbe()
- Updated in code comment around __init_el2_brbe()
- Dropped the write up for EL2->EL1 transition, also moved up the EL3 write up
- Unconditionally capture branch record type and privilege information
- Scan valid branch stack events in armpmu_start() to create merged filter
- Dropped branch_sample_type override in armv8pmu_branch_stack_add()
- Dropped branch filter mismatch between PMU and event in read_branch_records()
- Added SW filtering framework in read_branch_records() during filter mismatch
- Added SW filtering for privilege modes
- Used host_data_ptr() to access host_debug_state.brbcr_el1 register
- Changed DEBUG_STATE_SAVE_BRBE
  to use BIT(7)
- Reverted back iflags as u8

https://lore.kernel.org/all/20240405024639.1179064-1-anshuman.khandual@arm.com/

Changes in V17:

- Added back Reviewed-by tags from Mark Brown
- Updated the commit message regarding the field BRBINFx_EL1_TYPE_IMPDEF_TRAP_EL3
- Added leading 0s for all values as BRBIDR0_EL1.NUMREC is an 8 bit field
- Added leading 0s for all values as BRBFCR_EL1.BANK is a 2 bit field
- Reordered BRBCR_EL1/BRBCR_EL12/BRBCR_EL2 registers as per sysreg encodings
- Renamed s/FIRST/BANK_0 and s/SECOND/BANK_1 in BRBFCR_EL1.BANK
- Renamed s/UNCOND_DIRECT/DIRECT_UNCOND in BRBINFx_EL1.TYPE
- Renamed s/COND_DIRECT/DIRECT_COND in BRBINFx_EL1.TYPE
- Dropped __SYS_BRBINF/__SYS_BRBSRC/__SYS_BRBTGT and their expansions
- Moved all existing BRBE registers from sysreg.h header to tools/sysreg format
- Updated the commit message including about sys_insn_descs[]
- Changed KVM to use existing SYS_BRBSRC/TGT/INF_EL1(n) format
- Moved the BRBE instructions into sys_insn_descs[] array
- ARM PMUV3 changes have been moved into the BRBE driver patch instead
- Moved down branch_stack_add() in armpmu_add() after event's basic checks
- Added new callbacks in struct arm_pmu e.g branch_stack_[init|add|del]()
- Renamed struct arm_pmu callback branch_reset() as branch_stack_reset()
- Dropped the comment in armpmu_event_init()
- Renamed 'pmu_hw_events' elements from 'brbe_' to more generic 'branch_'
- Separated out from the BRBE driver implementation patch
- Dropped the comment in __init_el2_brbe()
- Updated __init_el2_brbe() with BRBCR_EL2.MPRED requirements
- Updated __init_el2_brbe() with __check_hvhe() constructs
- Updated booting.rst regarding MPRED, MDCR_EL3 and fine grained control
- Dropped Documentation/arch/arm64/brbe.rst
- Renamed armv8pmu_branch_reset() as armv8pmu_branch_stack_reset()
- Separated out booting.rst and EL2 boot requirements into a new patch
- Dropped process_branch_aborts() completely
- Added a warning if transaction states get detected
  unexpectedly
- Dropped enum brbe_bank_idx from the driver
- Defined armv8pmu_branch_stack_init/add/del() callbacks in the driver
- Changed BRBE driver to use existing SYS_BRBSRC/TGT/INF_EL1(n) format
- Dropped isb() call sites in __debug_[save|restore]_brbe()
- Changed to [read|write]_sysreg_el1() accessors in __debug_[save|restore]_brbe()

Changes in V16:

https://lore.kernel.org/all/20240125094119.2542332-1-anshuman.khandual@arm.com/

- Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
- Updated BRBCR_ELx[9] as field FZPSS
- Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1
- Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
- Renamed arm_brbe.h as arm_pmuv3_branch.h
- Updated perf_sample_save_brstack()'s new argument requirements with NULL
- Fixed typo (s/informations/information) in Documentation/arch/arm64/brbe.rst
- Added SPDX-License-Identifier in Documentation/arch/arm64/brbe.rst
- Added new PERF_SAMPLE_BRANCH_COUNTERS into BRBE_EXCLUDE_BRANCH_FILTERS
- Dropped BRBFCR_EL1 and BRBCR_EL1 from enum vcpu_sysreg
- Reverted back the KVM NVHE patch - use host_debug_state based 'brbcr_el1'
  element and dropped the previous dependency on James's coresight series

Changes in V15:

https://lore.kernel.org/all/20231201053906.1261704-1-anshuman.khandual@arm.com/

- Added a comment for armv8pmu_branch_probe() regarding single cpu probe
- Added a text in brbe.rst regarding single cpu probe
- Dropped runtime BRBE enable for setting DEBUG_STATE_SAVE_BRBE
- Dropped zero_branch_stack based zero branch records mechanism
- Replaced BRBFCR_EL1_DEFAULT_CONFIG with BRBFCR_EL1_CONFIG_MASK
- Added BRBFCR_EL1_CONFIG_MASK masking in branch_type_to_brbfcr()
- Moved BRBE helpers from arm_brbe.h into arm_brbe.c
- Moved armv8_pmu_xxx() declaration inside arm_brbe.h for arm64 (CONFIG_ARM64_BRBE)
- Moved armv8_pmu_xxx() stub definitions inside arm_brbe.h for arm32 (!CONFIG_ARM64_BRBE)
- Included arm_brbe.h header both in arm_pmuv3.c and arm_brbe.c
- Dropped
  BRBE custom pr_fmt()
- Dropped CONFIG_PERF_EVENTS wrapping from header entries
- Flush branch records when a cpu bound event follows a task bound event
- Dropped BRBFCR_EL1 from __debug_save_brbe()/__debug_restore_brbe()
- Always save the live SYS_BRBCR_EL1 in host context and then check if
  BRBE was enabled before resetting SYS_BRBCR_EL1 for the host

Changes in V14:

https://lore.kernel.org/all/20231114051329.327572-1-anshuman.khandual@arm.com/

- This series has been reorganised as suggested during V13
- There are just eight patches now i.e 5 enablement and 3 perf branch tests
- Fixed brackets problem in __SYS_BRBINFO/BRBSRC/BRBTGT() macros
- Renamed the macro i.e s/__SYS_BRBINFO/__SYS_BRBINF/
- Renamed s/BRB_IALL/BRB_IALL_INSN and s/BRBE_INJ/BRB_INJ_INSN
- Moved BRB_IALL_INSN and SYS_BRB_INSN instructions to sysreg patch
- Changed E1BRE as ExBRE in sysreg fields inside BRBCR_ELx
- Used BRBCR_ELx for defining all BRBCR_EL1, BRBCR_EL2, and BRBCR_EL12 (new)
- Folded the following three patches into a single patch i.e [PATCH 3/8]

  drivers: perf: arm_pmu: Add new sched_task() callback
  arm64/perf: Add branch stack support in struct arm_pmu
  arm64/perf: Add branch stack support in struct pmu_hw_events
  arm64/perf: Add branch stack support in ARMV8 PMU
  arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack()

- All armv8pmu_branch_xxxx() stub definitions have been moved inside
  include/linux/perf/arm_pmuv3.h for easy access from both arm32 and arm64
- Added brbe_users, brbe_context and brbe_sample_type in struct pmu_hw_events
- Added comments for all the above new elements in struct pmu_hw_events
- Added branch_reset() and sched_task() callbacks
- Changed and optimized branch records processing during a PMU IRQ
- NO branch records get captured for event with mismatched brbe_sample_type
- Branch record context is tracked from armpmu_del() & armpmu_add()
- Branch record hardware is driven from armv8pmu_start() & armv8pmu_stop()
- Dropped NULL check for 'pmu_ctx'
  inside armv8pmu_sched_task()
- Moved down PERF_ATTACH_TASK_DATA assignment with a preceding comment
- In conflicting branch sample type requests, first event takes precedence
- Folded the following five patches from V13 into a single patch i.e [PATCH 4/8]

  arm64/perf: Enable branch stack events via FEAT_BRBE
  arm64/perf: Add struct brbe_regset helper functions
  arm64/perf: Implement branch records save on task sched out
  arm64/perf: Implement branch records save on PMU IRQ

- Fixed the year in copyright statement
- Added Documentation/arch/arm64/brbe.rst
- Updated Documentation/arch/arm64/booting.rst (BRBCR_EL2.CC for EL1 entry)
- Added __init_el2_brbe() which enables branch record cycle count support
- Disabled EL2 traps in __init_el2_fgt() while accessing BRBE registers and
  executing instructions
- Changed CONFIG_ARM64_BRBE user visible description
- Fixed a typo in CONFIG_ARM64_BRBE config option description text
- Added BUILD_BUG_ON() co-relating BRBE_BANK_MAX_ENTRIES and MAX_BRANCH_RECORDS
- Dropped arm64_create_brbe_task_ctx_kmem_cache()
- Moved down comment for PERF_SAMPLE_BRANCH_KERNEL in branch_type_to_brbcr()
- Renamed BRBCR_ELx_DEFAULT_CONFIG as BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_DEFAULT_TS with BRBCR_ELx_TS_MASK in BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_E1BRE instances with BRBCR_ELx_ExBRE
- Added BRBE specific branch stack sampling perf test patches into the series
- Added a patch to prevent guest accesses into BRBE registers and instructions
- Added a patch to save the BRBE host context in NVHE environment
- Updated most commit messages

Changes in V13:

https://lore.kernel.org/all/20230711082455.215983-1-anshuman.khandual@arm.com/
https://lore.kernel.org/all/20230622065351.1092893-1-anshuman.khandual@arm.com/

- Added branch callback stubs for aarch32 pmuv3 based platforms
- Updated the comments for capture_brbe_regset()
- Deleted the comments in __read_brbe_regset()
- Reversed the arguments order in capture_brbe_regset() and brbe_branch_save()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset()

Changes in V12:

https://lore.kernel.org/all/20230615133239.442736-1-anshuman.khandual@arm.com/

- Replaced branch types with complete DIRECT/INDIRECT prefixes/suffixes
- Replaced branch types with complete INSN/ALIGN prefixes/suffixes
- Replaced return branch types as simple RET/ERET
- Replaced time field GST_PHYSICAL as GUEST_PHYSICAL
- Added 0 padding for BRBIDR0_EL1.NUMREC enum values
- Dropped helper arm_pmu_branch_stack_supported()
- Renamed armv8pmu_branch_valid() as armv8pmu_branch_attr_valid()
- Separated perf_task_ctx_cache setup from arm_pmu private allocation
- Collected changes to branch_records_alloc() in a single patch [5/10]
- Reworked and cleaned up branch_records_alloc()
- Reworked armv8pmu_branch_read() with new loop iterations in patch [6/10]
- Reworked capture_brbe_regset() with new loop iterations in patch [8/10]
- Updated the comment in branch_type_to_brbcr()
- Fixed the comment before stitch_stored_live_entries()
- Fixed BRBINFINJ_EL1 definition for VALID_FULL enum field
- Factored out helper __read_brbe_regset() from capture_brbe_regset()
- Dropped the helper copy_brbe_regset()
- Simplified stitch_stored_live_entries() with memcpy(), memmove()
- Reworked armv8pmu_probe_pmu() to bail out early with !probe.present
- Reworked brbe_attributes_probe() without 'struct brbe_hw_attr'
- Dropped 'struct brbe_hw_attr' argument from capture_brbe_regset()
- Dropped 'struct brbe_hw_attr' argument from brbe_branch_save()
- Dropped arm_pmu->private and added arm_pmu->reg_trbidr instead

Changes in V11:

https://lore.kernel.org/all/20230531040428.501523-1-anshuman.khandual@arm.com/

- Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data

Changes in V10:

https://lore.kernel.org/all/20230517022410.722287-1-anshuman.khandual@arm.com/

- Rebased the series on v6.4-rc2
- Moved ARMV8 PMUV3 changes inside
  drivers/perf/arm_pmuv3.c
- Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h]
- Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq()

Changes in V9:

https://lore.kernel.org/all/20230315051444.1683170-1-anshuman.khandual@arm.com/

- Fixed build problem with has_branch_stack() in arm64 header
- BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields'
- Renamed all BRBINF_EL1 call sites as BRBINFx_EL1
- Dropped static const char branch_filter_error_msg[]
- Implemented a positive list check for BRBE supported perf branch filters
- Added a comment in armv8pmu_handle_irq()
- Implemented per-cpu allocation for struct branch_record records
- Skipped looping through bank 1 if an invalid record is detected in bank 0
- Added comment in armv8pmu_branch_read() explaining prohibited region etc
- Added comment warning about erroneously marking transactions as aborted
- Replaced the first argument (perf_branch_entry) in capture_brbe_flags()
- Dropped the last argument (idx) in capture_brbe_flags()
- Dropped the brbcr argument from capture_brbe_flags()
- Used perf_sample_save_brstack() to capture branch records for perf_sample_data
- Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces
- Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read()
- Implemented event task context based branch records save mechanism

Changes in V8:

https://lore.kernel.org/all/20230123125956.1350336-1-anshuman.khandual@arm.com/

- Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper
- Added a comment and line break before arm_pmu->private element
- Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]()
- Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event()
- Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP()
- Changed brbe_hw_attr->brbe_version from 'bool' to 'int'
- Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]()
- Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid()
- Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings
- Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags()
- Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type()
- Renamed brbe_valid() as brbe_record_is_complete()
- Renamed brbe_source() as brbe_record_is_source_only()
- Renamed brbe_target() as brbe_record_is_target_only()
- Inverted checks for !brbe_record_is_[target|source]_only() for info capture
- Replaced 'fetch' with 'get' in all helpers that extract field value
- Dropped 'static int brbe_current_bank' optimization in select_brbe_bank()
- Dropped select_brbe_bank_index() completely, added capture_branch_entry()
- Process captured branch entries in two separate loops one for each BRBE bank
- Moved branch_records_alloc() inside armv8pmu_probe_pmu()
- Added a forward declaration for the helper has_branch_stack()
- Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free()
- Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call

Changes in V7:

https://lore.kernel.org/all/20230105031039.207972-1-anshuman.khandual@arm.com/

- Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event
- Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header
- Defined BRBFCR_EL1_DEFAULT_CONFIG in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP
- Defined BRBCR_EL1_DEFAULT_TS in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS
- Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr()
- Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr()
- Also set BRBE in paused state in armv8pmu_branch_disable()
- Dropped brbe_paused(), set_brbe_paused() helpers
- Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid()
- Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr
- Added
  valid_brbe_[cc, format, version]() helpers
- Split a separate brbe_attributes_probe() from armv8pmu_branch_probe()
- Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid()
- Defined enum brbe_bank_idx with possible values for BRBE bank indices
- Changed armpmu->hw_attr into armpmu->private
- Added missing space in stub definition for armv8pmu_branch_valid()
- Replaced both kmalloc() with kzalloc()
- Added BRBE_BANK_MAX_ENTRIES
- Updated comment for capture_brbe_flags()
- Updated comment for struct brbe_hw_attr
- Dropped space after type cast in couple of places
- Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read()
- Captured cpuc->branches->branch_entries[idx] in a local variable
- Dropped saved_priv from armv8pmu_branch_read()
- Reorganize PERF_SAMPLE_BRANCH_NO_[CYCLES|NO_FLAGS] related configuration
- Replaced with FIELD_GET() and FIELD_PREP() wherever applicable
- Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL
- Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version()
  select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation
- Reorganized brbe_valid_nr() and dropped the pr_warn() message
- Changed probe sequence in brbe_attributes_probe()
- Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state
- Disable BRBE before disabling the PMU event counter
- Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode()
- Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()

Changes in V6:

https://lore.kernel.org/linux-arm-kernel/20221208084402.863310-1-anshuman.khandual@arm.com/

- Restore the exception level privilege after reading the branch records
- Unpause the buffer after reading the branch records
- Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level
- Reworked BRBE implementation and branch stack sampling support on arm pmu
- BRBE implementation is now part of overall ARMV8 PMU
  implementation
- BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/
- CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig
- File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c
- File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h
- BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events
- BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events
- BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation
- Added sched_task() callback into struct arm_pmu
- Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes
- Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu
- Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events
- Dropped brbe_users, brbe_context tracking in struct hw_pmu_events
- Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag
- armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation
- Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe
- Added armv8pmu_branch_reset() inside armv8pmu_branch_enable()
- Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK
- Dropped set_brbe_disabled() as well
- Reformatted armv8pmu_branch_valid() warnings while rejecting unsupported events

Changes in V5:

https://lore.kernel.org/linux-arm-kernel/20221107062514.2851047-1-anshuman.khandual@arm.com/

- Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01
- Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI
- Changed config ARM_BRBE_PMU from 'tristate' to 'bool'

Changes in V4:

https://lore.kernel.org/all/20221017055713.451092-1-anshuman.khandual@arm.com/

- Changed ../tools/sysreg declarations as suggested
- Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags
- Dropped perfmon_capable() check in armpmu_event_init()
- s/pr_warn_once/pr_info in armpmu_event_init()
- Added brbe_format element into struct pmu_hw_events
- Changed v1p1 as brbe_v1p1 in struct pmu_hw_events
- Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning

Changes in V3:

https://lore.kernel.org/all/20220929075857.158358-1-anshuman.khandual@arm.com/

- Moved brbe_stack from the stack and now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branch records are now captured
  only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already
  have been checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@arm.com/

- Dropped branch sample filter helpers consolidation patch from this series
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Mark Brown
Cc: James Clark
Cc: Rob Herring
Cc: Marc Zyngier
Cc: Suzuki Poulose
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (6):
  arm64/sysreg: Add BRBE registers and fields
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  drivers: perf: arm_pmu: Add infrastructure for branch stack sampling
  arm64/boot: Enable EL2 requirements for BRBE
  drivers: perf: arm_pmuv3: Enable branch stack sampling via FEAT_BRBE
  KVM: arm64: nvhe: Disable branch generation in nVHE guests

James Clark (3):
  perf: test: Speed up running brstack test on an Arm model
  perf: test: Remove empty lines from branch filter test output
  perf: test: Extend branch stack sampling test for Arm64 BRBE

 Documentation/arch/arm64/booting.rst   |   23 +-
 arch/arm64/include/asm/el2_setup.h     |   87 +-
 arch/arm64/include/asm/kvm_host.h      |    3 +
 arch/arm64/include/asm/sysreg.h        |   17 +-
 arch/arm64/kvm/debug.c                 |    5 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c     |   31 +
 arch/arm64/kvm/sys_regs.c              |   56 ++
 arch/arm64/tools/sysreg                |  131 +++
 drivers/perf/Kconfig                   |   11 +
 drivers/perf/Makefile                  |    1 +
 drivers/perf/arm_brbe.c                | 1198 ++++++++++++++++++++++++
 drivers/perf/arm_pmu.c                 |   42 +-
 drivers/perf/arm_pmuv3.c               |  160 +++-
 drivers/perf/arm_pmuv3_branch.h        |   83 ++
 include/linux/perf/arm_pmu.h           |   37 +-
 tools/perf/tests/builtin-test.c        |    1 +
 tools/perf/tests/shell/test_brstack.sh |   57 +-
 tools/perf/tests/tests.h               |    1 +
 tools/perf/tests/workloads/Build       |    2 +
 tools/perf/tests/workloads/traploop.c  |   39 +
 20 files changed, 1959 insertions(+), 26 deletions(-)
 create mode 100644 drivers/perf/arm_brbe.c
 create mode 100644 drivers/perf/arm_pmuv3_branch.h
 create mode 100644 tools/perf/tests/workloads/traploop.c