[V18,0/9] arm64/perf: Enable branch stack sampling

Message ID 20240613061731.3109448-1-anshuman.khandual@arm.com (mailing list archive)

Anshuman Khandual June 13, 2024, 6:17 a.m. UTC
This series enables perf branch stack sampling support on arm64 platforms
via a new arch feature called Branch Record Buffer Extension (BRBE). All
the relevant register definitions can be found here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on 6.10-rc3.

This series is also hosted below for quick access, review and testing.

https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v18)

- Anshuman

========== Perf Branch Stack Sampling Support (arm64 platforms) ===========

Currently arm64 platforms do not support perf branch stack sampling. Hence
any event requesting branch stack records, i.e. PERF_SAMPLE_BRANCH_STACK
marked in event->attr.sample_type, is rejected in armpmu_event_init().

static int armpmu_event_init(struct perf_event *event)
{
	........
	/* does not support taken branch sampling */
	if (has_branch_stack(event))
		return -EOPNOTSUPP;
	........
}

$perf record -j any,u,k ls
Error:
cycles:P: PMU Hardware or event type doesn't support branch stack sampling.

-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------

After this series, the perf branch stack sampling feature gets enabled on
arm64 platforms where the FEAT_BRBE HW feature is supported and
CONFIG_ARM64_BRBE is also selected during build. All possible scenarios are
listed below.

1. Feature not built (!CONFIG_ARM64_BRBE):

Falls back to the current behaviour, i.e. the event gets rejected.

2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):

Falls back to the current behaviour, i.e. the event gets rejected.

3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):

The platform supports branch stack sampling requests, as the following
simple example shows.

$perf record -j any_call,u,k,save_type ls

[Please refer to the perf-record man page for all possible branch filter options]

$perf report
-------------------------- Snip ----------------------
# Overhead  Command  Source Shared Object  Source Symbol                                 Target Symbol                                 Basic Block Cycles
# ........  .......  ....................  ............................................  ............................................  ..................
#
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock_noinstr                       [k] arch_counter_get_cntpct                   16
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock                               [k] sched_clock_noinstr                       9
     1.85%  ls       [kernel.kallsyms]     [k] sched_clock_cpu                           [k] sched_clock                               5
     1.80%  ls       [kernel.kallsyms]     [k] irqtime_account_irq                       [k] sched_clock_cpu                           20
     1.58%  ls       [kernel.kallsyms]     [k] gic_handle_irq                            [k] generic_handle_domain_irq                 19
     1.58%  ls       [kernel.kallsyms]     [k] call_on_irq_stack                         [k] gic_handle_irq                            9
     1.58%  ls       [kernel.kallsyms]     [k] do_interrupt_handler                      [k] call_on_irq_stack                         23
     1.58%  ls       [kernel.kallsyms]     [k] generic_handle_domain_irq                 [k] __irq_resolve_mapping                     6
     1.58%  ls       [kernel.kallsyms]     [k] __irq_resolve_mapping                     [k] __rcu_read_lock                           10
-------------------------- Snip ----------------------

$perf report -D | grep cycles
-------------------------- Snip ----------------------
.....  1: ffff800080dd3334 -> ffff800080dd759c 39 cycles  P   0 IND_CALL
.....  2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles  P   0 IND_CALL
.....  3: ffff800080139918 -> ffff800080ffae64 9  cycles  P   0 CALL
.....  4: ffff800080dd3324 -> ffff8000801398f8 7  cycles  P   0 CALL
.....  5: ffff8000800f8548 -> ffff800080dd330c 21 cycles  P   0 IND_CALL
.....  6: ffff8000800f864c -> ffff8000800f84ec 6  cycles  P   0 CALL
.....  7: ffff8000800f86dc -> ffff8000800f8638 11 cycles  P   0 CALL
.....  8: ffff8000800f86d4 -> ffff800081008630 16 cycles  P   0 CALL
-------------------------- Snip ----------------------
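The raw record lines above are regular enough to be post-processed by simple
tooling. As a rough sketch only (not part of this series), the helper below
pulls the index, source, target, cycle count and branch type out of one such
line. The names 'struct branch_line' and parse_branch_line() are hypothetical,
and the format string assumes the exact column layout shown in the dump above;
the two columns between "cycles" and the branch type are skipped rather than
interpreted.

```c
#include <stdio.h>

/* Hypothetical container for one line of the 'perf report -D' dump above */
struct branch_line {
	int idx;			/* record index within the sample */
	unsigned long long from;	/* branch source address */
	unsigned long long to;		/* branch target address */
	unsigned int cycles;		/* cycle count reported for the record */
	char type[32];			/* branch type string, e.g. IND_CALL */
};

/*
 * Parse a line shaped like:
 *   .....  1: ffff800080dd3334 -> ffff800080dd759c 39 cycles  P   0 IND_CALL
 * Returns 0 on success, -1 if the line does not match that layout.
 */
int parse_branch_line(const char *line, struct branch_line *out)
{
	int n = sscanf(line, " ..... %d: %llx -> %llx %u cycles %*s %*d %31s",
		       &out->idx, &out->from, &out->to, &out->cycles,
		       out->type);

	/* All five fields must have been converted */
	return n == 5 ? 0 : -1;
}
```

Feeding each line of 'perf report -D | grep cycles' through such a helper
would be one way to build a quick histogram of branch types.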

perf script and other tooling can also be applied to the captured perf.data.
Similarly, branch stack sampling records can be collected directly via the
perf_event_open() system call after setting up 'struct perf_event_attr' as
required.

event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK
event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_<FILTER_1> |
				  PERF_SAMPLE_BRANCH_<FILTER_2> |
				  PERF_SAMPLE_BRANCH_<FILTER_3> |
				  ...............................

However, not all branch filters are supported on the platform.
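For illustration, a user space caller might set up the attribute roughly as
sketched below. This is a minimal sketch and not code from this series: the
helper names init_branch_stack_attr() and open_branch_stack_event() are
hypothetical, the sample period is an arbitrary value, and the chosen filters
merely approximate 'perf record -j any_call,u,save_type'. Only the
'struct perf_event_attr' fields and PERF_* constants come from the UAPI
header <linux/perf_event.h>.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill 'attr' for a cycles event that also records the branch stack */
void init_branch_stack_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary, for illustration only */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;

	/* User space calls only, with the branch type saved in each record */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY_CALL |
				   PERF_SAMPLE_BRANCH_TYPE_SAVE;
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
	attr->disabled = 1;
}

/*
 * glibc provides no wrapper for perf_event_open(), hence the raw syscall.
 * Returns a file descriptor, or -1 with errno set, e.g. EOPNOTSUPP when
 * the PMU cannot do branch stack sampling.
 */
int open_branch_stack_event(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

A caller would typically invoke open_branch_stack_event(&attr, 0, -1) to
profile the current task on any CPU, and treat an EOPNOTSUPP failure as the
rejection described earlier.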

----------------------- BRBE Branch Filters Support -----------------------

- The following branch filters are supported on arm64.

	PERF_SAMPLE_BRANCH_USER		/* Branch privilege filters */
	PERF_SAMPLE_BRANCH_KERNEL
	PERF_SAMPLE_BRANCH_HV

	PERF_SAMPLE_BRANCH_ANY		/* Branch type filters */
	PERF_SAMPLE_BRANCH_ANY_CALL
	PERF_SAMPLE_BRANCH_ANY_RETURN
	PERF_SAMPLE_BRANCH_IND_CALL
	PERF_SAMPLE_BRANCH_COND
	PERF_SAMPLE_BRANCH_IND_JUMP
	PERF_SAMPLE_BRANCH_CALL

	PERF_SAMPLE_BRANCH_NO_FLAGS	/* Branch record flags */
	PERF_SAMPLE_BRANCH_NO_CYCLES
	PERF_SAMPLE_BRANCH_TYPE_SAVE
	PERF_SAMPLE_BRANCH_HW_INDEX
	PERF_SAMPLE_BRANCH_PRIV_SAVE

- The following branch filters are not supported on arm64.

	PERF_SAMPLE_BRANCH_ABORT_TX
	PERF_SAMPLE_BRANCH_IN_TX
	PERF_SAMPLE_BRANCH_NO_TX
	PERF_SAMPLE_BRANCH_CALL_STACK

Events requesting any of the above unsupported branch filters get rejected.
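The rejection rule can be pictured as a simple mask test on
event->attr.branch_sample_type. The sketch below is only an illustration of
that idea using the UAPI constants; the macro EXAMPLE_UNSUPPORTED_FILTERS and
the function example_branch_filters_ok() are hypothetical names and do not
claim to match what the BRBE driver actually implements.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask built from the "not supported" list above */
#define EXAMPLE_UNSUPPORTED_FILTERS	(PERF_SAMPLE_BRANCH_ABORT_TX	| \
					 PERF_SAMPLE_BRANCH_IN_TX	| \
					 PERF_SAMPLE_BRANCH_NO_TX	| \
					 PERF_SAMPLE_BRANCH_CALL_STACK)

/* Return true only when no unsupported filter bit has been requested */
bool example_branch_filters_ok(uint64_t branch_sample_type)
{
	return !(branch_sample_type & EXAMPLE_UNSUPPORTED_FILTERS);
}
```

With this check, a request such as PERF_SAMPLE_BRANCH_USER |
PERF_SAMPLE_BRANCH_ANY passes, whereas adding PERF_SAMPLE_BRANCH_IN_TX to the
same request makes it fail.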

--------------------------- Virtualisation support ------------------------

- No guest support

-------------------------------- Testing ---------------------------------

- Cross compiled for both arm64 and arm32 platforms
- Passes all branch tests with 'perf test branch' on arm64

Changes in V18:

- Changed BRBIDR0_EL1 register fields CC and FORMAT, updated the commit message
- Replaced BRBIDR0_EL1_FORMAT_0 as BRBIDR0_EL1_FORMAT_FORMAT_0 in BRBE driver
- Dropped ifdef CONFIG_ARM64_BRBE around __init_el2_brbe()
- Updated in code comment around __init_el2_brbe()
- Dropped the write up for EL2->EL1 transition, also moved up the EL3 write up
- Unconditionally capture branch record type and privilege information
- Scan valid branch stack events in armpmu_start() to create merged filter
- Dropped branch_sample_type override in armv8pmu_branch_stack_add()
- Dropped branch filter mismatch between PMU and event in read_branch_records()
- Added SW filtering framework in read_branch_records() during filter mismatch
- Added SW filtering for privilege modes
- Used host_data_ptr() to access host_debug_state.brbcr_el1 register
- Changed DEBUG_STATE_SAVE_BRBE to use BIT(7)
- Reverted back iflags as u8

https://lore.kernel.org/all/20240405024639.1179064-1-anshuman.khandual@arm.com/

Changes in V17:

- Added back Reviewed-by tags from Mark Brown
- Updated the commit message regarding the field BRBINFx_EL1_TYPE_IMPDEF_TRAP_EL3
- Added leading 0s for all values as BRBIDR0_EL1.NUMREC is a 8 bit field
- Added leading 0s for all values as BRBFCR_EL1.BANK is a 2 bit field
- Reordered BRBCR_EL1/BRBCR_EL12/BRBCR_EL2 registers as per sysreg encodings
- Renamed s/FIRST/BANK_0 and s/SECOND/BANK_1 in BRBFCR_EL1.BANK
- Renamed s/UNCOND_DIRECT/DIRECT_UNCOND in BRBINFx_EL1.TYPE
- Renamed s/COND_DIRECT/DIRECT_COND in BRBINFx_EL1.TYPE
- Dropped __SYS_BRBINF/__SYS_BRBSRC/__SYS_BRBTGT and their expansions
- Moved all existing BRBE registers from sysreg.h header to tools/sysreg format
- Updated the commit message including about sys_insn_descs[]
- Changed KVM to use existing SYS_BRBSRC/TGT/INF_EL1(n) format
- Moved the BRBE instructions into sys_insn_descs[] array
- ARM PMUV3 changes have been moved into the BRBE driver patch instead
- Moved down branch_stack_add() in armpmu_add() after event's basic checks
- Added new callbacks in struct arm_pmu e.g branch_stack_[init|add|del]()
- Renamed struct arm_pmu callback branch_reset() as branch_stack_reset()
- Dropped the comment in armpmu_event_init()
- Renamed 'pmu_hw_events' elements from 'brbe_' to more generic 'branch_'
- Separated out from the BRBE driver implementation patch
- Dropped the comment in __init_el2_brbe()
- Updated __init_el2_brbe() with BRBCR_EL2.MPRED requirements
- Updated __init_el2_brbe() with __check_hvhe() constructs
- Updated booting.rst regarding MPRED, MDCR_EL3 and fine grained control
- Dropped Documentation/arch/arm64/brbe.rst
- Renamed armv8pmu_branch_reset() as armv8pmu_branch_stack_reset()
- Separated out booting.rst and EL2 boot requirements into a new patch
- Dropped process_branch_aborts() completely
- Added a warning if transaction states get detected unexpectedly
- Dropped enum brbe_bank_idx from the driver
- Defined armv8pmu_branch_stack_init/add/del() callbacks in the driver
- Changed BRBE driver to use existing SYS_BRBSRC/TGT/INF_EL1(n) format
- Dropped isb() call sites in  __debug_[save|restore]_brbe()
- Changed to [read|write]_sysreg_el1() accessors in __debug_[save|restore]_brbe()

Changes in V16:

https://lore.kernel.org/all/20240125094119.2542332-1-anshuman.khandual@arm.com/

- Updated BRBINFx_EL1.TYPE = 0b110000 as field IMPDEF_TRAP_EL3
- Updated BRBCR_ELx[9] as field FZPSS
- Updated BRBINFINJ_EL1 to use sysreg field BRBINFx_EL1
- Added BRB_INF_SRC_TGT_EL1 macro for corresponding BRB_[INF|SRC|TGT] expansion
- Renamed arm_brbe.h as arm_pmuv3_branch.h
- Updated perf_sample_save_brstack()'s new argument requirements with NULL
- Fixed typo (s/informations/information) in Documentation/arch/arm64/brbe.rst
- Added SPDX-License-Identifier in Documentation/arch/arm64/brbe.rst
- Added new PERF_SAMPLE_BRANCH_COUNTERS into BRBE_EXCLUDE_BRANCH_FILTERS
- Dropped BRBFCR_EL1 and BRBCR_EL1 from enum vcpu_sysreg
- Reverted back the KVM NVHE patch - use host_debug_state based 'brbcr_el1'
  element and dropped the previous dependency on James' coresight series

Changes in V15:

https://lore.kernel.org/all/20231201053906.1261704-1-anshuman.khandual@arm.com/

- Added a comment for armv8pmu_branch_probe() regarding single cpu probe
- Added a text in brbe.rst regarding single cpu probe
- Dropped runtime BRBE enable for setting DEBUG_STATE_SAVE_BRBE
- Dropped zero_branch_stack based zero branch records mechanism
- Replaced BRBFCR_EL1_DEFAULT_CONFIG with BRBFCR_EL1_CONFIG_MASK
- Added BRBFCR_EL1_CONFIG_MASK masking in branch_type_to_brbfcr()
- Moved BRBE helpers from arm_brbe.h into arm_brbe.c
- Moved armv8_pmu_xxx() declaration inside arm_brbe.h for arm64 (CONFIG_ARM64_BRBE)
- Moved armv8_pmu_xxx() stub definitions inside arm_brbe.h for arm32 (!CONFIG_ARM64_BRBE)
- Included arm_brbe.h header both in arm_pmuv3.c and arm_brbe.c
- Dropped BRBE custom pr_fmt()
- Dropped CONFIG_PERF_EVENTS wrapping from header entries
- Flush branch records when a cpu bound event follows a task bound event
- Dropped BRBFCR_EL1 from __debug_save_brbe()/__debug_restore_brbe()
- Always save the live SYS_BRBCR_EL1 in host context and then check if
  BRBE was enabled before resetting SYS_BRBCR_EL1 for the host

Changes in V14:

https://lore.kernel.org/all/20231114051329.327572-1-anshuman.khandual@arm.com/

- This series has been reorganised as suggested during V13
- There are just eight patches now i.e 5 enablement and 3 perf branch tests

- Fixed brackets problem in __SYS_BRBINFO/BRBSRC/BRBTGT() macros
- Renamed the macro i.e s/__SYS_BRBINFO/__SYS_BRBINF/
- Renamed s/BRB_IALL/BRB_IALL_INSN and s/BRBE_INJ/BRB_INJ_INSN
- Moved BRB_IALL_INSN and SYS_BRB_INSN instructions to sysreg patch
- Changed E1BRE as ExBRE in sysreg fields inside BRBCR_ELx
- Used BRBCR_ELx for defining all BRBCR_EL1, BRBCR_EL2, and BRBCR_EL12 (new)

- Folded the following five patches into a single patch i.e [PATCH 3/8]

  drivers: perf: arm_pmu: Add new sched_task() callback
  arm64/perf: Add branch stack support in struct arm_pmu
  arm64/perf: Add branch stack support in struct pmu_hw_events
  arm64/perf: Add branch stack support in ARMV8 PMU
  arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack()

- All armv8pmu_branch_xxxx() stub definitions have been moved inside
  include/linux/perf/arm_pmuv3.h for easy access from both arm32 and arm64
- Added brbe_users, brbe_context and brbe_sample_type in struct pmu_hw_events
- Added comments for all the above new elements in struct pmu_hw_events
- Added branch_reset() and sched_task() callbacks
- Changed and optimized branch records processing during a PMU IRQ
- NO branch records get captured for event with mismatched brbe_sample_type
- Branch record context is tracked from armpmu_del() & armpmu_add()
- Branch record hardware is driven from armv8pmu_start() & armv8pmu_stop()
- Dropped NULL check for 'pmu_ctx' inside armv8pmu_sched_task()
- Moved down PERF_ATTACH_TASK_DATA assignment with a preceding comment
- In conflicting branch sample type requests, first event takes precedence

- Folded the following four patches from V13 into a single patch i.e
  [PATCH 4/8]

  arm64/perf: Enable branch stack events via FEAT_BRBE
  arm64/perf: Add struct brbe_regset helper functions
  arm64/perf: Implement branch records save on task sched out
  arm64/perf: Implement branch records save on PMU IRQ

- Fixed the year in copyright statement
- Added Documentation/arch/arm64/brbe.rst
- Updated Documentation/arch/arm64/booting.rst (BRBCR_EL2.CC for EL1 entry)
- Added __init_el2_brbe() which enables branch record cycle count support
- Disabled EL2 traps in __init_el2_fgt() while accessing BRBE registers and
  executing instructions
- Changed CONFIG_ARM64_BRBE user visible description
- Fixed a typo in CONFIG_ARM64_BRBE config option description text
- Added BUILD_BUG_ON() co-relating BRBE_BANK_MAX_ENTRIES and MAX_BRANCH_RECORDS
- Dropped arm64_create_brbe_task_ctx_kmem_cache()
- Moved down comment for PERF_SAMPLE_BRANCH_KERNEL in branch_type_to_brbcr()
- Renamed BRBCR_ELx_DEFAULT_CONFIG as BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_DEFAULT_TS with BRBCR_ELx_TS_MASK in BRBCR_ELx_CONFIG_MASK
- Replaced BRBCR_ELx_E1BRE instances with BRBCR_ELx_ExBRE

- Added BRBE specific branch stack sampling perf test patches into the series
- Added a patch to prevent guest accesses into BRBE registers and instructions
- Added a patch to save the BRBE host context in NVHE environment
- Updated most commit messages

Changes in V13:

https://lore.kernel.org/all/20230711082455.215983-1-anshuman.khandual@arm.com/
https://lore.kernel.org/all/20230622065351.1092893-1-anshuman.khandual@arm.com/

- Added branch callback stubs for aarch32 pmuv3 based platforms
- Updated the comments for capture_brbe_regset()
- Deleted the comments in __read_brbe_regset()
- Reversed the arguments order in capture_brbe_regset() and brbe_branch_save()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read()
- Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset()

Changes in V12:

https://lore.kernel.org/all/20230615133239.442736-1-anshuman.khandual@arm.com/

- Replaced branch types with complete DIRECT/INDIRECT prefixes/suffixes
- Replaced branch types with complete INSN/ALIGN prefixes/suffixes
- Replaced return branch types as simple RET/ERET
- Replaced time field GST_PHYSICAL as GUEST_PHYSICAL
- Added 0 padding for BRBIDR0_EL1.NUMREC enum values
- Dropped helper arm_pmu_branch_stack_supported()
- Renamed armv8pmu_branch_valid() as armv8pmu_branch_attr_valid()
- Separated perf_task_ctx_cache setup from arm_pmu private allocation
- Collected changes to branch_records_alloc() in a single patch [5/10]
- Reworked and cleaned up branch_records_alloc()
- Reworked armv8pmu_branch_read() with new loop iterations in patch [6/10]
- Reworked capture_brbe_regset() with new loop iterations in patch [8/10]
- Updated the comment in branch_type_to_brbcr()
- Fixed the comment before stitch_stored_live_entries()
- Fixed BRBINFINJ_EL1 definition for VALID_FULL enum field
- Factored out helper __read_brbe_regset() from capture_brbe_regset()
- Dropped the helper copy_brbe_regset()
- Simplified stitch_stored_live_entries() with memcpy(), memmove()
- Reworked armv8pmu_probe_pmu() to bail out early with !probe.present
- Reworked brbe_attributes_probe() without 'struct brbe_hw_attr'
- Dropped 'struct brbe_hw_attr' argument from capture_brbe_regset()
- Dropped 'struct brbe_hw_attr' argument from brbe_branch_save()
- Dropped arm_pmu->private and added arm_pmu->reg_brbidr instead

Changes in V11:

https://lore.kernel.org/all/20230531040428.501523-1-anshuman.khandual@arm.com/

- Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data

Changes in V10:

https://lore.kernel.org/all/20230517022410.722287-1-anshuman.khandual@arm.com/

- Rebased the series on v6.4-rc2
- Moved ARMV8 PMUV3 changes inside drivers/perf/arm_pmuv3.c
- Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h]
- Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq()

Changes in V9:

https://lore.kernel.org/all/20230315051444.1683170-1-anshuman.khandual@arm.com/

- Fixed build problem with has_branch_stack() in arm64 header
- BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields'
- Renamed all BRBINF_EL1 call sites as BRBINFx_EL1
- Dropped static const char branch_filter_error_msg[]
- Implemented a positive list check for BRBE supported perf branch filters
- Added a comment in armv8pmu_handle_irq()
- Implemented per-cpu allocation for struct branch_record records
- Skipped looping through bank 1 if an invalid record is detected in bank 0
- Added comment in armv8pmu_branch_read() explaining prohibited region etc
- Added comment warning about erroneously marking transactions as aborted
- Replaced the first argument (perf_branch_entry) in capture_brbe_flags()
- Dropped the last argument (idx) in capture_brbe_flags()
- Dropped the brbcr argument from capture_brbe_flags()
- Used perf_sample_save_brstack() to capture branch records for perf_sample_data
- Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces
- Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read()
- Implemented event task context based branch records save mechanism

Changes in V8:

https://lore.kernel.org/all/20230123125956.1350336-1-anshuman.khandual@arm.com/

- Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper
- Added a comment and line break before arm_pmu->private element
- Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]()
- Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event()
- Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP()
- Changed brbe_hw_attr->brbe_version from 'bool' to 'int'
- Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]()
- Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid()
- Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings
- Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags()
- Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type()
- Renamed brbe_valid() as brbe_record_is_complete()
- Renamed brbe_source() as brbe_record_is_source_only()
- Renamed brbe_target() as brbe_record_is_target_only()
- Inverted checks for !brbe_record_is_[target|source]_only() for info capture
- Replaced 'fetch' with 'get' in all helpers that extract field value
- Dropped 'static int brbe_current_bank' optimization in select_brbe_bank()
- Dropped select_brbe_bank_index() completely, added capture_branch_entry()
- Process captured branch entries in two separate loops one for each BRBE bank
- Moved branch_records_alloc() inside armv8pmu_probe_pmu()
- Added a forward declaration for the helper has_branch_stack()
- Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free()
- Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call

Changes in V7:

https://lore.kernel.org/all/20230105031039.207972-1-anshuman.khandual@arm.com/

- Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event
- Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header
- Defined BRBFCR_EL1_DEFAULT_CONFIG in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP
- Defined BRBCR_EL1_DEFAULT_TS in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS
- Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr()
- Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr()
- Also set BRBE in paused state in armv8pmu_branch_disable()
- Dropped brbe_paused(), set_brbe_paused() helpers
- Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid()
- Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr
- Added valid_brbe_[cc, format, version]() helpers
- Split a separate brbe_attributes_probe() from armv8pmu_branch_probe()
- Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid()
- Defined enum brbe_bank_idx with possible values for BRBE bank indices
- Changed armpmu->hw_attr into armpmu->private
- Added missing space in stub definition for armv8pmu_branch_valid()
- Replaced both kmalloc() with kzalloc()
- Added BRBE_BANK_MAX_ENTRIES
- Updated comment for capture_brbe_flags()
- Updated comment for struct brbe_hw_attr
- Dropped space after type cast in couple of places
- Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read()
- Captured cpuc->branches->branch_entries[idx] in a local variable
- Dropped saved_priv from armv8pmu_branch_read()
- Reorganized PERF_SAMPLE_BRANCH_NO_[CYCLES|FLAGS] related configuration
- Replaced with FIELD_GET() and FIELD_PREP() wherever applicable
- Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL
- Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version()
  select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation
- Reorganized brbe_valid_nr() and dropped the pr_warn() message
- Changed probe sequence in brbe_attributes_probe()
- Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state
- Disable BRBE before disabling the PMU event counter
- Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode()
- Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()

Changes in V6:

https://lore.kernel.org/linux-arm-kernel/20221208084402.863310-1-anshuman.khandual@arm.com/

- Restore the exception level privilege after reading the branch records
- Unpause the buffer after reading the branch records
- Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level
- Reworked BRBE implementation and branch stack sampling support on arm pmu
- BRBE implementation is now part of overall ARMV8 PMU implementation
- BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/
- CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig
- File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c
- File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h
- BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events
- BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events
- BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation
- Added sched_task() callback into struct arm_pmu
- Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes
- Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu
- Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events
- Dropped brbe_users, brbe_context tracking in struct hw_pmu_events
- Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag
- armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation
- Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe
- Added armv8pmu_branch_reset() inside armv8pmu_branch_enable()
- Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK
- Dropped set_brbe_disabled() as well
- Reformatted armv8pmu_branch_valid() warnings while rejecting unsupported events

Changes in V5:

https://lore.kernel.org/linux-arm-kernel/20221107062514.2851047-1-anshuman.khandual@arm.com/

- Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01
- Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI
- Changed config ARM_BRBE_PMU from 'tristate' to 'bool'

Changes in V4:

https://lore.kernel.org/all/20221017055713.451092-1-anshuman.khandual@arm.com/

- Changed ../tools/sysreg declarations as suggested
- Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags
- Dropped perfmon_capable() check in armpmu_event_init()
- s/pr_warn_once/pr_info in armpmu_event_init()
- Added brbe_format element into struct pmu_hw_events
- Changed v1p1 as brbe_v1p1 in struct pmu_hw_events
- Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning

Changes in V3:

https://lore.kernel.org/all/20220929075857.158358-1-anshuman.khandual@arm.com/

- Moved brbe_stack from the stack and now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branch records are now captured
  only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already
  have been checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@arm.com/

- Dropped branch sample filter helpers consolidation patch from this series
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (6):
  arm64/sysreg: Add BRBE registers and fields
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  drivers: perf: arm_pmu: Add infrastructure for branch stack sampling
  arm64/boot: Enable EL2 requirements for BRBE
  drivers: perf: arm_pmuv3: Enable branch stack sampling via FEAT_BRBE
  KVM: arm64: nvhe: Disable branch generation in nVHE guests

James Clark (3):
  perf: test: Speed up running brstack test on an Arm model
  perf: test: Remove empty lines from branch filter test output
  perf: test: Extend branch stack sampling test for Arm64 BRBE

 Documentation/arch/arm64/booting.rst   |   23 +-
 arch/arm64/include/asm/el2_setup.h     |   87 +-
 arch/arm64/include/asm/kvm_host.h      |    3 +
 arch/arm64/include/asm/sysreg.h        |   17 +-
 arch/arm64/kvm/debug.c                 |    5 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c     |   31 +
 arch/arm64/kvm/sys_regs.c              |   56 ++
 arch/arm64/tools/sysreg                |  131 +++
 drivers/perf/Kconfig                   |   11 +
 drivers/perf/Makefile                  |    1 +
 drivers/perf/arm_brbe.c                | 1198 ++++++++++++++++++++++++
 drivers/perf/arm_pmu.c                 |   42 +-
 drivers/perf/arm_pmuv3.c               |  160 +++-
 drivers/perf/arm_pmuv3_branch.h        |   83 ++
 include/linux/perf/arm_pmu.h           |   37 +-
 tools/perf/tests/builtin-test.c        |    1 +
 tools/perf/tests/shell/test_brstack.sh |   57 +-
 tools/perf/tests/tests.h               |    1 +
 tools/perf/tests/workloads/Build       |    2 +
 tools/perf/tests/workloads/traploop.c  |   39 +
 20 files changed, 1959 insertions(+), 26 deletions(-)
 create mode 100644 drivers/perf/arm_brbe.c
 create mode 100644 drivers/perf/arm_pmuv3_branch.h
 create mode 100644 tools/perf/tests/workloads/traploop.c