[RFC,V3,00/29] Support of Virtual CPU Hotplug for ARMv8 Arch

Message ID	20240613233639.202896-1-salil.mehta@huawei.com (mailing list archive)
Series: Support of Virtual CPU Hotplug for ARMv8 Arch

  [RFC,V3,00/29] Support of Virtual CPU Hotplug for ARMv8 Arch
  [RFC,V3,01/29] arm/virt, target/arm: Add new ARMCPU {socket, cluster, core, thread}-id property
  [RFC,V3,02/29] cpu-common: Add common CPU utility for possible vCPUs
  [RFC,V3,03/29] hw/arm/virt: Limit number of possible vCPUs for unsupported Accel or GIC Type
  [RFC,V3,04/29] hw/arm/virt: Move setting of common CPU properties in a function
  [RFC,V3,05/29] arm/virt, target/arm: Machine init time change common to vCPU {cold|hot}-plug
  [RFC,V3,06/29] arm/virt, kvm: Pre-create disabled possible vCPUs @machine init
  [RFC,V3,07/29] arm/virt, gicv3: Changes to pre-size GIC with possible vcpus @machine init
  [RFC,V3,08/29] arm/virt: Init PMU at host for all possible vcpus
  [RFC,V3,09/29] arm/acpi: Enable ACPI support for vcpu hotplug
  [RFC,V3,10/29] arm/virt: Add cpu hotplug events to GED during creation
  [RFC,V3,11/29] arm/virt: Create GED dev before disabled CPU Objs are destroyed
  [RFC,V3,12/29] arm/virt/acpi: Build CPUs AML with CPU Hotplug support
  [RFC,V3,13/29] arm/virt: Make ARM vCPU present status ACPI persistent
  [RFC,V3,14/29] hw/acpi: ACPI/AML Changes to reflect the correct _STA.{PRES, ENA} Bits to Guest
  [RFC,V3,15/29] hw/arm: MADT Tbl change to size the guest with possible vCPUs
  [RFC,V3,16/29] hw/acpi: Make _MAT method optional
  [RFC,V3,17/29] arm/virt: Release objects for disabled possible vCPUs after init
  [RFC,V3,18/29] arm/virt: Add/update basic hot-(un)plug framework
  [RFC,V3,19/29] arm/virt: Changes to (un)wire GICC<->vCPU IRQs during hot-(un)plug
  [RFC,V3,20/29] hw/arm, gicv3: Changes to update GIC with vCPU hot-plug notification
  [RFC,V3,21/29] hw/intc/arm-gicv3: Changes required to (re)init the vCPU register info
  [RFC,V3,22/29] arm/virt: Update the guest(via GED) about CPU hot-(un)plug events
  [RFC,V3,23/29] hw/arm: Changes required for reset and to support next boot
  [RFC,V3,24/29] target/arm: Add support of unrealize* ARMCPU during vCPU Hot-unplug
  [RFC,V3,25/29] target/arm/kvm: Write CPU state back to KVM on reset
  [RFC,V3,26/29] target/arm/kvm, tcg: Register/Handle SMCCC hypercall exits to VMM/Qemu
  [RFC,V3,27/29] hw/arm: Support hotplug capability check using _OSC method
  [RFC,V3,28/29] tcg/mttcg: enable threads to unregister in tcg_ctxs[]
  [RFC,V3,29/29] hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled

Salil Mehta June 13, 2024, 11:36 p.m. UTC

PROLOGUE
========

To assist in review and set the right expectations from this RFC, please first
read the sections *APPENDED AT THE END* of this cover letter:

1. Important *DISCLAIMER* [Section (X)]
2. Work presented at KVMForum Conference (slides available) [Section (V)F]
3. Organization of patches [Section (XI)]
4. References [Section (XII)]
5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]

There has been interest shown by other organizations in adapting this series
for their architecture. Hence, RFC V2 [21] has been split into architecture
*agnostic* [22] and *specific* patch sets.

This is an ARM architecture-specific patch set carved out of RFC V2. Please
check section (XI)B for details of architecture agnostic patches.

SECTIONS [I - XIV] are as follows:

(I) Key Changes [details in last section (XIV)]
==============================================

RFC V2 -> RFC V3

1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) patch sets.
2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat), Philippe Mathieu-Daudé (Linaro),
   Jonathan Cameron (Huawei), Zhao Liu (Intel).

RFC V1 -> RFC V2

RFC V1: https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/

1. ACPI MADT Table GIC CPU Interface can now be presented [6] as ACPI
   *online-capable* or *enabled* to the Guest OS at boot time. This means
   associated CPUs can have ACPI _STA as *enabled* or *disabled* even after boot.
   See UEFI ACPI 6.5 Spec, Section 05, Table 5.37 GICC CPU Interface Flags[20].
2. SMCCC (SMC/HVC) Hypercall exit handling in userspace/Qemu for PSCI
   CPU_{ON,OFF} requests. This is required to {dis}allow onlining a vCPU.
3. Unplugged vCPUs are always presented in the CPUs ACPI AML code as ACPI
   _STA.Present to the Guest OS. ACPI _STA.Enabled is toggled to give the
   effect of hot(un)plug.
4. Live Migration works (some issues are still there).
5. TCG/HVF/qtest does not support Hotplug and falls back to default.
6. Code for TCG support exists in this release (it is a work-in-progress).
7. ACPI _OSC method can now be used by OSPM to negotiate Qemu VM platform
   hotplug capability (_OSC Query support still pending).
8. Misc. Bug fixes.
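
As an illustrative sketch of item 3 (not code from this series), the _STA
values seen by the Guest can be modeled as below. The bit positions follow the
ACPI definition of _STA (bit 0 = present, bit 1 = enabled); only these two bits
are modeled, and the helper name sta_for_vcpu() is hypothetical.

```python
# ACPI _STA bit positions as defined by the ACPI specification.
STA_PRESENT = 1 << 0   # device is present
STA_ENABLED = 1 << 1   # device is enabled


def sta_for_vcpu(plugged: bool) -> int:
    """Hypothetical helper: _STA value for one possible vCPU.

    Every possible vCPU is always reported as present; only the
    enabled bit is toggled by hot-plug and hot-unplug.
    """
    sta = STA_PRESENT
    if plugged:
        sta |= STA_ENABLED
    return sta


if __name__ == "__main__":
    print(hex(sta_for_vcpu(plugged=True)))   # 0x3
    print(hex(sta_for_vcpu(plugged=False)))  # 0x1
```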

(II) Summary
============

This patch set introduces virtual CPU hotplug support for the ARMv8 architecture
in QEMU. The idea is to be able to hotplug and hot-unplug vCPUs while the guest VM
is running, without requiring a reboot. This does *not* make any assumptions about
the physical CPU hotplug availability within the host system but rather tries to
solve the problem at the virtualizer/QEMU layer. It introduces ACPI CPU hotplug hooks
and event handling to interface with the guest kernel, and code to initialize, plug,
and unplug CPUs. No changes are required within the host kernel/KVM except the
support of hypercall exit handling in the user-space/Qemu, which has recently
been added to the kernel. Corresponding guest kernel changes have been
posted on the mailing list [3] [4] by James Morse.
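
The userspace policy check made possible by the hypercall exit can be pictured
with the small sketch below. It is illustrative only: the function name
psci_cpu_on() and the choice of returning DENIED for a vCPU that is not yet
plugged are assumptions, not code taken from this series. The numeric return
values are the standard PSCI ones.

```python
# Return codes defined by the Arm PSCI specification.
PSCI_SUCCESS = 0
PSCI_DENIED = -3
PSCI_ALREADY_ON = -4


def psci_cpu_on(enabled: set, powered_on: set, target: int) -> int:
    """Hypothetical VMM-side policy check for a guest CPU_ON request.

    enabled    -- vCPU indexes that are currently plugged (ACPI enabled)
    powered_on -- vCPU indexes that are already running
    target     -- vCPU index the guest wants to online
    """
    if target not in enabled:
        # Assumption: an unplugged vCPU must not be onlined by the guest.
        return PSCI_DENIED
    if target in powered_on:
        return PSCI_ALREADY_ON
    powered_on.add(target)
    return PSCI_SUCCESS


if __name__ == "__main__":
    enabled, running = {0, 1, 2, 3}, {0}
    print(psci_cpu_on(enabled, running, 1))  # plugged, not running
    print(psci_cpu_on(enabled, running, 4))  # not plugged
```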

(III) Motivation
================

This allows scaling the guest VM compute capacity on-demand, which would be
useful for the following example scenarios:

1. Vertical Pod Autoscaling [9][10] in the cloud: Part of the orchestration
   framework that could adjust resource requests (CPU and Mem requests) for
   the containers in a pod, based on usage.
2. Pay-as-you-grow Business Model: Infrastructure providers could allocate and
   restrict the total number of compute resources available to the guest VM
   according to the SLA (Service Level Agreement). VM owners could request more
   compute to be hot-plugged for some cost.

For example, Kata Container VM starts with a minimum amount of resources (i.e.,
hotplug everything approach). Why?

1. Allowing faster *boot time* and
2. Reduction in *memory footprint*

Kata Container VM can boot with just 1 vCPU, and then later more vCPUs can be
hot-plugged as needed.

(IV) Terminology
================

(*) Possible CPUs: Total vCPUs that could ever exist in the VM. This includes
                   any cold-booted CPUs plus any CPUs that could be later
                   hot-plugged.
                   - Qemu parameter (-smp maxcpus=N)
(*) Present CPUs:  Possible CPUs that are ACPI 'present'. These might or might
                   not be ACPI 'enabled'. 
                   - Present vCPUs = Possible vCPUs (Always on ARM Arch)
(*) Enabled CPUs:  Possible CPUs that are ACPI 'present' and 'enabled' and can
                   now be 'onlined' (PSCI) for use by the Guest Kernel. All
                   cold-booted vCPUs are ACPI 'enabled' at boot. Later, using
                   device_add, more vCPUs can be hotplugged and made ACPI
                   'enabled'.
                   - Qemu parameter (-smp cpus=N). Can be used to specify some
                     cold-booted vCPUs during VM init. Some can be added using
                     the '-device' option.
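
A minimal sketch of how these three sets relate at boot, for example with
'-smp cpus=4,maxcpus=6'. The function classify_vcpus() is hypothetical, and it
assumes that cold-booted vCPUs take the lowest indexes; it only restates the
definitions above.

```python
def classify_vcpus(cpus: int, maxcpus: int) -> dict:
    """Hypothetical helper: vCPU sets right after boot.

    cpus    -- value of '-smp cpus=N' (cold-booted vCPUs)
    maxcpus -- value of '-smp maxcpus=N' (possible vCPUs)
    """
    if not 1 <= cpus <= maxcpus:
        raise ValueError("need 1 <= cpus <= maxcpus")
    possible = set(range(maxcpus))
    return {
        "possible": possible,
        # On ARM every possible vCPU is always ACPI 'present'.
        "present": set(possible),
        # Assumption: cold-booted vCPUs occupy indexes 0..cpus-1.
        "enabled": set(range(cpus)),
    }


if __name__ == "__main__":
    sets = classify_vcpus(cpus=4, maxcpus=6)
    for name, members in sets.items():
        print(name, sorted(members))
```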

(V) Constraints Due to ARMv8 CPU Architecture [+] Other Impediments
===================================================================

A. Physical Limitation to Support CPU Hotplug: (Architectural Constraint)
   1. ARMv8 CPU architecture does not support the concept of the physical CPU
      hotplug. 
      a. There are many per-CPU components like PMU, SVE, MTE, Arch timers, etc.,
         whose behavior needs to be clearly defined when the CPU is hot(un)plugged.
         There is no specification for this.

   2. Other ARM components like GIC, etc., have not been designed to realize
      physical CPU hotplug capability as of now. For example,
      a. Every physical CPU has a unique GICC (GIC CPU Interface) by construction.
         Architecture does not specify what CPU hot(un)plug would mean in
         the context of any of these.
      b. CPUs/GICC are physically connected to unique GICR (GIC Redistributor).
         GIC Redistributors are always part of the always-on power domain. Hence,
         they cannot be powered off as per specification.

B. Impediments in Firmware/ACPI (Architectural Constraint)

   1. Firmware has to expose GICC, GICR, and other per-CPU features like PMU,
      SVE, MTE, Arch Timers, etc., to the OS. Due to the architectural constraint
      stated in section A1(a), all interrupt controller structures of
      MADT describing GIC CPU Interfaces and the GIC Redistributors MUST be
      presented by firmware to the OSPM during boot time.
   2. Architectures that support CPU hotplug can evaluate the ACPI _MAT method to
      get this kind of information from the firmware even after boot, and the
      OSPM has the capability to process these. The ARM kernel uses information in
      MADT interrupt controller structures to identify the number of present CPUs
      during boot and hence does not allow these to be changed after boot. The
      number of present CPUs cannot be changed. It is an architectural constraint!

C. Impediments in KVM to Support Virtual CPU Hotplug (Architectural Constraint)

   1. KVM VGIC:
      a. Sizing of various VGIC resources like memory regions, etc., related to
         the redistributor happens only once and is fixed at the VM init time
         and cannot be changed later after initialization has happened.
         KVM statically configures these resources based on the number of vCPUs
         and the number/size of redistributor ranges.
      b. Association between vCPU and its VGIC redistributor is fixed at the
         VM init time within the KVM, i.e., when redistributor iodevs get
         registered. VGIC does not allow this association to be set up or
         changed after VM initialization has happened. Physically, every
         CPU/GICC is uniquely connected with its redistributor, and there is
         no architectural way to set this up.
   2. KVM vCPUs:
      a. Lack of specification means destruction of KVM vCPUs does not exist as
         there is no reference to tell what to do with other per-vCPU
         components like redistributors, arch timer, etc.
      b. In fact, KVM does not implement the destruction of vCPUs for any
         architecture. This is independent of whether the architecture
         actually supports CPU Hotplug feature. For example, even for x86 KVM
         does not implement the destruction of vCPUs.
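
To make constraint C.1(a) concrete, the sketch below estimates the
redistributor MMIO space that has to be reserved up front. It assumes plain
GICv3, where each redistributor occupies two 64 KiB frames (RD_base and
SGI_base); the function name is hypothetical and this is not code from this
series or from KVM.

```python
GICV3_REDIST_FRAME_BYTES = 64 * 1024   # one 64 KiB frame
GICV3_FRAMES_PER_REDIST = 2            # RD_base + SGI_base (GICv3 only)


def redist_region_bytes(num_vcpus: int) -> int:
    """Hypothetical helper: redistributor space needed for num_vcpus."""
    if num_vcpus < 1:
        raise ValueError("at least one vCPU is required")
    return num_vcpus * GICV3_FRAMES_PER_REDIST * GICV3_REDIST_FRAME_BYTES


if __name__ == "__main__":
    # Sizing by cold-booted vCPUs only would leave no room for hotplug,
    # so the region has to be sized by the number of possible vCPUs.
    print(redist_region_bytes(4))  # cold-booted only
    print(redist_region_bytes(6))  # all possible vCPUs
```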

D. Impediments in Qemu to Support Virtual CPU Hotplug (KVM Constraints->Arch)

   1. Qemu CPU Objects MUST be created to initialize all the Host KVM vCPUs to
      overcome the KVM constraint. KVM vCPUs are created and initialized when Qemu
      CPU Objects are realized. But keeping the QOM CPU objects realized for
      'yet-to-be-plugged' vCPUs can create problems when these new vCPUs shall
      be plugged using device_add and a new QOM CPU object shall be created.
   2. GICV3State and GICV3CPUState objects MUST be sized over *possible vCPUs*
      during VM init time while QOM GICV3 Object is realized. This is because
      KVM VGIC can only be initialized once during init time. But every
      GICV3CPUState has an associated QOM CPU Object. The latter might
      correspond to vCPUs which are 'yet-to-be-plugged' (unplugged at init).
   3. How should new QOM CPU objects be connected back to the GICV3CPUState
      objects and disconnected from it in case the CPU is being hot(un)plugged?
   4. How should 'unplugged' or 'yet-to-be-plugged' vCPUs be represented in the
      QOM for which KVM vCPU already exists? For example, whether to keep,
      a. No QOM CPU objects, or
      b. Unrealized CPU Objects
   5. How should vCPU state be exposed via ACPI to the Guest? Especially for
      the unplugged/yet-to-be-plugged vCPUs whose CPU objects might not exist
      within the QOM but the Guest always expects all possible vCPUs to be
      identified as ACPI *present* during boot.
   6. How should Qemu expose GIC CPU interfaces for the unplugged or
      yet-to-be-plugged vCPUs using ACPI MADT Table to the Guest?

E. Summary of Approach ([+] Workarounds to problems in sections A, B, C & D)

   1. At VM Init, pre-create all the possible vCPUs in the Host KVM i.e., even
      for the vCPUs which are yet-to-be-plugged in Qemu but keep them in the
      powered-off state.
   2. After the KVM vCPUs have been initialized in the Host, the KVM vCPU
      objects corresponding to the unplugged/yet-to-be-plugged vCPUs are parked
      at the existing per-VM "kvm_parked_vcpus" list in Qemu. (similar to x86)
   3. GICV3State and GICV3CPUState objects are sized over possible vCPUs during
      VM init time i.e., when Qemu GIC is realized. This, in turn, sizes KVM VGIC
      resources like memory regions, etc., related to the redistributors with the
      number of possible KVM vCPUs. This never changes after VM has initialized.
   4. Qemu CPU objects corresponding to unplugged/yet-to-be-plugged vCPUs are
      released post Host KVM CPU and GIC/VGIC initialization.
   5. Build ACPI MADT Table with the following updates:
      a. Number of GIC CPU interface entries (=possible vCPUs)
      b. Present Boot vCPU as MADT.GICC.Enabled=1 (Not hot[un]pluggable)
      c. Present hot(un)pluggable vCPUs as MADT.GICC.online-capable=1
         - MADT.GICC.Enabled=0 (Mutually exclusive) [6][7]
         - vCPU can be ACPI enabled+onlined after Guest boots (Firmware Policy)
         - Some issues with above (details in later sections)
   6. Expose below ACPI Status to Guest kernel:
      a. Always _STA.Present=1 (all possible vCPUs)
      b. _STA.Enabled=1 (plugged vCPUs)
      c. _STA.Enabled=0 (unplugged vCPUs)
   7. vCPU hotplug *realizes* new QOM CPU object. The following happens:
      a. Realizes, initializes QOM CPU Object & spawns Qemu vCPU thread.
      b. Unparks the existing KVM vCPU ("kvm_parked_vcpus" list).
         - Attaches to QOM CPU object.
      c. Reinitializes KVM vCPU in the Host.
         - Resets the core and sys regs, sets defaults, etc.
      d. Runs KVM vCPU (created with "start-powered-off").
         - vCPU thread sleeps (waits for vCPU reset via PSCI).
      e. Updates Qemu GIC.
         - Wires back IRQs related to this vCPU.
         - GICV3CPUState association with QOM CPU Object.
      f. Updates [6] ACPI _STA.Enabled=1.
      g. Notifies Guest about the new vCPU (via ACPI GED interface).
         - Guest checks _STA.Enabled=1.
         - Guest adds processor (registers CPU with LDM) [3].
      h. Plugs the QOM CPU object in the slot.
         - slot-number = cpu-index {socket, cluster, core, thread}.
      i. Guest onlines vCPU (CPU_ON PSCI call over HVC/SMC).
         - KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
         - Qemu powers-on KVM vCPU in the Host.
   8. vCPU hot-unplug *unrealizes* QOM CPU Object. The following happens:
      a. Notifies Guest (via ACPI GED interface) vCPU hot-unplug event.
         - Guest offlines vCPU (CPU_OFF PSCI call over HVC/SMC).
      b. KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
         - Qemu powers-off the KVM vCPU in the Host.
      c. Guest signals *Eject* vCPU to Qemu.
      d. Qemu updates [6] ACPI _STA.Enabled=0.
      e. Updates GIC.
         - Un-wires IRQs related to this vCPU.
         - GICV3CPUState association with new QOM CPU Object is updated.
      f. Unplugs the vCPU.
         - Removes from slot.
         - Parks KVM vCPU ("kvm_parked_vcpus" list).
         - Unrealizes QOM CPU Object & joins back Qemu vCPU thread.
         - Destroys QOM CPU object.
      g. Guest checks ACPI _STA.Enabled=0.
         - Removes processor (unregisters CPU with LDM) [3].
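
The bookkeeping behind steps 1 to 8 above can be summarized with the toy model
below. It is a sketch under stated assumptions, not QEMU code: the class and
method names are hypothetical, and GIC wiring, ACPI GED events and PSCI
handling are left out. The point it shows is that the set of KVM vCPUs never
changes after init; only the QOM CPU objects come and go.

```python
class VirtMachineModel:
    """Toy model of the approach in section E (hypothetical names)."""

    def __init__(self, cpus: int, maxcpus: int):
        # Step 1: a KVM vCPU exists for every possible vCPU from VM init.
        self.kvm_vcpus = frozenset(range(maxcpus))
        # Steps 2 and 4: vCPUs that are not cold-booted are parked and
        # have no QOM CPU object after init.
        self.parked = set(range(cpus, maxcpus))
        self.qom_cpus = set(range(cpus))

    def sta_enabled(self, idx: int) -> bool:
        # Step 6: _STA.Present is always 1; _STA.Enabled tracks plug state.
        return idx in self.qom_cpus

    def hotplug(self, idx: int) -> None:
        if idx not in self.parked:
            raise ValueError(f"vCPU {idx} is not available for hotplug")
        # Step 7: unpark the existing KVM vCPU, attach a new QOM object.
        self.parked.remove(idx)
        self.qom_cpus.add(idx)

    def hot_unplug(self, idx: int) -> None:
        if idx not in self.qom_cpus:
            raise ValueError(f"vCPU {idx} is not plugged")
        # Step 8: destroy the QOM object, park the KVM vCPU again.
        self.qom_cpus.remove(idx)
        self.parked.add(idx)


if __name__ == "__main__":
    vm = VirtMachineModel(cpus=4, maxcpus=6)
    vm.hotplug(4)
    print(sorted(vm.qom_cpus), sorted(vm.parked), len(vm.kvm_vcpus))
    vm.hot_unplug(4)
    print(sorted(vm.qom_cpus), sorted(vm.parked), len(vm.kvm_vcpus))
```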

F. Work Presented at KVM Forum Conferences:
==========================================

Details of the above work have been presented at KVMForum2020 and KVMForum2023
conferences. Slides & video are available at the links below:
a. KVMForum 2023
   - Challenges Revisited in Supporting Virt CPU Hotplug on architectures that don't Support CPU Hotplug (like ARM64).
     https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
     https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
     https://www.youtube.com/watch?v=hyrw4j2D6I0&t=23970s
     https://kvm-forum.qemu.org/2023/talk/9SMPDQ/
b. KVMForum 2020
   - Challenges in Supporting Virtual CPU Hotplug on SoC Based Systems (like ARM64) - Salil Mehta, Huawei.
     https://sched.co/eE4m

(VI) Commands Used
==================

A. Qemu launch commands to init the machine:

    $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
      -cpu host -smp cpus=4,maxcpus=6 \
      -m 300M \
      -kernel Image \
      -initrd rootfs.cpio.gz \
      -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
      -nographic \
      -bios QEMU_EFI.fd

B. Hot-(un)plug related commands:

  # Hotplug a host vCPU (accel=kvm):
    $ device_add host-arm-cpu,id=core4,core-id=4

  # Hotplug a vCPU (accel=tcg):
    $ device_add cortex-a57-arm-cpu,id=core4,core-id=4

  # Delete the vCPU:
    $ device_del core4
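
For scripted use, the same operations can be issued over QMP. The sketch below
only builds the JSON messages for the standard QMP 'device_add' and
'device_del' commands; the helper names are hypothetical, and connecting to the
QMP socket and the capabilities handshake are left out. Whether further
topology properties are needed on a given setup is not covered here.

```python
import json


def qmp_device_add(driver: str, dev_id: str, core_id: int) -> str:
    """Hypothetical helper: QMP message to hotplug one vCPU."""
    return json.dumps({
        "execute": "device_add",
        "arguments": {"driver": driver, "id": dev_id, "core-id": core_id},
    })


def qmp_device_del(dev_id: str) -> str:
    """Hypothetical helper: QMP message to hot-unplug a vCPU by id."""
    return json.dumps({
        "execute": "device_del",
        "arguments": {"id": dev_id},
    })


if __name__ == "__main__":
    # Mirrors the monitor commands shown above for accel=kvm.
    print(qmp_device_add("host-arm-cpu", "core4", 4))
    print(qmp_device_del("core4"))
```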

Sample output on guest after boot:

    $ cat /sys/devices/system/cpu/possible
    0-5
    $ cat /sys/devices/system/cpu/present
    0-5
    $ cat /sys/devices/system/cpu/enabled
    0-3
    $ cat /sys/devices/system/cpu/online
    0-1
    $ cat /sys/devices/system/cpu/offline
    2-5

Sample output on guest after hotplug of vCPU=4:

    $ cat /sys/devices/system/cpu/possible
    0-5
    $ cat /sys/devices/system/cpu/present
    0-5
    $ cat /sys/devices/system/cpu/enabled
    0-4
    $ cat /sys/devices/system/cpu/online
    0-1,4
    $ cat /sys/devices/system/cpu/offline
    2-3,5

    Note: vCPU=4 was explicitly 'onlined' after hot-plug
    $ echo 1 > /sys/devices/system/cpu/cpu4/online
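
The sysfs CPU lists above use the kernel's range-list format ("0-1,4"). A small
sketch for checking such output in a test script follows; parse_cpu_list() is a
hypothetical helper, and the values are copied from the sample output after
hotplug of vCPU=4.

```python
def parse_cpu_list(text: str) -> set:
    """Parse a kernel CPU list such as "0-1,4" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs in that state
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


# Values copied from the sample output after hotplug of vCPU=4.
possible = parse_cpu_list("0-5")
enabled = parse_cpu_list("0-4")
online = parse_cpu_list("0-1,4")
offline = parse_cpu_list("2-3,5")

# Online CPUs must be enabled, and enabled CPUs must be possible.
assert online <= enabled <= possible
# Every possible CPU is either online or offline.
assert offline == possible - online
```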

(VII) Latest Repository
=======================

(*) Latest Qemu RFC V3 (Architecture Specific) patch set:
    https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3
(*) Latest Qemu V13 (Architecture Agnostic) patch set:
    https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3.arch.agnostic.v13
(*) QEMU changes for vCPU hotplug can be cloned from below site:
    https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2
(*) Guest Kernel changes (by James Morse, ARM) are available here:
    https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git virtual_cpu_hotplug/rfc/v2
(*) Leftover patches of the kernel are available here:
    https://lore.kernel.org/lkml/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
    https://github.com/salil-mehta/linux/commits/virtual_cpu_hotplug/rfc/v6.jic/ (not latest)

(VIII) KNOWN ISSUES
===================

1. Migration has been lightly tested but has been found working.
2. TCG is broken.
3. HVF and qtest are not supported yet.
4. ACPI MADT Table flags [7] MADT.GICC.Enabled and MADT.GICC.online-capable are
   mutually exclusive, i.e., as per the change [6], a vCPU cannot be both
   GICC.Enabled and GICC.online-capable. This means:
      [ Link: https://bugzilla.tianocore.org/show_bug.cgi?id=3706 ]
   a. If we have to support hot-unplug of the cold-booted vCPUs, then these MUST
      be specified as GICC.online-capable in the MADT Table during boot by the
      firmware/Qemu. But this requirement conflicts with the requirement to
      support new Qemu changes with legacy OSes that do not understand the
      MADT.GICC.online-capable Bit. A legacy OS will ignore this bit during
      boot, and hence these vCPUs will not appear on such an OS. This is
      unexpected behavior.
   b. In case we decide to specify vCPUs as MADT.GICC.Enabled and try to unplug
      these cold-booted vCPUs from OS (which in actuality should be blocked by
      returning error at Qemu), then features like 'kexec' will break.
   c. As I understand, removal of the cold-booted vCPUs is a required feature
      and x86 world allows it.
   d. Hence, either we need a specification change to make the MADT.GICC.Enabled
      and MADT.GICC.online-capable Bits NOT mutually exclusive or NOT support
      the removal of cold-booted vCPUs. In the latter case, a check can be introduced
      to bar the users from unplugging vCPUs, which were cold-booted, using QMP
      commands. (Needs discussion!)
      Please check the patch part of this patch set:
      [hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled].
   
      NOTE: This is definitely not a blocker!
5. Code related to the notification to GICV3 about the hot(un)plug of a vCPU event
   might need further discussion.
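
The conflict described in issue 4 can be illustrated with the sketch below. The
bit positions are those of the GICC CPU Interface Flags in ACPI 6.5 (bit 0 =
Enabled, bit 3 = Online Capable); the function names and the simplified model
of what an OS does with the flags are assumptions made for illustration.

```python
# GICC CPU Interface Flags, ACPI 6.5.
GICC_ENABLED = 1 << 0
GICC_ONLINE_CAPABLE = 1 << 3


def gicc_flags(cold_booted: bool) -> int:
    """Hypothetical helper: flags under the mutually exclusive rule."""
    return GICC_ENABLED if cold_booted else GICC_ONLINE_CAPABLE


def seen_by_legacy_os(flags: int) -> bool:
    # Assumption: a legacy OS only understands the Enabled bit.
    return bool(flags & GICC_ENABLED)


def seen_by_hotplug_aware_os(flags: int) -> bool:
    # Assumption: a hotplug-aware OS accepts either bit.
    return bool(flags & (GICC_ENABLED | GICC_ONLINE_CAPABLE))


if __name__ == "__main__":
    # Issue 4(a): marking a cold-booted vCPU as online-capable, so that it
    # could be unplugged later, hides it from a legacy OS.
    flags = GICC_ONLINE_CAPABLE
    print(seen_by_legacy_os(flags), seen_by_hotplug_aware_os(flags))
```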


(IX) THINGS TO DO
=================

1. Fix issues related to TCG/Emulation support. (Not a blocker)
2. Comprehensive Testing is in progress. (Positive feedback from Oracle & Ampere)
3. Qemu Documentation (.rst) needs to be updated.
4. Fix qtest, HVF Support (Future).
5. Update the design issue related to ACPI MADT.GICC flags discussed in known
   issues. This might require UEFI ACPI specification change (Not a blocker).
6. Add ACPI _OSC 'Query' support. Only part of _OSC support exists now. (Not a blocker).

The above is *not* a complete list. Will update later!

Best regards,  
Salil.

(X) DISCLAIMER
==============

This work is an attempt to present a proof-of-concept of the ARM64 vCPU hotplug
implementation to the community. This is *not* production-level code and might
have bugs. Comprehensive testing is being done on HiSilicon Kunpeng920 SoC,
Oracle, and Ampere servers. We are nearing stable code and a non-RFC
version shall be floated soon.

This work is *mostly* along the lines of the discussions that have happened in the
previous years [see refs below] across different channels like the mailing list,
Linaro Open Discussions platform, and various conferences like KVMForum, etc. This
RFC is being used as a way to verify the idea mentioned in this cover letter and
to get community views. Once this has been agreed upon, a formal patch shall be
posted to the mailing list for review.

[The concept being presented has been found to work!]

(XI) ORGANIZATION OF PATCHES
============================
 
A. Architecture *specific* patches:

   [Patch 1-8, 17, 27, 29] logic required during machine init.
    (*) Some validation checks.
    (*) Introduces core-id property and some util functions required later.
    (*) Logic to pre-create vCPUs.
    (*) GIC initialization pre-sized with possible vCPUs.
    (*) Some refactoring to have common hot and cold plug logic together.
    (*) Release of disabled QOM CPU objects in post_cpu_init().
    (*) Support of ACPI _OSC method to negotiate platform hotplug capabilities.
   [Patch 9-16] logic related to ACPI at machine init time.
    (*) Changes required to Enable ACPI for CPU hotplug.
    (*) Initialization of ACPI GED framework to cater to CPU Hotplug Events.
    (*) ACPI MADT/MAT changes.
   [Patch 18-26] logic required during vCPU hot-(un)plug.
    (*) Basic framework changes to support vCPU hot-(un)plug.
    (*) ACPI GED changes for hot-(un)plug hooks.
    (*) Wire-unwire the IRQs.
    (*) GIC notification logic.
    (*) ARMCPU unrealize logic.
    (*) Handling of SMCCC Hypercall Exits by KVM to Qemu.
   
B. Architecture *agnostic* patches:

   [PATCH V13 0/8] Add architecture agnostic code to support vCPU Hotplug.
   https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
    (*) Refactors vCPU create, Parking, unparking logic of vCPUs, and addition of traces.
    (*) Build ACPI AML related to CPU control dev.
    (*) Changes related to the destruction of CPU Address Space.
    (*) Changes related to the uninitialization of GDB Stub.
    (*) Updating of Docs.

(XII) REFERENCES
================

[1] https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
[2] https://lore.kernel.org/linux-arm-kernel/20200625133757.22332-1-salil.mehta@huawei.com/
[3] https://lore.kernel.org/lkml/20230203135043.409192-1-james.morse@arm.com/
[4] https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/
[5] https://lore.kernel.org/all/20230404154050.2270077-1-oliver.upton@linux.dev/
[6] https://bugzilla.tianocore.org/show_bug.cgi?id=3706
[7] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gic-cpu-interface-gicc-structure
[8] https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
[9] https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
[10] https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html
[11] https://lkml.org/lkml/2019/7/10/235
[12] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
[13] https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06517.html
[14] https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/thread/7CGL6JTACPUZEYQC34CZ2ZBWJGSR74WE/
[15] http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01168.html
[16] https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00131.html
[17] https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
[18] https://lore.kernel.org/lkml/20210608154805.216869-1-jean-philippe@linaro.org/
[19] https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/ 
[20] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gicc-cpu-interface-flags
[21] https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
[22] https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7

(XIII) ACKNOWLEDGEMENTS
=======================

I would like to take this opportunity to thank the people below for various
discussions with me over different channels during the development:

Marc Zyngier (Google),              Catalin Marinas (ARM),
James Morse (ARM),                  Will Deacon (Google),
Jean-Philippe Brucker (Linaro),     Sudeep Holla (ARM),
Lorenzo Pieralisi (Linaro),         Gavin Shan (RedHat),
Jonathan Cameron (Huawei),          Darren Hart (Ampere),
Igor Mammedov (RedHat),             Ilkka Koskinen (Ampere),
Andrew Jones (RedHat),              Karl Heubaum (Oracle),
Keqian Zhu (Huawei),                Miguel Luis (Oracle),
Xiongfeng Wang (Huawei),            Vishnu Pajjuri (Ampere),
Shameerali Kolothum (Huawei),       Russell King (Oracle),
Xuwei/Joy (Huawei),                 Peter Maydell (Linaro),
Zengtao/Prime (Huawei),             And all those whom I have missed!

Many thanks to the following people for their current or past contributions:

1. James Morse (ARM)
   (Current Kernel part of vCPU Hotplug Support on AARCH64)
2. Jean-Philippe Brucker (Linaro)
   (Prototyped one of the earlier PSCI-based POC [17][18] based on RFC V1)
3. Keqian Zhu (Huawei)
   (Co-developed Qemu prototype)
4. Xiongfeng Wang (Huawei)
   (Co-developed an earlier kernel prototype with me)
5. Vishnu Pajjuri (Ampere)
   (Verification on Ampere ARM64 Platforms + fixes)
6. Miguel Luis (Oracle)
   (Verification on Oracle ARM64 Platforms + fixes)
7. Russell King (Oracle) & Jonathan Cameron (Huawei)
   (Helping in upstreaming James Morse's Kernel patches).

(XIV) Change Log:
=================

RFC V2 -> RFC V3:
-----------------
1. Miscellaneous:
   - Split the RFC V2 into arch-agnostic and arch-specific patch sets.
2. Addressed Gavin Shan's (RedHat) comments:
   - Made CPU property accessors inline.
     https://lore.kernel.org/qemu-devel/6cd28639-2cfa-f233-c6d9-d5d2ec5b1c58@redhat.com/
   - Collected Reviewed-bys [PATCH RFC V2 4/37, 14/37, 22/37].
   - Dropped the patch as it was not required after init logic was refactored.
     https://lore.kernel.org/qemu-devel/4fb2eef9-6742-1eeb-721a-b3db04b1be97@redhat.com/
   - Fixed the range check for the core during vCPU Plug.
     https://lore.kernel.org/qemu-devel/1c5fa24c-6bf3-750f-4f22-087e4a9311af@redhat.com/
   - Added has_hotpluggable_vcpus check to make build_cpus_aml() conditional.
     https://lore.kernel.org/qemu-devel/832342cb-74bc-58dd-c5d7-6f995baeb0f2@redhat.com/
   - Fixed the states initialization in cpu_hotplug_hw_init() to accommodate previous refactoring.
     https://lore.kernel.org/qemu-devel/da5e5609-1883-8650-c7d8-6868c7b74f1c@redhat.com/
   - Fixed typos.
     https://lore.kernel.org/qemu-devel/eb1ac571-7844-55e6-15e7-3dd7df21366b@redhat.com/
   - Removed the unnecessary 'goto fail'.
     https://lore.kernel.org/qemu-devel/4d8980ac-f402-60d4-fe52-787815af8a7d@redhat.com/#t
   - Added check for hotpluggable vCPUs in the _OSC method.
     https://lore.kernel.org/qemu-devel/20231017001326.FUBqQ1PTowF2GxQpnL3kIW0AhmSqbspazwixAHVSi6c@z/
3. Addressed Shaoqin Huang's (RedHat) comments:
   - Fixed the compilation break caused by a missing call to virt_cpu_properties()
     and its missing definition.
     https://lore.kernel.org/qemu-devel/3632ee24-47f7-ae68-8790-26eb2cf9950b@redhat.com/
4. Addressed Jonathan Cameron's (Huawei) comments:
   - Gated the 'disabled vcpu message' for GIC version < 3.
     https://lore.kernel.org/qemu-devel/20240116155911.00004fe1@Huawei.com/

RFC V1 -> RFC V2:
-----------------
1. Addressed James Morse's (ARM) requirement as per Linaro Open Discussion:
   - Exposed all possible vCPUs as always ACPI _STA.present and available during boot time.
   - Added the _OSC handling as required by James's patches.
   - Introduction of 'online-capable' bit handling in the flag of MADT GICC.
   - SMCCC Hypercall Exit handling in Qemu.
2. Addressed Marc Zyngier's comment:
   - Fixed the note about GIC CPU Interface in the cover letter.
3. Addressed issues raised by Vishnu Pajjuri (Ampere) & Miguel Luis (Oracle) during testing:
   - Live/Pseudo Migration crashes.
4. Others:
   - Introduced the concept of persistent vCPU at QOM.
   - Introduced wrapper APIs of present, possible, and persistent.
   - Changed the ACPI hotplug H/W init leg to initialize the is_present and is_enabled states.
   - Check to avoid unplugging cold-booted vCPUs.
   - Disabled hotplugging with TCG/HVF/QTEST.
   - Introduced CPU Topology, {socket, cluster, core, thread}-id property.
   - Extract virt CPU properties as a common virt_vcpu_properties() function.

Author Salil Mehta (1):
  target/arm/kvm,tcg: Register/Handle SMCCC hypercall exits to VMM/Qemu

Jean-Philippe Brucker (2):
  hw/acpi: Make _MAT method optional
  target/arm/kvm: Write CPU state back to KVM on reset

Miguel Luis (1):
  tcg/mttcg: enable threads to unregister in tcg_ctxs[]

Salil Mehta (25):
  arm/virt,target/arm: Add new ARMCPU {socket,cluster,core,thread}-id
    property
  cpu-common: Add common CPU utility for possible vCPUs
  hw/arm/virt: Limit number of possible vCPUs for unsupported Accel or
    GIC Type
  hw/arm/virt: Move setting of common CPU properties in a function
  arm/virt,target/arm: Machine init time change common to vCPU
    {cold|hot}-plug
  arm/virt,kvm: Pre-create disabled possible vCPUs @machine init
  arm/virt,gicv3: Changes to pre-size GIC with possible vcpus @machine
    init
  arm/virt: Init PMU at host for all possible vcpus
  arm/acpi: Enable ACPI support for vcpu hotplug
  arm/virt: Add cpu hotplug events to GED during creation
  arm/virt: Create GED dev before *disabled* CPU Objs are destroyed
  arm/virt/acpi: Build CPUs AML with CPU Hotplug support
  arm/virt: Make ARM vCPU *present* status ACPI *persistent*
  hw/acpi: ACPI/AML Changes to reflect the correct _STA.{PRES,ENA} Bits
    to Guest
  hw/arm: MADT Tbl change to size the guest with possible vCPUs
  arm/virt: Release objects for *disabled* possible vCPUs after init
  arm/virt: Add/update basic hot-(un)plug framework
  arm/virt: Changes to (un)wire GICC<->vCPU IRQs during hot-(un)plug
  hw/arm,gicv3: Changes to update GIC with vCPU hot-plug notification
  hw/intc/arm-gicv3*: Changes required to (re)init the vCPU register
    info
  arm/virt: Update the guest(via GED) about CPU hot-(un)plug events
  hw/arm: Changes required for reset and to support next boot
  target/arm: Add support of *unrealize* ARMCPU during vCPU Hot-unplug
  hw/arm: Support hotplug capability check using _OSC method
  hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled

 accel/tcg/tcg-accel-ops-mttcg.c    |   1 +
 cpu-common.c                       |  37 ++
 hw/acpi/cpu.c                      |  62 +-
 hw/acpi/generic_event_device.c     |  11 +
 hw/arm/Kconfig                     |   1 +
 hw/arm/boot.c                      |   2 +-
 hw/arm/virt-acpi-build.c           | 113 +++-
 hw/arm/virt.c                      | 877 +++++++++++++++++++++++------
 hw/core/gpio.c                     |   2 +-
 hw/intc/arm_gicv3.c                |   1 +
 hw/intc/arm_gicv3_common.c         |  66 ++-
 hw/intc/arm_gicv3_cpuif.c          | 269 +++++----
 hw/intc/arm_gicv3_cpuif_common.c   |   5 +
 hw/intc/arm_gicv3_kvm.c            |  39 +-
 hw/intc/gicv3_internal.h           |   2 +
 include/hw/acpi/cpu.h              |   2 +
 include/hw/arm/boot.h              |   2 +
 include/hw/arm/virt.h              |  38 +-
 include/hw/core/cpu.h              |  78 +++
 include/hw/intc/arm_gicv3_common.h |  23 +
 include/hw/qdev-core.h             |   2 +
 include/tcg/startup.h              |   7 +
 target/arm/arm-powerctl.c          |  51 +-
 target/arm/cpu-qom.h               |  18 +-
 target/arm/cpu.c                   | 112 ++++
 target/arm/cpu.h                   |  18 +
 target/arm/cpu64.c                 |  15 +
 target/arm/gdbstub.c               |   6 +
 target/arm/helper.c                |  27 +-
 target/arm/internals.h             |  14 +-
 target/arm/kvm.c                   | 146 ++++-
 target/arm/kvm_arm.h               |  25 +
 target/arm/meson.build             |   1 +
 target/arm/{tcg => }/psci.c        |   8 +
 target/arm/tcg/meson.build         |   4 -
 tcg/tcg.c                          |  24 +
 36 files changed, 1749 insertions(+), 360 deletions(-)
 rename target/arm/{tcg => }/psci.c (97%)

Vishnu Pajjuri June 26, 2024, 9:53 a.m. UTC | #1

Hi Salil,

On 14-06-2024 05:06, Salil Mehta wrote:
> PROLOGUE
> ========
>
> To assist in review and set the right expectations from this RFC, please first
> read the sections *APPENDED AT THE END* of this cover letter:
>
> 1. Important *DISCLAIMER* [Section (X)]
> 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
> 3. Organization of patches [Section (XI)]
> 4. References [Section (XII)]
> 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
>
> There has been interest shown by other organizations in adapting this series
> for their architecture. Hence, RFC V2 [21] has been split into architecture
> *agnostic* [22] and *specific* patch sets.
>
> This is an ARM architecture-specific patch set carved out of RFC V2. Please
> check section (XI)B for details of architecture agnostic patches.
>
> SECTIONS [I - XIV] are as follows:
>
> (I) Key Changes [details in last section (XIV)]
> ==============================================
>
> RFC V2 -> RFC V3
>
> 1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) patch sets.
> 2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat), Philippe Mathieu-Daudé (Linaro),
>     Jonathan Cameron (Huawei), Zhao Liu (Intel).

I tried the following test cases with rfc-v3 and kernel patches v10, and
they look good on Ampere platforms.

  * Regular hotplug and hot unplug tests
  * Live migration tests with and without hot-plugging vCPUs

Please feel free to add,
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>

Regards,

-Vishnu.

> RFC V1 -> RFC V2
>
> RFC V1:https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
>
> 1. ACPI MADT Table GIC CPU Interface can now be presented [6] as ACPI
>     *online-capable* or *enabled* to the Guest OS at boot time. This means
>     associated CPUs can have ACPI _STA as *enabled* or *disabled* even after boot.
>     See UEFI ACPI 6.5 Spec, Section 05, Table 5.37 GICC CPU Interface Flags[20].
> 2. SMC/HVC Hypercall exit handling in userspace/Qemu for PSCI CPU_{ON,OFF}
>     request. This is required to {dis}allow online'ing a vCPU.
> 3. Always presenting unplugged vCPUs in CPUs ACPI AML code as ACPI _STA.PRESENT
>     to the Guest OS. Toggling ACPI _STA.Enabled to give an effect of the
>     hot{un}plug.
> 4. Live Migration works (some issues are still there).
> 5. TCG/HVF/qtest do not support Hotplug and fall back to default.
> 6. Code for TCG support exists in this release (it is a work-in-progress).
> 7. ACPI _OSC method can now be used by OSPM to negotiate Qemu VM platform
>     hotplug capability (_OSC Query support still pending).
> 8. Misc. Bug fixes.
>
> (II) Summary
> ============
>
> This patch set introduces virtual CPU hotplug support for the ARMv8 architecture
> in QEMU. The idea is to be able to hotplug and hot-unplug vCPUs while the guest VM
> is running, without requiring a reboot. This does *not* make any assumptions about
> the physical CPU hotplug availability within the host system but rather tries to
> solve the problem at the virtualizer/QEMU layer. It introduces ACPI CPU hotplug hooks
> and event handling to interface with the guest kernel, and code to initialize, plug,
> and unplug CPUs. No changes are required within the host kernel/KVM except the
> support of hypercall exit handling in the user-space/Qemu, which has recently
> been added to the kernel. Corresponding guest kernel changes have been
> posted on the mailing list [3] [4] by James Morse.
>
> (III) Motivation
> ================
>
> This allows scaling the guest VM compute capacity on-demand, which would be
> useful for the following example scenarios:
>
> 1. Vertical Pod Autoscaling [9][10] in the cloud: Part of the orchestration
>     framework that could adjust resource requests (CPU and Mem requests) for
>     the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure providers could allocate and
>     restrict the total number of compute resources available to the guest VM
>     according to the SLA (Service Level Agreement). VM owners could request more
>     compute to be hot-plugged for some cost.
>
> For example, Kata Container VM starts with a minimum amount of resources (i.e.,
> hotplug everything approach). Why?
>
> 1. Allowing faster *boot time* and
> 2. Reduction in *memory footprint*
>
> Kata Container VM can boot with just 1 vCPU, and then later more vCPUs can be
> hot-plugged as needed.
>
> (IV) Terminology
> ================
>
> (*) Possible CPUs: Total vCPUs that could ever exist in the VM. This includes
>                     any cold-booted CPUs plus any CPUs that could be later
>                     hot-plugged.
>                     - Qemu parameter (-smp maxcpus=N)
> (*) Present CPUs:  Possible CPUs that are ACPI 'present'. These might or might
>                     not be ACPI 'enabled'.
>                     - Present vCPUs = Possible vCPUs (Always on ARM Arch)
> (*) Enabled CPUs:  Possible CPUs that are ACPI 'present' and 'enabled' and can
>                     now be 'onlined' (PSCI) for use by the Guest Kernel. All
>                     cold-booted vCPUs are ACPI 'enabled' at boot. Later, using
>                     device_add, more vCPUs can be hotplugged and made ACPI
>                     'enabled'.
>                     - Qemu parameter (-smp cpus=N). Can be used to specify some
>                       cold-booted vCPUs during VM init. Some can be added using the
>                       '-device' option.
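
The terminology above can be restated as a small set model. This is only an illustrative sketch: the names `VcpuSets` and `vcpu_sets` are hypothetical, and it assumes vCPUs are indexed 0..maxcpus-1 with the cold-booted ones taking the lowest indices, which matches the sample output later in this letter but is not stated as a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VcpuSets:
    possible: frozenset
    present: frozenset
    enabled: frozenset


def vcpu_sets(cpus, maxcpus, hotplugged=()):
    """Model the vCPU sets for '-smp cpus=<cpus>,maxcpus=<maxcpus>'.

    Assumption: indices run 0..maxcpus-1 and cold-booted vCPUs come first.
    """
    if not 0 < cpus <= maxcpus:
        raise ValueError("expected 0 < cpus <= maxcpus")
    possible = frozenset(range(maxcpus))
    # On the ARM arch every possible vCPU is always ACPI 'present'.
    present = possible
    # Cold-booted vCPUs are ACPI 'enabled' at boot.
    cold_booted = frozenset(range(cpus))
    # vCPUs added later with device_add become ACPI 'enabled' as well.
    added = frozenset(hotplugged)
    if not added <= possible - cold_booted:
        raise ValueError("can only hotplug possible, not-yet-enabled vCPUs")
    return VcpuSets(possible, present, cold_booted | added)
```

For example, `vcpu_sets(4, 6)` models `-smp cpus=4,maxcpus=6`, and passing `hotplugged=[4]` models the state after one device_add.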
>
> (V) Constraints Due to ARMv8 CPU Architecture [+] Other Impediments
> ===================================================================
>
> A. Physical Limitation to Support CPU Hotplug: (Architectural Constraint)
>     1. ARMv8 CPU architecture does not support the concept of the physical CPU
>        hotplug.
>        a. There are many per-CPU components like PMU, SVE, MTE, Arch timers, etc.,
>           whose behavior needs to be clearly defined when the CPU is hot(un)plugged.
>           There is no specification for this.
>
>     2. Other ARM components like GIC, etc., have not been designed to realize
>        physical CPU hotplug capability as of now. For example,
>        a. Every physical CPU has a unique GICC (GIC CPU Interface) by construct.
>           Architecture does not specify what CPU hot(un)plug would mean in
>           context to any of these.
>        b. CPUs/GICC are physically connected to unique GICR (GIC Redistributor).
>           GIC Redistributors are always part of the always-on power domain. Hence,
>           they cannot be powered off as per specification.
>
> B. Impediments in Firmware/ACPI (Architectural Constraint)
>
>     1. Firmware has to expose GICC, GICR, and other per-CPU features like PMU,
>        SVE, MTE, Arch Timers, etc., to the OS. Due to the architectural constraint
>        stated in section A1(a), all interrupt controller structures of
>        MADT describing GIC CPU Interfaces and the GIC Redistributors MUST be
>        presented by firmware to the OSPM during boot time.
>     2. Architectures that support CPU hotplug can evaluate the ACPI _MAT method to
>        get this kind of information from the firmware even after boot, and the
>        OSPM has the capability to process these. ARM kernel uses information in MADT
>        interrupt controller structures to identify the number of present CPUs during
>        boot and hence does not allow these to be changed after boot. The number of
>        present CPUs cannot be changed. It is an architectural constraint!
>
> C. Impediments in KVM to Support Virtual CPU Hotplug (Architectural Constraint)
>
>     1. KVM VGIC:
>        a. Sizing of various VGIC resources like memory regions, etc., related to
>           the redistributor happens only once and is fixed at the VM init time
>           and cannot be changed later after initialization has happened.
>           KVM statically configures these resources based on the number of vCPUs
>           and the number/size of redistributor ranges.
>        b. Association between vCPU and its VGIC redistributor is fixed at the
>           VM init time within the KVM, i.e., when redistributor iodevs get
>           registered. VGIC does not allow this association to be set up or
>           changed after VM initialization has happened. Physically, every
>           CPU/GICC is uniquely connected with its redistributor, and there is
>           no architectural way to set this up.
>     2. KVM vCPUs:
>        a. Lack of specification means destruction of KVM vCPUs does not exist as
>           there is no reference to tell what to do with other per-vCPU
>           components like redistributors, arch timer, etc.
>        b. In fact, KVM does not implement the destruction of vCPUs for any
>           architecture. This is independent of whether the architecture
>           actually supports CPU Hotplug feature. For example, even for x86 KVM
>           does not implement the destruction of vCPUs.
>
> D. Impediments in Qemu to Support Virtual CPU Hotplug (KVM Constraints->Arch)
>
>     1. Qemu CPU Objects MUST be created to initialize all the Host KVM vCPUs to
>        overcome the KVM constraint. KVM vCPUs are created and initialized when Qemu
>        CPU Objects are realized. But keeping the QOM CPU objects realized for
>        'yet-to-be-plugged' vCPUs can create problems when these new vCPUs shall
>        be plugged using device_add and a new QOM CPU object shall be created.
>     2. GICV3State and GICV3CPUState objects MUST be sized over *possible vCPUs*
>        during VM init time while QOM GICV3 Object is realized. This is because
>        KVM VGIC can only be initialized once during init time. But every
>        GICV3CPUState has an associated QOM CPU Object. The latter might
>        correspond to vCPUs which are 'yet-to-be-plugged' (unplugged at init).
>     3. How should new QOM CPU objects be connected back to the GICV3CPUState
>        objects and disconnected from it in case the CPU is being hot(un)plugged?
>     4. How should 'unplugged' or 'yet-to-be-plugged' vCPUs be represented in the
>        QOM for which KVM vCPU already exists? For example, whether to keep,
>         a. No QOM CPU objects Or
>         b. Unrealized CPU Objects
>     5. How should vCPU state be exposed via ACPI to the Guest? Especially for
>        the unplugged/yet-to-be-plugged vCPUs whose CPU objects might not exist
>        within the QOM but the Guest always expects all possible vCPUs to be
>        identified as ACPI *present* during boot.
>     6. How should Qemu expose GIC CPU interfaces for the unplugged or
>        yet-to-be-plugged vCPUs using ACPI MADT Table to the Guest?
>
> E. Summary of Approach ([+] Workarounds to problems in sections A, B, C & D)
>
>     1. At VM Init, pre-create all the possible vCPUs in the Host KVM i.e., even
>        for the vCPUs which are yet-to-be-plugged in Qemu but keep them in the
>        powered-off state.
>     2. After the KVM vCPUs have been initialized in the Host, the KVM vCPU
>        objects corresponding to the unplugged/yet-to-be-plugged vCPUs are parked
>        at the existing per-VM "kvm_parked_vcpus" list in Qemu. (similar to x86)
>     3. GICV3State and GICV3CPUState objects are sized over possible vCPUs during
>        VM init time i.e., when Qemu GIC is realized. This, in turn, sizes KVM VGIC
>        resources like memory regions, etc., related to the redistributors with the
>        number of possible KVM vCPUs. This never changes after VM has initialized.
>     4. Qemu CPU objects corresponding to unplugged/yet-to-be-plugged vCPUs are
>        released post Host KVM CPU and GIC/VGIC initialization.
>     5. Build ACPI MADT Table with the following updates:
>        a. Number of GIC CPU interface entries (=possible vCPUs)
>        b. Present Boot vCPU as MADT.GICC.Enabled=1 (Not hot[un]pluggable)
>        c. Present hot(un)pluggable vCPUs as MADT.GICC.online-capable=1
>           - MADT.GICC.Enabled=0 (Mutually exclusive) [6][7]
>           - vCPU can be ACPI enabled+onlined after Guest boots (Firmware Policy)
>           - Some issues with above (details in later sections)
>     6. Expose below ACPI Status to Guest kernel:
>        a. Always _STA.Present=1 (all possible vCPUs)
>        b. _STA.Enabled=1 (plugged vCPUs)
>        c. _STA.Enabled=0 (unplugged vCPUs)
>     7. vCPU hotplug *realizes* new QOM CPU object. The following happens:
>        a. Realizes, initializes QOM CPU Object & spawns Qemu vCPU thread.
>        b. Unparks the existing KVM vCPU ("kvm_parked_vcpus" list).
>           - Attaches to QOM CPU object.
>        c. Reinitializes KVM vCPU in the Host.
>           - Resets the core and sys regs, sets defaults, etc.
>        d. Runs KVM vCPU (created with "start-powered-off").
>           - vCPU thread sleeps (waits for vCPU reset via PSCI).
>        e. Updates Qemu GIC.
>           - Wires back IRQs related to this vCPU.
>           - GICV3CPUState association with QOM CPU Object.
>        f. Updates [6] ACPI _STA.Enabled=1.
>        g. Notifies Guest about the new vCPU (via ACPI GED interface).
>           - Guest checks _STA.Enabled=1.
>           - Guest adds processor (registers CPU with LDM) [3].
>        h. Plugs the QOM CPU object in the slot.
>           - slot-number = cpu-index {socket, cluster, core, thread}.
>        i. Guest onlines vCPU (CPU_ON PSCI call over HVC/SMC).
>           - KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
>           - Qemu powers-on KVM vCPU in the Host.
>     8. vCPU hot-unplug *unrealizes* QOM CPU Object. The following happens:
>        a. Notifies Guest (via ACPI GED interface) vCPU hot-unplug event.
>           - Guest offlines vCPU (CPU_OFF PSCI call over HVC/SMC).
>        b. KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
>           - Qemu powers-off the KVM vCPU in the Host.
>        c. Guest signals *Eject* vCPU to Qemu.
>        d. Qemu updates [6] ACPI _STA.Enabled=0.
>        e. Updates GIC.
>           - Un-wires IRQs related to this vCPU.
>           - GICV3CPUState association with new QOM CPU Object is updated.
>        f. Unplugs the vCPU.
>           - Removes from slot.
>           - Parks KVM vCPU ("kvm_parked_vcpus" list).
>           - Unrealizes QOM CPU Object & joins back Qemu vCPU thread.
>           - Destroys QOM CPU object.
>        g. Guest checks ACPI _STA.Enabled=0.
>           - Removes processor (unregisters CPU with LDM) [3].
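
The plug and unplug sequences in steps 7 and 8 above can be approximated by a toy state model. This is a sketch under stated assumptions, not QEMU code: `VcpuSlot` and its methods are hypothetical names, the GIC wiring and GED notification are reduced to comments, and the refusal to eject a cold-booted vCPU follows the "check to avoid unplugging cold-booted vCPUs" noted in the change log.

```python
from dataclasses import dataclass


@dataclass
class VcpuSlot:
    """Toy model of one possible vCPU (hypothetical, not a QEMU structure)."""
    cold_booted: bool = False
    qom_realized: bool = False  # a realized QOM CPU object exists
    kvm_parked: bool = True     # KVM vCPU sits on the parked list
    sta_enabled: bool = False   # ACPI _STA.Enabled as shown to the guest
    powered_on: bool = False    # power state driven by PSCI CPU_ON/CPU_OFF

    def __post_init__(self):
        if self.cold_booted:
            # Cold-booted vCPUs are realized and ACPI enabled at boot.
            self.qom_realized, self.kvm_parked, self.sta_enabled = True, False, True

    @property
    def sta_present(self):
        # _STA.Present is always 1 for every possible vCPU.
        return True

    def hotplug(self):
        if self.qom_realized:
            raise RuntimeError("vCPU is already plugged")
        self.qom_realized = True   # 7a: realize the QOM CPU object
        self.kvm_parked = False    # 7b: unpark the existing KVM vCPU
        self.sta_enabled = True    # 7f: guest is then notified via GED (7g)

    def psci_cpu_on(self):
        # 7i: the hypercall exits to the VMM, which applies a policy check.
        if not self.sta_enabled:
            return False
        self.powered_on = True
        return True

    def psci_cpu_off(self):
        self.powered_on = False    # 8a/8b: guest offlines the vCPU

    def eject(self):
        if self.cold_booted:
            raise RuntimeError("cold-booted vCPUs cannot be unplugged")
        if not self.qom_realized:
            raise RuntimeError("vCPU is not plugged")
        if self.powered_on:
            raise RuntimeError("guest must offline the vCPU first")
        self.sta_enabled = False   # 8d
        self.kvm_parked = True     # 8f: park the KVM vCPU again
        self.qom_realized = False  # 8f: unrealize and destroy the QOM object
```

Note that the KVM vCPU is never destroyed in this model; it only moves between the parked and attached states, which is the core of the workaround described in section E.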
>
> F. Work Presented at KVM Forum Conferences:
> ==========================================
>
> Details of the above work have been presented at KVMForum2020 and KVMForum2023
> conferences. Slides & video are available at the links below:
> a. KVMForum 2023
>     - Challenges Revisited in Supporting Virt CPU Hotplug on architectures that don't Support CPU Hotplug (like ARM64).
>       https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>       https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
>       https://www.youtube.com/watch?v=hyrw4j2D6I0&t=23970s
>       https://kvm-forum.qemu.org/2023/talk/9SMPDQ/
> b. KVMForum 2020
>     - Challenges in Supporting Virtual CPU Hotplug on SoC Based Systems (like ARM64) - Salil Mehta, Huawei.
>       https://sched.co/eE4m
>
> (VI) Commands Used
> ==================
>
> A. Qemu launch commands to init the machine:
>
>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>        -cpu host -smp cpus=4,maxcpus=6 \
>        -m 300M \
>        -kernel Image \
>        -initrd rootfs.cpio.gz \
>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>        -nographic \
>        -bios QEMU_EFI.fd
>
> B. Hot-(un)plug related commands:
>
>    # Hotplug a host vCPU (accel=kvm):
>      $ device_add host-arm-cpu,id=core4,core-id=4
>
>    # Hotplug a vCPU (accel=tcg):
>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>
>    # Delete the vCPU:
>      $ device_del core4
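
The monitor commands above can also be issued over QMP. The sketch below converts the human-monitor argument string into a QMP `device_add` message; the helper names are hypothetical, and treating all-digit property values as integers is an assumption made for illustration.

```python
def hmp_device_add_to_qmp(args):
    """Turn 'driver,prop=value,...' into a QMP device_add message (a dict)."""
    driver, *props = args.split(",")
    arguments = {"driver": driver}
    for prop in props:
        key, _, value = prop.partition("=")
        # Assumption: purely numeric values (e.g. core-id=4) are sent as ints.
        arguments[key] = int(value) if value.isdigit() else value
    return {"execute": "device_add", "arguments": arguments}


def qmp_device_del(dev_id):
    """Build the QMP message that removes a device by its id."""
    return {"execute": "device_del", "arguments": {"id": dev_id}}
```

Serializing the returned dictionaries with `json.dumps()` gives the text that would be written to a QMP socket after capability negotiation.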
>
> Sample output on guest after boot:
>
>      $ cat /sys/devices/system/cpu/possible
>      0-5
>      $ cat /sys/devices/system/cpu/present
>      0-5
>      $ cat /sys/devices/system/cpu/enabled
>      0-3
>      $ cat /sys/devices/system/cpu/online
>      0-1
>      $ cat /sys/devices/system/cpu/offline
>      2-5
>
> Sample output on guest after hotplug of vCPU=4:
>
>      $ cat /sys/devices/system/cpu/possible
>      0-5
>      $ cat /sys/devices/system/cpu/present
>      0-5
>      $ cat /sys/devices/system/cpu/enabled
>      0-4
>      $ cat /sys/devices/system/cpu/online
>      0-1,4
>      $ cat /sys/devices/system/cpu/offline
>      2-3,5
>
>      Note: vCPU=4 was explicitly 'onlined' after hot-plug
>      $ echo 1 > /sys/devices/system/cpu/cpu4/online
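
The sample outputs above use the kernel's CPU list format (for example "0-1,4"). A short sketch for parsing them and checking that the masks are consistent follows; `parse_cpulist` and `masks_consistent` are hypothetical helper names, not part of any existing tool.

```python
def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0-1,4' into a set of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means an empty set
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def masks_consistent(possible, present, enabled, online, offline):
    """Check the relations visible in the sample output above."""
    return (
        online <= enabled <= present <= possible
        and online | offline == possible
        and not online & offline
    )
```

Applied to the second sample, the sets are possible = present = {0..5}, enabled = {0..4}, online = {0, 1, 4} and offline = {2, 3, 5}, which satisfy the check.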
>
> (VII) Latest Repository
> =======================
>
> (*) Latest Qemu RFC V3 (Architecture Specific) patch set:
>      https://github.com/salil-mehta/qemu.git  virt-cpuhp-armv8/rfc-v3
> (*) Latest Qemu V13 (Architecture Agnostic) patch set:
>      https://github.com/salil-mehta/qemu.git  virt-cpuhp-armv8/rfc-v3.arch.agnostic.v13
> (*) QEMU changes for vCPU hotplug can be cloned from below site:
>      https://github.com/salil-mehta/qemu.git  virt-cpuhp-armv8/rfc-v2
> (*) Guest Kernel changes (by James Morse, ARM) are available here:
>      https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git  virtual_cpu_hotplug/rfc/v2
> (*) Leftover patches of the kernel are available here:
>      https://lore.kernel.org/lkml/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
>      https://github.com/salil-mehta/linux/commits/virtual_cpu_hotplug/rfc/v6.jic/  (not latest)
>
> (VIII) KNOWN ISSUES
> ===================
>
> 1. Migration has only been lightly tested but was found to be working.
> 2. TCG is broken.
> 3. HVF and qtest are not supported yet.
> 4. ACPI MADT Table flags [7] MADT.GICC.Enabled and MADT.GICC.online-capable are
>     mutually exclusive, i.e., as per the change [6], a vCPU cannot be both
>     GICC.Enabled and GICC.online-capable. This means:
>        [ Link:https://bugzilla.tianocore.org/show_bug.cgi?id=3706  ]
>     a. If we have to support hot-unplug of the cold-booted vCPUs, then these MUST
>        be specified as GICC.online-capable in the MADT Table during boot by the
>        firmware/Qemu. But this requirement conflicts with the requirement to
>        support new Qemu changes with legacy OSes that don't understand
>        MADT.GICC.online-capable Bit. Legacy OS during boot time will ignore this
>        bit, and hence these vCPUs will not appear on such OS. This is unexpected
>        behavior.
>     b. In case we decide to specify vCPUs as MADT.GICC.Enabled and try to unplug
>        these cold-booted vCPUs from OS (which in actuality should be blocked by
>        returning error at Qemu), then features like 'kexec' will break.
>     c. As I understand, removal of the cold-booted vCPUs is a required feature
>        and x86 world allows it.
>     d. Hence, either we need a specification change to make the MADT.GICC.Enabled
>        and MADT.GICC.online-capable Bits NOT mutually exclusive or NOT support
>        the removal of cold-booted vCPUs. In the latter case, a check can be introduced
>        to bar the users from unplugging vCPUs, which were cold-booted, using QMP
>        commands. (Needs discussion!)
>        Please check the patch part of this patch set:
>        [hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled].
>     
>        NOTE: This is definitely not a blocker!
> 5. Code related to the notification to GICV3 about the hot(un)plug of a vCPU event
>     might need further discussion.
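
Issue 4 above can be illustrated with the two MADT GICC flag bits involved. The bit positions (Enabled is bit 0, Online Capable is bit 3) are taken from the ACPI 6.5 GICC flags table [20]; the helper functions are hypothetical and only model the policy described in this letter.

```python
GICC_ENABLED = 1 << 0         # MADT.GICC.Enabled
GICC_ONLINE_CAPABLE = 1 << 3  # MADT.GICC.online-capable


def gicc_flags(cold_booted):
    """Flags chosen per vCPU: cold-booted => Enabled, others => online-capable."""
    return GICC_ENABLED if cold_booted else GICC_ONLINE_CAPABLE


def flags_valid(flags):
    """The two bits are mutually exclusive as per the change [6]."""
    return not (flags & GICC_ENABLED and flags & GICC_ONLINE_CAPABLE)


def visible_to_legacy_os(flags):
    """A legacy OS ignores online-capable and only honours Enabled."""
    return bool(flags & GICC_ENABLED)
```

This shows the conflict in 4(a): marking cold-booted vCPUs as online-capable so that they could be unplugged makes `visible_to_legacy_os()` false for them, while setting both bits fails `flags_valid()`.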
>
>
> (IX) THINGS TO DO
> =================
>
> 1. Fix issues related to TCG/Emulation support. (Not a blocker)
> 2. Comprehensive Testing is in progress. (Positive feedback from Oracle & Ampere)
> 3. Qemu Documentation (.rst) needs to be updated.
> 4. Fix qtest, HVF Support (Future).
> 5. Update the design issue related to ACPI MADT.GICC flags discussed in known
>     issues. This might require UEFI ACPI specification change (Not a blocker).
> 6. Add ACPI _OSC 'Query' support. Only part of _OSC support exists now. (Not a blocker).
>
> The above is *not* a complete list. Will update later!
>
> Best regards,
> Salil.
>
> (X) DISCLAIMER
> ==============
>
> This work is an attempt to present a proof-of-concept of the ARM64 vCPU hotplug
> implementation to the community. This is *not* production-level code and might
> have bugs. Comprehensive testing is being done on HiSilicon Kunpeng920 SoC,
> Oracle, and Ampere servers. We are nearing stable code and a non-RFC
> version shall be floated soon.
>
> This work is *mostly* along the lines of the discussions that have happened in the
> previous years [see refs below] across different channels like the mailing list,
> Linaro Open Discussions platform, and various conferences like KVMForum, etc. This
> RFC is being used as a way to verify the idea mentioned in this cover letter and
> to get community views. Once this has been agreed upon, a formal patch shall be
> posted to the mailing list for review.
>
> [The concept being presented has been found to work!]
>
> (XI) ORGANIZATION OF PATCHES
> ============================
>   
> A. Architecture *specific* patches:
>
>     [Patch 1-8, 17, 27, 29] logic required during machine init.
>      (*) Some validation checks.
>      (*) Introduces core-id property and some util functions required later.
>      (*) Logic to pre-create vCPUs.
>      (*) GIC initialization pre-sized with possible vCPUs.
>      (*) Some refactoring to have common hot and cold plug logic together.
>      (*) Release of disabled QOM CPU objects in post_cpu_init().
>      (*) Support of ACPI _OSC method to negotiate platform hotplug capabilities.
>     [Patch 9-16] logic related to ACPI at machine init time.
>      (*) Changes required to Enable ACPI for CPU hotplug.
>      (*) Initialization of ACPI GED framework to cater to CPU Hotplug Events.
>      (*) ACPI MADT/MAT changes.
>     [Patch 18-26] logic required during vCPU hot-(un)plug.
>      (*) Basic framework changes to support vCPU hot-(un)plug.
>      (*) ACPI GED changes for hot-(un)plug hooks.
>      (*) Wire-unwire the IRQs.
>      (*) GIC notification logic.
>      (*) ARMCPU unrealize logic.
>      (*) Handling of SMCCC Hypercall Exits by KVM to Qemu.
>     
> B. Architecture *agnostic* patches:
>
>     [PATCH V13 0/8] Add architecture agnostic code to support vCPU Hotplug.
>     https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
>      (*) Refactors vCPU create, Parking, unparking logic of vCPUs, and addition of traces.
>      (*) Build ACPI AML related to CPU control dev.
>      (*) Changes related to the destruction of CPU Address Space.
>      (*) Changes related to the uninitialization of GDB Stub.
>      (*) Updating of Docs.
>
> (XII) REFERENCES
> ================
>
> [1]https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> [2]https://lore.kernel.org/linux-arm-kernel/20200625133757.22332-1-salil.mehta@huawei.com/
> [3]https://lore.kernel.org/lkml/20230203135043.409192-1-james.morse@arm.com/
> [4]https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/
> [5]https://lore.kernel.org/all/20230404154050.2270077-1-oliver.upton@linux.dev/
> [6]https://bugzilla.tianocore.org/show_bug.cgi?id=3706
> [7]https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gic-cpu-interface-gicc-structure
> [8]https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> [9]https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
> [10]https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html
> [11]https://lkml.org/lkml/2019/7/10/235
> [12]https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
> [13]https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06517.html
> [14]https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/thread/7CGL6JTACPUZEYQC34CZ2ZBWJGSR74WE/
> [15]http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01168.html
> [16]https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00131.html
> [17]https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
> [18]https://lore.kernel.org/lkml/20210608154805.216869-1-jean-philippe@linaro.org/
> [19]https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/  
> [20]https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gicc-cpu-interface-flags
> [21]https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
> [22]https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
>
> (XIII) ACKNOWLEDGEMENTS
> =======================
>
> I would like to take this opportunity to thank below people for various
> discussions with me over different channels during the development:
>
> Marc Zyngier (Google),              Catalin Marinas (ARM),
> James Morse (ARM),                  Will Deacon (Google),
> Jean-Philippe Brucker (Linaro),     Sudeep Holla (ARM),
> Lorenzo Pieralisi (Linaro),         Gavin Shan (Redhat),
> Jonathan Cameron (Huawei),          Darren Hart (Ampere),
> Igor Mammedov (Redhat),             Ilkka Koskinen (Ampere),
> Andrew Jones (Redhat),              Karl Heubaum (Oracle),
> Keqian Zhu (Huawei),                Miguel Luis (Oracle),
> Xiongfeng Wang (Huawei),            Vishnu Pajjuri (Ampere),
> Shameerali Kolothum (Huawei),       Russell King (Oracle),
> Xuwei/Joy (Huawei),                 Peter Maydell (Linaro),
> Zengtao/Prime (Huawei),             And all those whom I have missed!
>
> Many thanks to the following people for their current or past contributions:
>
> 1. James Morse (ARM)
>     (Current Kernel part of vCPU Hotplug Support on AARCH64)
> 2. Jean-Philippe Brucker (Linaro)
>     (Prototyped one of the earlier PSCI-based POC [17][18] based on RFC V1)
> 3. Keqian Zhu (Huawei)
>     (Co-developed Qemu prototype)
> 4. Xiongfeng Wang (Huawei)
>     (Co-developed an earlier kernel prototype with me)
> 5. Vishnu Pajjuri (Ampere)
>     (Verification on Ampere ARM64 Platforms + fixes)
> 6. Miguel Luis (Oracle)
>     (Verification on Oracle ARM64 Platforms + fixes)
> 7. Russell King (Oracle) & Jonathan Cameron (Huawei)
>     (Helping in upstreaming James Morse's Kernel patches).
>
> (XIV) Change Log:
> =================
>
> RFC V2 -> RFC V3:
> -----------------
> 1. Miscellaneous:
>     - Split the RFC V2 into arch-agnostic and arch-specific patch sets.
> 2. Addressed Gavin Shan's (RedHat) comments:
>     - Made CPU property accessors inline.
>       https://lore.kernel.org/qemu-devel/6cd28639-2cfa-f233-c6d9-d5d2ec5b1c58@redhat.com/
>     - Collected Reviewed-bys [PATCH RFC V2 4/37, 14/37, 22/37].
>     - Dropped the patch as it was not required after init logic was refactored.
>       https://lore.kernel.org/qemu-devel/4fb2eef9-6742-1eeb-721a-b3db04b1be97@redhat.com/
>     - Fixed the range check for the core during vCPU Plug.
>       https://lore.kernel.org/qemu-devel/1c5fa24c-6bf3-750f-4f22-087e4a9311af@redhat.com/
>     - Added has_hotpluggable_vcpus check to make build_cpus_aml() conditional.
>       https://lore.kernel.org/qemu-devel/832342cb-74bc-58dd-c5d7-6f995baeb0f2@redhat.com/
>     - Fixed the states initialization in cpu_hotplug_hw_init() to accommodate previous refactoring.
>       https://lore.kernel.org/qemu-devel/da5e5609-1883-8650-c7d8-6868c7b74f1c@redhat.com/
>     - Fixed typos.
>       https://lore.kernel.org/qemu-devel/eb1ac571-7844-55e6-15e7-3dd7df21366b@redhat.com/
>     - Removed the unnecessary 'goto fail'.
>       https://lore.kernel.org/qemu-devel/4d8980ac-f402-60d4-fe52-787815af8a7d@redhat.com/#t
>     - Added check for hotpluggable vCPUs in the _OSC method.
>       https://lore.kernel.org/qemu-devel/20231017001326.FUBqQ1PTowF2GxQpnL3kIW0AhmSqbspazwixAHVSi6c@z/
> 3. Addressed Shaoqin Huang's (RedHat) comments:
>     - Fixed the compilation break caused by a call to virt_cpu_properties() being
>       present without its definition.
>       https://lore.kernel.org/qemu-devel/3632ee24-47f7-ae68-8790-26eb2cf9950b@redhat.com/
> 4. Addressed Jonathan Cameron's (Huawei) comments:
>     - Gated the 'disabled vcpu message' for GIC version < 3.
>       https://lore.kernel.org/qemu-devel/20240116155911.00004fe1@Huawei.com/
>
> RFC V1 -> RFC V2:
> -----------------
> 1. Addressed James Morse's (ARM) requirement as per Linaro Open Discussion:
>     - Exposed all possible vCPUs as always ACPI _STA.present and available during boot time.
>     - Added the _OSC handling as required by James's patches.
>     - Introduction of 'online-capable' bit handling in the flag of MADT GICC.
>     - SMCCC Hypercall Exit handling in Qemu.
> 2. Addressed Marc Zyngier's comment:
>     - Fixed the note about GIC CPU Interface in the cover letter.
> 3. Addressed issues raised by Vishnu Pajjuri (Ampere) & Miguel Luis (Oracle) during testing:
>     - Live/Pseudo Migration crashes.
> 4. Others:
>     - Introduced the concept of persistent vCPU at QOM.
>     - Introduced wrapper APIs of present, possible, and persistent.
>     - Changed the ACPI hotplug H/W init leg to accommodate initialization of the is_present and is_enabled states.
>     - Check to avoid unplugging cold-booted vCPUs.
>     - Disabled hotplugging with TCG/HVF/QTEST.
>     - Introduced CPU Topology, {socket, cluster, core, thread}-id property.
>     - Extract virt CPU properties as a common virt_vcpu_properties() function.
>
> Author Salil Mehta (1):
>    target/arm/kvm,tcg: Register/Handle SMCCC hypercall exits to VMM/Qemu
>
> Jean-Philippe Brucker (2):
>    hw/acpi: Make _MAT method optional
>    target/arm/kvm: Write CPU state back to KVM on reset
>
> Miguel Luis (1):
>    tcg/mttcg: enable threads to unregister in tcg_ctxs[]
>
> Salil Mehta (25):
>    arm/virt,target/arm: Add new ARMCPU {socket,cluster,core,thread}-id
>      property
>    cpu-common: Add common CPU utility for possible vCPUs
>    hw/arm/virt: Limit number of possible vCPUs for unsupported Accel or
>      GIC Type
>    hw/arm/virt: Move setting of common CPU properties in a function
>    arm/virt,target/arm: Machine init time change common to vCPU
>      {cold|hot}-plug
>    arm/virt,kvm: Pre-create disabled possible vCPUs @machine init
>    arm/virt,gicv3: Changes to pre-size GIC with possible vcpus @machine
>      init
>    arm/virt: Init PMU at host for all possible vcpus
>    arm/acpi: Enable ACPI support for vcpu hotplug
>    arm/virt: Add cpu hotplug events to GED during creation
>    arm/virt: Create GED dev before *disabled* CPU Objs are destroyed
>    arm/virt/acpi: Build CPUs AML with CPU Hotplug support
>    arm/virt: Make ARM vCPU *present* status ACPI *persistent*
>    hw/acpi: ACPI/AML Changes to reflect the correct _STA.{PRES,ENA} Bits
>      to Guest
>    hw/arm: MADT Tbl change to size the guest with possible vCPUs
>    arm/virt: Release objects for *disabled* possible vCPUs after init
>    arm/virt: Add/update basic hot-(un)plug framework
>    arm/virt: Changes to (un)wire GICC<->vCPU IRQs during hot-(un)plug
>    hw/arm,gicv3: Changes to update GIC with vCPU hot-plug notification
>    hw/intc/arm-gicv3*: Changes required to (re)init the vCPU register
>      info
>    arm/virt: Update the guest(via GED) about CPU hot-(un)plug events
>    hw/arm: Changes required for reset and to support next boot
>    target/arm: Add support of *unrealize* ARMCPU during vCPU Hot-unplug
>    hw/arm: Support hotplug capability check using _OSC method
>    hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled
>
>   accel/tcg/tcg-accel-ops-mttcg.c    |   1 +
>   cpu-common.c                       |  37 ++
>   hw/acpi/cpu.c                      |  62 +-
>   hw/acpi/generic_event_device.c     |  11 +
>   hw/arm/Kconfig                     |   1 +
>   hw/arm/boot.c                      |   2 +-
>   hw/arm/virt-acpi-build.c           | 113 +++-
>   hw/arm/virt.c                      | 877 +++++++++++++++++++++++------
>   hw/core/gpio.c                     |   2 +-
>   hw/intc/arm_gicv3.c                |   1 +
>   hw/intc/arm_gicv3_common.c         |  66 ++-
>   hw/intc/arm_gicv3_cpuif.c          | 269 +++++----
>   hw/intc/arm_gicv3_cpuif_common.c   |   5 +
>   hw/intc/arm_gicv3_kvm.c            |  39 +-
>   hw/intc/gicv3_internal.h           |   2 +
>   include/hw/acpi/cpu.h              |   2 +
>   include/hw/arm/boot.h              |   2 +
>   include/hw/arm/virt.h              |  38 +-
>   include/hw/core/cpu.h              |  78 +++
>   include/hw/intc/arm_gicv3_common.h |  23 +
>   include/hw/qdev-core.h             |   2 +
>   include/tcg/startup.h              |   7 +
>   target/arm/arm-powerctl.c          |  51 +-
>   target/arm/cpu-qom.h               |  18 +-
>   target/arm/cpu.c                   | 112 ++++
>   target/arm/cpu.h                   |  18 +
>   target/arm/cpu64.c                 |  15 +
>   target/arm/gdbstub.c               |   6 +
>   target/arm/helper.c                |  27 +-
>   target/arm/internals.h             |  14 +-
>   target/arm/kvm.c                   | 146 ++++-
>   target/arm/kvm_arm.h               |  25 +
>   target/arm/meson.build             |   1 +
>   target/arm/{tcg => }/psci.c        |   8 +
>   target/arm/tcg/meson.build         |   4 -
>   tcg/tcg.c                          |  24 +
>   36 files changed, 1749 insertions(+), 360 deletions(-)
>   rename target/arm/{tcg => }/psci.c (97%)
>

Salil Mehta June 26, 2024, 6:01 p.m. UTC | #2

Hi Vishnu,
 
> From: Vishnu Pajjuri <vishnu@amperemail.onmicrosoft.com> 
> Sent: Wednesday, June 26, 2024 10:53 AM
> To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com
> 
> Hi Salil,
> On 14-06-2024 05:06, Salil Mehta wrote:
> PROLOGUE
> ========
> 
> To assist in review and set the right expectations from this RFC, please first
> read the sections *APPENDED AT THE END* of this cover letter:
> 
> 1. Important *DISCLAIMER* [Section (X)]
> 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
> 3. Organization of patches [Section (XI)]
> 4. References [Section (XII)]
> 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
> 
> There has been interest shown by other organizations in adapting this series
> for their architecture. Hence, RFC V2 [21] has been split into architecture
> *agnostic* [22] and *specific* patch sets.
> 
> This is an ARM architecture-specific patch set carved out of RFC V2. Please
> check section (XI)B for details of architecture agnostic patches.
> 
> SECTIONS [I - XIV] are as follows:
> 
> (I) Key Changes [details in last section (XIV)]
> ==============================================
> 
> RFC V2 -> RFC V3
> 
> 1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) patch sets.
> 2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat), Philippe Mathieu-Daudé (Linaro),
>   Jonathan Cameron (Huawei), Zhao Liu (Intel).
> I tried the following test cases with RFC V3 and the v10 kernel patches, and it looks good on Ampere platforms.
> • Regular hotplug and hot-unplug tests
> • Live migration tests, with and without hot-plugging vCPUs
> Please feel free to add,
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>


Many thanks for testing and confirming the functionality. Really appreciate this!

Best
Salil.


> 
> Regards,
> -Vishnu.
> 
> RFC V1 -> RFC V2
>

Miguel Luis July 1, 2024, 11:38 a.m. UTC | #3

Hi Salil,

> On 13 Jun 2024, at 23:36, Salil Mehta <salil.mehta@huawei.com> wrote:
> 
> PROLOGUE
> ========
> 
> To assist in review and set the right expectations from this RFC, please first
> read the sections *APPENDED AT THE END* of this cover letter:
> 
> 1. Important *DISCLAIMER* [Section (X)]
> 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
> 3. Organization of patches [Section (XI)]
> 4. References [Section (XII)]
> 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
> 
> There has been interest shown by other organizations in adapting this series
> for their architecture. Hence, RFC V2 [21] has been split into architecture
> *agnostic* [22] and *specific* patch sets.
> 
> This is an ARM architecture-specific patch set carved out of RFC V2. Please
> check section (XI)B for details of architecture agnostic patches.
> 
> SECTIONS [I - XIV] are as follows:
> 
> (I) Key Changes [details in last section (XIV)]
> ==============================================
> 
> RFC V2 -> RFC V3
> 
> 1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) patch sets.
> 2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat), Philippe Mathieu-Daudé (Linaro),
>   Jonathan Cameron (Huawei), Zhao Liu (Intel).
> 

I’ve tested this series along with v10 kernel patches from [1] on the following items:

Boot.
Hotplug up to maxcpus.
Hot unplug down to the number of boot cpus.
Hotplug vcpus then migrate to a new VM.
Hot unplug down to the number of boot cpus then migrate to a new VM.
Up to 6 successive live migrations.

All of which LGTM.

Please feel free to add,
Tested-by: Miguel Luis <miguel.luis@oracle.com>

Regards,
Miguel

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/vcpu-hotplug

> RFC V1 -> RFC V2
> 
> RFC V1: https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> 
> 1. ACPI MADT Table GIC CPU Interface can now be presented [6] as ACPI
>   *online-capable* or *enabled* to the Guest OS at boot time. This means
>   associated CPUs can have ACPI _STA as *enabled* or *disabled* even after boot.
>   See UEFI ACPI 6.5 Spec, Section 05, Table 5.37 GICC CPU Interface Flags[20].
> 2. SMCCC/HVC Hypercall exit handling in userspace/Qemu for PSCI CPU_{ON,OFF}
>   requests. This is required to {dis}allow online'ing a vCPU.
> 3. Unplugged vCPUs are always presented in the CPUs ACPI AML code as ACPI
>   _STA.Present to the Guest OS. ACPI _STA.Enabled is toggled to give the
>   effect of hot(un)plug.
> 4. Live Migration works (some issues are still there).
> 5. TCG/HVF/qtest do not support Hotplug and fall back to default.
> 6. Code for TCG support exists in this release (it is a work-in-progress).
> 7. ACPI _OSC method can now be used by OSPM to negotiate Qemu VM platform
>   hotplug capability (_OSC Query support still pending).
> 8. Misc. Bug fixes.
> 
> (II) Summary
> ============
> 
> This patch set introduces virtual CPU hotplug support for the ARMv8 architecture
> in QEMU. The idea is to be able to hotplug and hot-unplug vCPUs while the guest VM
> is running, without requiring a reboot. This does *not* make any assumptions about
> the physical CPU hotplug availability within the host system but rather tries to
> solve the problem at the virtualizer/QEMU layer. It introduces ACPI CPU hotplug hooks
> and event handling to interface with the guest kernel, and code to initialize, plug,
> and unplug CPUs. No changes are required within the host kernel/KVM except the
> support of hypercall exit handling in the user-space/Qemu, which has recently
> been added to the kernel. Corresponding guest kernel changes have been
> posted on the mailing list [3] [4] by James Morse.
> 
> (III) Motivation
> ================
> 
> This allows scaling the guest VM compute capacity on-demand, which would be
> useful for the following example scenarios:
> 
> 1. Vertical Pod Autoscaling [9][10] in the cloud: Part of the orchestration
>   framework that could adjust resource requests (CPU and Mem requests) for
>   the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure providers could allocate and
>   restrict the total number of compute resources available to the guest VM
>   according to the SLA (Service Level Agreement). VM owners could request more
>   compute to be hot-plugged for some cost.
> 
> For example, Kata Container VM starts with a minimum amount of resources (i.e.,
> hotplug everything approach). Why?
> 
> 1. Allowing faster *boot time* and
> 2. Reduction in *memory footprint*
> 
> Kata Container VM can boot with just 1 vCPU, and then later more vCPUs can be
> hot-plugged as needed.
> 
> (IV) Terminology
> ================
> 
> (*) Possible CPUs: Total vCPUs that could ever exist in the VM. This includes
>                   any cold-booted CPUs plus any CPUs that could be later
>                   hot-plugged.
>                   - Qemu parameter (-smp maxcpus=N)
> (*) Present CPUs:  Possible CPUs that are ACPI 'present'. These might or might
>                   not be ACPI 'enabled'. 
>                   - Present vCPUs = Possible vCPUs (Always on ARM Arch)
> (*) Enabled CPUs:  Possible CPUs that are ACPI 'present' and 'enabled' and can
>                   now be 'onlined' (PSCI) for use by the Guest Kernel. All
>                   cold-booted vCPUs are ACPI 'enabled' at boot. Later, using
>                   device_add, more vCPUs can be hotplugged and made ACPI
>                   'enabled'.
>                   - Qemu parameter (-smp cpus=N). Can be used to specify some
>                     cold-booted vCPUs during VM init. Some can be added using the
>                     '-device' option.
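
The three sets defined above nest: enabled vCPUs are a subset of present vCPUs, and on ARM present always equals possible. A minimal Python sketch of that relationship, assuming the `-smp cpus=4,maxcpus=6` values used in section (VI). The function name `classify_vcpus` is hypothetical and is not part of Qemu:

```python
def classify_vcpus(cpus, maxcpus):
    """Model the vCPU sets at boot for '-smp cpus=<cpus>,maxcpus=<maxcpus>'."""
    if not 0 < cpus <= maxcpus:
        raise ValueError("need 0 < cpus <= maxcpus")
    possible = set(range(maxcpus))  # every vCPU that could ever exist
    present = set(possible)         # on ARM, present == possible (always)
    enabled = set(range(cpus))      # cold-booted vCPUs are ACPI enabled at boot
    return {"possible": possible, "present": present, "enabled": enabled}


sets = classify_vcpus(cpus=4, maxcpus=6)
# vCPUs that are present but not yet enabled are the hot-pluggable ones.
hotpluggable = sets["present"] - sets["enabled"]
print(sorted(hotpluggable))  # prints [4, 5]
```

This only models the terminology; it does not reflect how Qemu stores these sets internally.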
> 
> (V) Constraints Due to ARMv8 CPU Architecture [+] Other Impediments
> ===================================================================
> 
> A. Physical Limitation to Support CPU Hotplug: (Architectural Constraint)
>   1. ARMv8 CPU architecture does not support the concept of the physical CPU
>      hotplug. 
>      a. There are many per-CPU components like PMU, SVE, MTE, Arch timers, etc.,
>         whose behavior needs to be clearly defined when the CPU is hot(un)plugged.
>         There is no specification for this.
> 
>   2. Other ARM components like GIC, etc., have not been designed to realize
>      physical CPU hotplug capability as of now. For example,
>      a. Every physical CPU has a unique GICC (GIC CPU Interface) by construct.
>         Architecture does not specify what CPU hot(un)plug would mean in
>         the context of any of these.
>      b. CPUs/GICC are physically connected to unique GICR (GIC Redistributor).
>         GIC Redistributors are always part of the always-on power domain. Hence,
>         they cannot be powered off as per specification.
> 
> B. Impediments in Firmware/ACPI (Architectural Constraint)
> 
>   1. Firmware has to expose GICC, GICR, and other per-CPU features like PMU,
>      SVE, MTE, Arch Timers, etc., to the OS. Due to the architectural constraint
>      stated in section A1(a), all interrupt controller structures of
>      MADT describing GIC CPU Interfaces and the GIC Redistributors MUST be
>      presented by firmware to the OSPM during boot time.
>   2. Architectures that support CPU hotplug can evaluate the ACPI _MAT method to
>      get this kind of information from the firmware even after boot, and the
>      OSPM has the capability to process these. ARM kernel uses information in MADT
>      interrupt controller structures to identify the number of present CPUs during
>      boot and hence does not allow these to be changed after boot. The number of
>      present CPUs cannot be changed. It is an architectural constraint!
> 
> C. Impediments in KVM to Support Virtual CPU Hotplug (Architectural Constraint)
> 
>   1. KVM VGIC:
>      a. Sizing of various VGIC resources like memory regions, etc., related to
>         the redistributor happens only once and is fixed at the VM init time
>         and cannot be changed later after initialization has happened.
>         KVM statically configures these resources based on the number of vCPUs
>         and the number/size of redistributor ranges.
>      b. Association between vCPU and its VGIC redistributor is fixed at the
>         VM init time within the KVM, i.e., when redistributor iodevs get
>         registered. VGIC does not allow this association to be set up or changed
>         after VM initialization has happened. Physically, every CPU/GICC is
>         uniquely connected with its redistributor, and there is no
>         architectural way to set this up.
>   2. KVM vCPUs:
>      a. Lack of specification means destruction of KVM vCPUs does not exist as
>         there is no reference to tell what to do with other per-vCPU
>         components like redistributors, arch timer, etc.
>      b. In fact, KVM does not implement the destruction of vCPUs for any
>         architecture. This is independent of whether the architecture
>         actually supports CPU Hotplug feature. For example, even for x86 KVM
>         does not implement the destruction of vCPUs.
> 
> D. Impediments in Qemu to Support Virtual CPU Hotplug (KVM Constraints->Arch)
> 
>   1. Qemu CPU Objects MUST be created to initialize all the Host KVM vCPUs to
>      overcome the KVM constraint. KVM vCPUs are created and initialized when Qemu
>      CPU Objects are realized. But keeping the QOM CPU objects realized for
>      'yet-to-be-plugged' vCPUs can create problems when these new vCPUs shall
>      be plugged using device_add and a new QOM CPU object shall be created.
>   2. GICV3State and GICV3CPUState objects MUST be sized over *possible vCPUs*
>      during VM init time while QOM GICV3 Object is realized. This is because
>      KVM VGIC can only be initialized once during init time. But every
>      GICV3CPUState has an associated QOM CPU Object. The latter might correspond
>      to vCPUs which are 'yet-to-be-plugged' (unplugged at init).
>   3. How should new QOM CPU objects be connected back to the GICV3CPUState
>      objects and disconnected from it in case the CPU is being hot(un)plugged?
>   4. How should 'unplugged' or 'yet-to-be-plugged' vCPUs be represented in the
>      QOM for which KVM vCPU already exists? For example, whether to keep,
>       a. No QOM CPU objects Or
>       b. Unrealized CPU Objects
>   5. How should vCPU state be exposed via ACPI to the Guest? Especially for
>      the unplugged/yet-to-be-plugged vCPUs whose CPU objects might not exist
>      within the QOM but the Guest always expects all possible vCPUs to be
>      identified as ACPI *present* during boot.
>   6. How should Qemu expose GIC CPU interfaces for the unplugged or
>      yet-to-be-plugged vCPUs using ACPI MADT Table to the Guest?
> 
> E. Summary of Approach ([+] Workarounds to problems in sections A, B, C & D)
> 
>   1. At VM Init, pre-create all the possible vCPUs in the Host KVM i.e., even
>      for the vCPUs which are yet-to-be-plugged in Qemu but keep them in the
>      powered-off state.
>   2. After the KVM vCPUs have been initialized in the Host, the KVM vCPU
>      objects corresponding to the unplugged/yet-to-be-plugged vCPUs are parked
>      at the existing per-VM "kvm_parked_vcpus" list in Qemu. (similar to x86)
>   3. GICV3State and GICV3CPUState objects are sized over possible vCPUs during
>      VM init time i.e., when Qemu GIC is realized. This, in turn, sizes KVM VGIC
>      resources like memory regions, etc., related to the redistributors with the
>      number of possible KVM vCPUs. This never changes after VM has initialized.
>   4. Qemu CPU objects corresponding to unplugged/yet-to-be-plugged vCPUs are
>      released post Host KVM CPU and GIC/VGIC initialization.
>   5. Build ACPI MADT Table with the following updates:
>      a. Number of GIC CPU interface entries (=possible vCPUs)
>      b. Present Boot vCPU as MADT.GICC.Enabled=1 (Not hot[un]pluggable) 
>      c. Present hot(un)pluggable vCPUs as MADT.GICC.online-capable=1  
>         - MADT.GICC.Enabled=0 (Mutually exclusive) [6][7]
>         - vCPU can be ACPI enabled+onlined after Guest boots (Firmware Policy)
>         - Some issues with above (details in later sections)
>   6. Expose below ACPI Status to Guest kernel:
>      a. Always _STA.Present=1 (all possible vCPUs)
>      b. _STA.Enabled=1 (plugged vCPUs)
>      c. _STA.Enabled=0 (unplugged vCPUs)
>   7. vCPU hotplug *realizes* new QOM CPU object. The following happens:
>      a. Realizes, initializes QOM CPU Object & spawns Qemu vCPU thread.
>      b. Unparks the existing KVM vCPU ("kvm_parked_vcpus" list).
>         - Attaches to QOM CPU object.
>      c. Reinitializes KVM vCPU in the Host.
>         - Resets the core and sys regs, sets defaults, etc.
>      d. Runs KVM vCPU (created with "start-powered-off").
>         - vCPU thread sleeps (waits for vCPU reset via PSCI).
>      e. Updates Qemu GIC.
>         - Wires back IRQs related to this vCPU.
>         - GICV3CPUState association with QOM CPU Object.
>      f. Updates [6] ACPI _STA.Enabled=1.
>      g. Notifies Guest about the new vCPU (via ACPI GED interface).
>         - Guest checks _STA.Enabled=1.
>         - Guest adds processor (registers CPU with LDM) [3].
>      h. Plugs the QOM CPU object in the slot.
>         - slot-number = cpu-index {socket, cluster, core, thread}.
>      i. Guest onlines vCPU (CPU_ON PSCI call over HVC/SMC).
>         - KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
>         - Qemu powers-on KVM vCPU in the Host.
>   8. vCPU hot-unplug *unrealizes* QOM CPU Object. The following happens:
>      a. Notifies Guest (via ACPI GED interface) vCPU hot-unplug event.
>         - Guest offlines vCPU (CPU_OFF PSCI call over HVC/SMC).
>      b. KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
>         - Qemu powers-off the KVM vCPU in the Host.
>      c. Guest signals *Eject* vCPU to Qemu.
>      d. Qemu updates [6] ACPI _STA.Enabled=0.
>      e. Updates GIC.
>         - Un-wires IRQs related to this vCPU.
>         - GICV3CPUState association with new QOM CPU Object is updated.
>      f. Unplugs the vCPU.
>         - Removes from slot.
>         - Parks KVM vCPU ("kvm_parked_vcpus" list).
>         - Unrealizes QOM CPU Object & joins back Qemu vCPU thread.
>         - Destroys QOM CPU object.
>      g. Guest checks ACPI _STA.Enabled=0.
>         - Removes processor (unregisters CPU with LDM) [3].
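
The hotplug and hot-unplug flows in steps 7 and 8 above can be read as a small per-vCPU state machine. The following is a rough Python model of that reading, not Qemu code: the class and method names are hypothetical, the three states are a simplification, and the rule that CPU_ON is denied for a vCPU that is not ACPI enabled is an assumption about what the "Policy Check" does.

```python
from enum import Enum, auto


class VcpuState(Enum):
    PARKED = auto()   # KVM vCPU parked, no QOM CPU object, _STA.Enabled=0
    ENABLED = auto()  # QOM CPU object realized, _STA.Enabled=1, powered off
    ONLINE = auto()   # guest's PSCI CPU_ON was allowed by Qemu


class Vcpu:
    def __init__(self, index, cold_booted=False):
        self.index = index
        self.cold_booted = cold_booted
        self.state = VcpuState.ENABLED if cold_booted else VcpuState.PARKED

    def sta(self):
        """Return (_STA.Present, _STA.Enabled); Present is always 1."""
        return (1, 0 if self.state is VcpuState.PARKED else 1)

    def hotplug(self):
        """Step 7: realize the QOM object, unpark the KVM vCPU, set Enabled."""
        if self.state is not VcpuState.PARKED:
            raise RuntimeError(f"vCPU {self.index} is already plugged")
        self.state = VcpuState.ENABLED

    def psci_cpu_on(self):
        """Step 7(i): assumed policy check on the CPU_ON hypercall exit."""
        if self.state is VcpuState.PARKED:
            return False  # deny: vCPU is not ACPI enabled
        self.state = VcpuState.ONLINE
        return True

    def hot_unplug(self):
        """Step 8: guest offlines the vCPU, then Qemu ejects and parks it."""
        if self.cold_booted:
            raise RuntimeError("cold-booted vCPUs cannot be unplugged")
        if self.state is VcpuState.PARKED:
            raise RuntimeError(f"vCPU {self.index} is not plugged")
        self.state = VcpuState.PARKED  # CPU_OFF, eject, unrealize, park


vcpu = Vcpu(4)
print(vcpu.sta(), vcpu.psci_cpu_on())  # parked: Enabled=0, CPU_ON denied
vcpu.hotplug()
print(vcpu.sta(), vcpu.psci_cpu_on())  # plugged: Enabled=1, CPU_ON allowed
vcpu.hot_unplug()
print(vcpu.sta())                      # parked again: Enabled=0
```

The model deliberately keeps _STA.Present at 1 in every state, which is the key difference from architectures where hot-unplug makes the CPU not present.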
> 
> F. Work Presented at KVM Forum Conferences:
> ==========================================
> 
> Details of the above work have been presented at KVMForum2020 and KVMForum2023
> conferences. Slides & video are available at the links below:
> a. KVMForum 2023
>   - Challenges Revisited in Supporting Virt CPU Hotplug on architectures that don't Support CPU Hotplug (like ARM64).
>     https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>     https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
>     https://www.youtube.com/watch?v=hyrw4j2D6I0&t=23970s
>     https://kvm-forum.qemu.org/2023/talk/9SMPDQ/
> b. KVMForum 2020
>   - Challenges in Supporting Virtual CPU Hotplug on SoC Based Systems (like ARM64) - Salil Mehta, Huawei.
>     https://sched.co/eE4m
> 
> (VI) Commands Used
> ==================
> 
> A. Qemu launch commands to init the machine:
> 
>    $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>      -cpu host -smp cpus=4,maxcpus=6 \
>      -m 300M \
>      -kernel Image \
>      -initrd rootfs.cpio.gz \
>      -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>      -nographic \
>      -bios QEMU_EFI.fd
> 
> B. Hot-(un)plug related commands:
> 
>  # Hotplug a host vCPU (accel=kvm), from the Qemu monitor:
>    (qemu) device_add host-arm-cpu,id=core4,core-id=4
> 
>  # Hotplug a vCPU (accel=tcg):
>    (qemu) device_add cortex-a57-arm-cpu,id=core4,core-id=4
> 
>  # Delete the vCPU:
>    (qemu) device_del core4
> 
> Sample output on guest after boot:
> 
>    $ cat /sys/devices/system/cpu/possible
>    0-5
>    $ cat /sys/devices/system/cpu/present
>    0-5
>    $ cat /sys/devices/system/cpu/enabled
>    0-3
>    $ cat /sys/devices/system/cpu/online
>    0-1
>    $ cat /sys/devices/system/cpu/offline
>    2-5
> 
> Sample output on guest after hotplug of vCPU=4:
> 
>    $ cat /sys/devices/system/cpu/possible
>    0-5
>    $ cat /sys/devices/system/cpu/present
>    0-5
>    $ cat /sys/devices/system/cpu/enabled
>    0-4
>    $ cat /sys/devices/system/cpu/online
>    0-1,4
>    $ cat /sys/devices/system/cpu/offline
>    2-3,5
> 
>    Note: vCPU=4 was explicitly 'onlined' after hot-plug
>    $ echo 1 > /sys/devices/system/cpu/cpu4/online
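
The sysfs files shown above use the kernel's CPU list format (comma-separated IDs and inclusive ranges). A small sketch for checking such output in a test script; the helper name `parse_cpulist` is hypothetical, and the sample strings are copied from the post-hotplug output above:

```python
def parse_cpulist(text):
    """Parse a CPU list such as '0-1,4' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs in this set
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


enabled = parse_cpulist("0-4")
online = parse_cpulist("0-1,4")
# vCPUs that are ACPI enabled but not yet onlined by the guest.
print(sorted(enabled - online))  # prints [2, 3]
```

Here vCPUs 2 and 3 stay offline because the guest was booted with `maxcpus=2` on the kernel command line, while vCPU 4 was onlined explicitly after hotplug.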
> 
> (VII) Latest Repository
> =======================
> 
> (*) Latest Qemu RFC V3 (Architecture Specific) patch set:
>    https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3
> (*) Latest Qemu V13 (Architecture Agnostic) patch set:
>    https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3.arch.agnostic.v13
> (*) QEMU changes for vCPU hotplug can be cloned from below site:
>    https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2
> (*) Guest Kernel changes (by James Morse, ARM) are available here:
>    https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git virtual_cpu_hotplug/rfc/v2
> (*) Leftover patches of the kernel are available here:
>    https://lore.kernel.org/lkml/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
>    https://github.com/salil-mehta/linux/commits/virtual_cpu_hotplug/rfc/v6.jic/ (not latest)
> 
> (VIII) KNOWN ISSUES
> ===================
> 
> 1. Migration has been lightly tested but has been found working.
> 2. TCG is broken.
> 3. HVF and qtest are not supported yet.
> 4. ACPI MADT Table flags [7] MADT.GICC.Enabled and MADT.GICC.online-capable are
>   mutually exclusive, i.e., as per the change [6], a vCPU cannot be both
>   GICC.Enabled and GICC.online-capable. This means:
>      [ Link: https://bugzilla.tianocore.org/show_bug.cgi?id=3706 ]
>   a. If we have to support hot-unplug of the cold-booted vCPUs, then these MUST
>      be specified as GICC.online-capable in the MADT Table during boot by the
>      firmware/Qemu. But this requirement conflicts with the requirement to
>      support new Qemu changes with legacy OSes that do not understand the
>      MADT.GICC.online-capable bit. A legacy OS will ignore this bit during
>      boot, and hence these vCPUs will not appear on such an OS. This is
>      unexpected behavior.
>   b. In case we decide to specify vCPUs as MADT.GICC.Enabled and try to unplug
>      these cold-booted vCPUs from OS (which in actuality should be blocked by
>      returning error at Qemu), then features like 'kexec' will break.
>   c. As I understand, removal of the cold-booted vCPUs is a required feature
>      and x86 world allows it.
>   d. Hence, either we need a specification change to make the MADT.GICC.Enabled
>      and MADT.GICC.online-capable Bits NOT mutually exclusive or NOT support
>      the removal of cold-booted vCPUs. In the latter case, a check can be introduced
>      to bar the users from unplugging vCPUs, which were cold-booted, using QMP
>      commands. (Needs discussion!)
>      Please check the following patch, which is part of this patch set:
>      [hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled].
> 
>      NOTE: This is definitely not a blocker!
> 5. Code related to the notification to GICV3 about the hot(un)plug of a vCPU event
>   might need further discussion.
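
Known issue 4 above comes down to two bits in the MADT GICC flags field. The sketch below illustrates the trade-off, assuming the bit positions from ACPI 6.5 Table 5.37 (Enabled at bit 0, Online Capable at bit 3). The helper names are hypothetical, and the "legacy OS" check is a simplification of an OS that only looks at the Enabled bit:

```python
GICC_ENABLED = 1 << 0         # ACPI 6.5 GICC flags, bit 0 (assumed from spec)
GICC_ONLINE_CAPABLE = 1 << 3  # ACPI 6.5 GICC flags, bit 3 (assumed from spec)


def gicc_flags(cold_booted):
    """Flags chosen for a vCPU: cold-booted ones are Enabled, the rest
    are online-capable."""
    return GICC_ENABLED if cold_booted else GICC_ONLINE_CAPABLE


def flags_valid(flags):
    """The two bits are mutually exclusive: both set is not allowed."""
    both = GICC_ENABLED | GICC_ONLINE_CAPABLE
    return (flags & both) != both


def visible_to_legacy_os(flags):
    """A legacy OS that ignores online-capable only sees Enabled vCPUs."""
    return bool(flags & GICC_ENABLED)


for cold_booted in (True, False):
    flags = gicc_flags(cold_booted)
    print(cold_booted, hex(flags), flags_valid(flags), visible_to_legacy_os(flags))
```

Marking cold-booted vCPUs as online-capable instead would make them unpluggable in principle, but `visible_to_legacy_os` would then return False for them, which is the conflict described in item 4(a).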
> 
> 
> (IX) THINGS TO DO
> =================
> 
> 1. Fix issues related to TCG/Emulation support. (Not a blocker)
> 2. Comprehensive Testing is in progress. (Positive feedback from Oracle & Ampere)
> 3. Qemu Documentation (.rst) needs to be updated.
> 4. Fix qtest, HVF Support (Future).
> 5. Resolve the design issue related to ACPI MADT.GICC flags discussed in known
>   issues. This might require UEFI ACPI specification change (Not a blocker).
> 6. Add ACPI _OSC 'Query' support. Only part of _OSC support exists now. (Not a blocker).
> 
> The above is *not* a complete list. Will update later!
> 
> Best regards,  
> Salil.
> 
> (X) DISCLAIMER
> ==============
> 
> This work is an attempt to present a proof-of-concept of the ARM64 vCPU hotplug
> implementation to the community. This is *not* production-level code and might
> have bugs. Comprehensive testing is being done on HiSilicon Kunpeng920 SoC,
> Oracle, and Ampere servers. We are nearing stable code and a non-RFC
> version shall be floated soon.
> 
> This work is *mostly* along the lines of the discussions that have happened in the
> previous years [see refs below] across different channels like the mailing list,
> Linaro Open Discussions platform, and various conferences like KVMForum, etc. This
> RFC is being used as a way to verify the idea mentioned in this cover letter and
> to get community views. Once this has been agreed upon, a formal patch shall be
> posted to the mailing list for review.
> 
> [The concept being presented has been found to work!]
> 
> (XI) ORGANIZATION OF PATCHES
> ============================
> 
> A. Architecture *specific* patches:
> 
>   [Patch 1-8, 17, 27, 29] logic required during machine init.
>    (*) Some validation checks.
>    (*) Introduces core-id property and some util functions required later.
>    (*) Logic to pre-create vCPUs.
>    (*) GIC initialization pre-sized with possible vCPUs.
>    (*) Some refactoring to have common hot and cold plug logic together.
>    (*) Release of disabled QOM CPU objects in post_cpu_init().
>    (*) Support of ACPI _OSC method to negotiate platform hotplug capabilities.
>   [Patch 9-16] logic related to ACPI at machine init time.
>    (*) Changes required to Enable ACPI for CPU hotplug.
>    (*) Initialization of ACPI GED framework to cater to CPU Hotplug Events.
>    (*) ACPI MADT/MAT changes.
>   [Patch 18-26] logic required during vCPU hot-(un)plug.
>    (*) Basic framework changes to support vCPU hot-(un)plug.
>    (*) ACPI GED changes for hot-(un)plug hooks.
>    (*) Wire-unwire the IRQs.
>    (*) GIC notification logic.
>    (*) ARMCPU unrealize logic.
>    (*) Handling of SMCCC Hypercall Exits by KVM to Qemu.
> 
> B. Architecture *agnostic* patches:
> 
>   [PATCH V13 0/8] Add architecture agnostic code to support vCPU Hotplug.
>   https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
>    (*) Refactors vCPU create, parking, and unparking logic, and adds traces.
>    (*) Build ACPI AML related to CPU control dev.
>    (*) Changes related to the destruction of CPU Address Space.
>    (*) Changes related to the uninitialization of GDB Stub.
>    (*) Updating of Docs.
> 
> (XII) REFERENCES
> ================
> 
> [1] https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> [2] https://lore.kernel.org/linux-arm-kernel/20200625133757.22332-1-salil.mehta@huawei.com/
> [3] https://lore.kernel.org/lkml/20230203135043.409192-1-james.morse@arm.com/
> [4] https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/
> [5] https://lore.kernel.org/all/20230404154050.2270077-1-oliver.upton@linux.dev/
> [6] https://bugzilla.tianocore.org/show_bug.cgi?id=3706
> [7] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gic-cpu-interface-gicc-structure
> [8] https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> [9] https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
> [10] https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html
> [11] https://lkml.org/lkml/2019/7/10/235
> [12] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
> [13] https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06517.html
> [14] https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/thread/7CGL6JTACPUZEYQC34CZ2ZBWJGSR74WE/
> [15] http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01168.html
> [16] https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00131.html
> [17] https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
> [18] https://lore.kernel.org/lkml/20210608154805.216869-1-jean-philippe@linaro.org/
> [19] https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/ 
> [20] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gicc-cpu-interface-flags
> [21] https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
> [22] https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
> 
> (XIII) ACKNOWLEDGEMENTS
> =======================
> 
> I would like to take this opportunity to thank the people below for various
> discussions with me over different channels during the development:
> 
> Marc Zyngier (Google),              Catalin Marinas (ARM),
> James Morse (ARM),                  Will Deacon (Google),
> Jean-Philippe Brucker (Linaro),     Sudeep Holla (ARM),
> Lorenzo Pieralisi (Linaro),         Gavin Shan (Redhat),
> Jonathan Cameron (Huawei),          Darren Hart (Ampere),
> Igor Mammedov (Redhat),             Ilkka Koskinen (Ampere),
> Andrew Jones (Redhat),              Karl Heubaum (Oracle),
> Keqian Zhu (Huawei),                Miguel Luis (Oracle),
> Xiongfeng Wang (Huawei),            Vishnu Pajjuri (Ampere),
> Shameerali Kolothum (Huawei),       Russell King (Oracle),
> Xuwei/Joy (Huawei),                 Peter Maydell (Linaro),
> Zengtao/Prime (Huawei),             and all those whom I have missed!
> 
> Many thanks to the following people for their current or past contributions:
> 
> 1. James Morse (ARM)
>   (Current Kernel part of vCPU Hotplug Support on AARCH64)
> 2. Jean-Philippe Brucker (Linaro)
>   (Prototyped one of the earlier PSCI-based POC [17][18] based on RFC V1)
> 3. Keqian Zhu (Huawei)
>   (Co-developed Qemu prototype)
> 4. Xiongfeng Wang (Huawei)
>   (Co-developed an earlier kernel prototype with me)
> 5. Vishnu Pajjuri (Ampere)
>   (Verification on Ampere ARM64 Platforms + fixes)
> 6. Miguel Luis (Oracle)
>   (Verification on Oracle ARM64 Platforms + fixes)
> 7. Russell King (Oracle) & Jonathan Cameron (Huawei)
>   (Helping in upstreaming James Morse's Kernel patches).
> 
> (XIV) Change Log:
> =================
> 
> RFC V2 -> RFC V3:
> -----------------
> 1. Miscellaneous:
>   - Split the RFC V2 into arch-agnostic and arch-specific patch sets.
> 2. Addressed Gavin Shan's (RedHat) comments:
>   - Made CPU property accessors inline.
>     https://lore.kernel.org/qemu-devel/6cd28639-2cfa-f233-c6d9-d5d2ec5b1c58@redhat.com/
>   - Collected Reviewed-bys [PATCH RFC V2 4/37, 14/37, 22/37].
>   - Dropped the patch as it was not required after init logic was refactored.
>     https://lore.kernel.org/qemu-devel/4fb2eef9-6742-1eeb-721a-b3db04b1be97@redhat.com/
>   - Fixed the range check for the core during vCPU Plug.
>     https://lore.kernel.org/qemu-devel/1c5fa24c-6bf3-750f-4f22-087e4a9311af@redhat.com/
>   - Added has_hotpluggable_vcpus check to make build_cpus_aml() conditional.
>     https://lore.kernel.org/qemu-devel/832342cb-74bc-58dd-c5d7-6f995baeb0f2@redhat.com/
>   - Fixed the states initialization in cpu_hotplug_hw_init() to accommodate previous refactoring.
>     https://lore.kernel.org/qemu-devel/da5e5609-1883-8650-c7d8-6868c7b74f1c@redhat.com/
>   - Fixed typos.
>     https://lore.kernel.org/qemu-devel/eb1ac571-7844-55e6-15e7-3dd7df21366b@redhat.com/
>   - Removed the unnecessary 'goto fail'.
>     https://lore.kernel.org/qemu-devel/4d8980ac-f402-60d4-fe52-787815af8a7d@redhat.com/#t
>   - Added check for hotpluggable vCPUs in the _OSC method.
>     https://lore.kernel.org/qemu-devel/20231017001326.FUBqQ1PTowF2GxQpnL3kIW0AhmSqbspazwixAHVSi6c@z/
> 3. Addressed Shaoqin Huang's (RedHat) comments:
>   - Fixed the compilation break due to the absence of a call to virt_cpu_properties()
>     along with its definition.
>     https://lore.kernel.org/qemu-devel/3632ee24-47f7-ae68-8790-26eb2cf9950b@redhat.com/
> 4. Addressed Jonathan Cameron's (Huawei) comments:
>   - Gated the 'disabled vcpu message' for GIC version < 3.
>     https://lore.kernel.org/qemu-devel/20240116155911.00004fe1@Huawei.com/
> 
> RFC V1 -> RFC V2:
> -----------------
> 1. Addressed James Morse's (ARM) requirement as per Linaro Open Discussion:
>   - Exposed all possible vCPUs as always ACPI _STA.present and available during boot time.
>   - Added the _OSC handling as required by James's patches.
>   - Introduction of 'online-capable' bit handling in the flag of MADT GICC.
>   - SMCCC Hypercall Exit handling in Qemu.
> 2. Addressed Marc Zyngier's comment:
>   - Fixed the note about GIC CPU Interface in the cover letter.
> 3. Addressed issues raised by Vishnu Pajjuri (Ampere) & Miguel Luis (Oracle) during testing:
>   - Live/Pseudo Migration crashes.
> 4. Others:
>   - Introduced the concept of persistent vCPU at QOM.
>   - Introduced wrapper APIs of present, possible, and persistent.
>   - Changed the ACPI hotplug H/W init leg to accommodate initialization of the is_present and is_enabled states.
>   - Check to avoid unplugging cold-booted vCPUs.
>   - Disabled hotplugging with TCG/HVF/QTEST.
>   - Introduced CPU Topology, {socket, cluster, core, thread}-id property.
>   - Extract virt CPU properties as a common virt_vcpu_properties() function.
> 
> Author Salil Mehta (1):
>  target/arm/kvm,tcg: Register/Handle SMCCC hypercall exits to VMM/Qemu
> 
> Jean-Philippe Brucker (2):
>  hw/acpi: Make _MAT method optional
>  target/arm/kvm: Write CPU state back to KVM on reset
> 
> Miguel Luis (1):
>  tcg/mttcg: enable threads to unregister in tcg_ctxs[]
> 
> Salil Mehta (25):
>  arm/virt,target/arm: Add new ARMCPU {socket,cluster,core,thread}-id
>    property
>  cpu-common: Add common CPU utility for possible vCPUs
>  hw/arm/virt: Limit number of possible vCPUs for unsupported Accel or
>    GIC Type
>  hw/arm/virt: Move setting of common CPU properties in a function
>  arm/virt,target/arm: Machine init time change common to vCPU
>    {cold|hot}-plug
>  arm/virt,kvm: Pre-create disabled possible vCPUs @machine init
>  arm/virt,gicv3: Changes to pre-size GIC with possible vcpus @machine
>    init
>  arm/virt: Init PMU at host for all possible vcpus
>  arm/acpi: Enable ACPI support for vcpu hotplug
>  arm/virt: Add cpu hotplug events to GED during creation
>  arm/virt: Create GED dev before *disabled* CPU Objs are destroyed
>  arm/virt/acpi: Build CPUs AML with CPU Hotplug support
>  arm/virt: Make ARM vCPU *present* status ACPI *persistent*
>  hw/acpi: ACPI/AML Changes to reflect the correct _STA.{PRES,ENA} Bits
>    to Guest
>  hw/arm: MADT Tbl change to size the guest with possible vCPUs
>  arm/virt: Release objects for *disabled* possible vCPUs after init
>  arm/virt: Add/update basic hot-(un)plug framework
>  arm/virt: Changes to (un)wire GICC<->vCPU IRQs during hot-(un)plug
>  hw/arm,gicv3: Changes to update GIC with vCPU hot-plug notification
>  hw/intc/arm-gicv3*: Changes required to (re)init the vCPU register
>    info
>  arm/virt: Update the guest(via GED) about CPU hot-(un)plug events
>  hw/arm: Changes required for reset and to support next boot
>  target/arm: Add support of *unrealize* ARMCPU during vCPU Hot-unplug
>  hw/arm: Support hotplug capability check using _OSC method
>  hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled
> 
> accel/tcg/tcg-accel-ops-mttcg.c    |   1 +
> cpu-common.c                       |  37 ++
> hw/acpi/cpu.c                      |  62 +-
> hw/acpi/generic_event_device.c     |  11 +
> hw/arm/Kconfig                     |   1 +
> hw/arm/boot.c                      |   2 +-
> hw/arm/virt-acpi-build.c           | 113 +++-
> hw/arm/virt.c                      | 877 +++++++++++++++++++++++------
> hw/core/gpio.c                     |   2 +-
> hw/intc/arm_gicv3.c                |   1 +
> hw/intc/arm_gicv3_common.c         |  66 ++-
> hw/intc/arm_gicv3_cpuif.c          | 269 +++++----
> hw/intc/arm_gicv3_cpuif_common.c   |   5 +
> hw/intc/arm_gicv3_kvm.c            |  39 +-
> hw/intc/gicv3_internal.h           |   2 +
> include/hw/acpi/cpu.h              |   2 +
> include/hw/arm/boot.h              |   2 +
> include/hw/arm/virt.h              |  38 +-
> include/hw/core/cpu.h              |  78 +++
> include/hw/intc/arm_gicv3_common.h |  23 +
> include/hw/qdev-core.h             |   2 +
> include/tcg/startup.h              |   7 +
> target/arm/arm-powerctl.c          |  51 +-
> target/arm/cpu-qom.h               |  18 +-
> target/arm/cpu.c                   | 112 ++++
> target/arm/cpu.h                   |  18 +
> target/arm/cpu64.c                 |  15 +
> target/arm/gdbstub.c               |   6 +
> target/arm/helper.c                |  27 +-
> target/arm/internals.h             |  14 +-
> target/arm/kvm.c                   | 146 ++++-
> target/arm/kvm_arm.h               |  25 +
> target/arm/meson.build             |   1 +
> target/arm/{tcg => }/psci.c        |   8 +
> target/arm/tcg/meson.build         |   4 -
> tcg/tcg.c                          |  24 +
> 36 files changed, 1749 insertions(+), 360 deletions(-)
> rename target/arm/{tcg => }/psci.c (97%)
> 
> -- 
> 2.34.1
>

Salil Mehta July 1, 2024, 4:30 p.m. UTC | #4

Hi Miguel,

>  From: Miguel Luis <miguel.luis@oracle.com>
>  Sent: Monday, July 1, 2024 12:39 PM
>  To: Salil Mehta <salil.mehta@huawei.com>
>  
>  Hi Salil,
>  
>  > On 13 Jun 2024, at 23:36, Salil Mehta <salil.mehta@huawei.com> wrote:
>  >
>  > PROLOGUE
>  > ========
>  >
>  > To assist in review and set the right expectations from this RFC,
>  > please first read the sections *APPENDED AT THE END* of this cover letter:
>  >
>  > 1. Important *DISCLAIMER* [Section (X)]
>  > 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
>  > 3. Organization of patches [Section (XI)]
>  > 4. References [Section (XII)]
>  > 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
>  >
>  > There has been interest shown by other organizations in adapting this
>  > series for their architecture. Hence, RFC V2 [21] has been split into
>  > architecture *agnostic* [22] and *specific* patch sets.
>  >
>  > This is an ARM architecture-specific patch set carved out of RFC V2.
>  > Please check section (XI)B for details of architecture agnostic patches.
>  >
>  > SECTIONS [I - XIII] are as follows:
>  >
>  > (I) Key Changes [details in last section (XIV)]
>  > ==============================================
>  >
>  > RFC V2 -> RFC V3
>  >
>  > 1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) patch sets.
>  > 2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat),
>  >   Philippe Mathieu-Daudé (Linaro), Jonathan Cameron (Huawei), Zhao Liu (Intel).
>  >
>  
>  I’ve tested this series along with v10 kernel patches from [1] on the
>  following items:
>  
>  Boot.
>  Hotplug up to maxcpus.
>  Hot unplug down to the number of boot cpus.
>  Hotplug vcpus then migrate to a new VM.
>  Hot unplug down to the number of boot cpus then migrate to a new VM.
>  Up to 6 successive live migrations.
>  
>  All of which LGTM.
>  
>  Please feel free to add,
>  Tested-by: Miguel Luis <miguel.luis@oracle.com>

Many thanks for your efforts. Appreciate this.


Best
Salil.


>  
>  Regards,
>  Miguel
>  
>  [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/vcpu-hotplug

Gavin Shan Aug. 7, 2024, 9:53 a.m. UTC | #5

Hi Salil,

With this series and the latest upstream Linux kernel (host), I ran into the core dump below.
I'm not sure whether it's a known issue.

# uname -r
6.11.0-rc2-gavin+
# /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 -accel kvm \
   -machine virt,gic-version=host,nvdimm=on -cpu host                 \
   -smp maxcpus=2,cpus=1,sockets=2,clusters=1,cores=1,threads=1       \
   -m 4096M,slots=16,maxmem=128G                                      \
   -object memory-backend-ram,id=mem0,size=2048M                      \
   -object memory-backend-ram,id=mem1,size=2048M                      \
   -numa node,nodeid=0,memdev=mem0,cpus=0-0                           \
   -numa node,nodeid=1,memdev=mem1,cpus=1-1                           \
     :
qemu-system-aarch64: Failed to initialize host vcpu 1
Aborted (core dumped)

# gdb /var/lib/systemd/coredump/core.0 /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64
(gdb) bt
#0  0x0000ffff9eec42e8 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x0000ffff9ee7c73c in raise () at /lib64/libc.so.6
#2  0x0000ffff9ee69034 in abort () at /lib64/libc.so.6
#3  0x0000aaaac71152c0 in kvm_arm_create_host_vcpu (cpu=0xaaaae4c0cb80)
     at ../target/arm/kvm.c:1093
#4  0x0000aaaac7057520 in machvirt_init (machine=0xaaaae48198c0) at ../hw/arm/virt.c:2534
#5  0x0000aaaac6b0d31c in machine_run_board_init
     (machine=0xaaaae48198c0, mem_path=0x0, errp=0xfffff754ee38) at ../hw/core/machine.c:1576
#6  0x0000aaaac6f58d70 in qemu_init_board () at ../system/vl.c:2620
#7  0x0000aaaac6f590dc in qmp_x_exit_preconfig (errp=0xaaaac8911120 <error_fatal>)
     at ../system/vl.c:2712
#8  0x0000aaaac6f5b728 in qemu_init (argc=82, argv=0xfffff754f1d8) at ../system/vl.c:3758
#9  0x0000aaaac6a5315c in main (argc=82, argv=0xfffff754f1d8) at ../system/main.c:47

Thanks,
Gavin

Salil Mehta Aug. 7, 2024, 1:27 p.m. UTC | #6

Hi Gavin,

Let me figure this out. Have you also included the patch below along with the
architecture agnostic patch-set accepted in this Qemu cycle?

https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/


Thanks
Salil.


Salil Mehta Aug. 7, 2024, 4:07 p.m. UTC | #7

Hi Gavin,

I tested the ARM arch-specific patches with the latest Qemu, which contains the fix
mentioned below, and I cannot reproduce the crash. I used kernel linux-6.11-rc2 and it
booted successfully. However, I did see a kernel crash on attempting to hotplug the first vCPU.

(qemu) device_add host-arm-cpu,id=core4,core-id=4
(qemu) [  365.125477] Unable to handle kernel write to read-only memory at virtual address ffff800081ba4190
[  365.126366] Mem abort info:
[  365.126640]   ESR = 0x000000009600004e
[  365.127010]   EC = 0x25: DABT (current EL), IL = 32 bits
[  365.127524]   SET = 0, FnV = 0
[  365.127822]   EA = 0, S1PTW = 0
[  365.128130]   FSC = 0x0e: level 2 permission fault
[  365.128598] Data abort info:
[  365.128881]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
[  365.129447]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  365.129943]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  365.130442] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000045830000
[  365.131068] [ffff800081ba4190] pgd=0000000000000000, p4d=10000000467df003, pud=10000000467e0003, pmd=0060000045600781
[  365.132069] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
[  365.132661] Modules linked in:
[  365.132952] CPU: 0 UID: 0 PID: 11 Comm: kworker/u24:0 Not tainted 6.11.0-rc2 #228
[  365.133699] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  365.134415] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  365.134969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  365.135679] pc : register_cpu+0x138/0x250
[  365.136093] lr : register_cpu+0x120/0x250
[  365.136506] sp : ffff800082cbba10
[  365.136847] x29: ffff800082cbba10 x28: ffff8000826479c0 x27: ffff000000a7e098
[  365.137575] x26: ffff8000827c2838 x25: 0000000000000004 x24: ffff80008264d9b0
[  365.138311] x23: 0000000000000004 x22: ffff000012a482d0 x21: ffff800081e30a00
[  365.139037] x20: 0000000000000000 x19: ffff800081ba4190 x18: ffffffffffffffff
[  365.139764] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000001adaa1c
[  365.140490] x14: ffffffffffffffff x13: ffff000001ada2e0 x12: 0000000000000000
[  365.141216] x11: ffff800081e32780 x10: 0000000000000000 x9 : 0000000000000001
[  365.141945] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 6f7274726e737460
[  365.142668] x5 : ffff0000027b1920 x4 : ffff0000027b1b40 x3 : ffff0000027b1880
[  365.143400] x2 : ffff0000001933c0 x1 : ffff800081ba4190 x0 : 0000000000000010
[  365.144129] Call trace:
[  365.144382]  register_cpu+0x138/0x250
[  365.144759]  arch_register_cpu+0x7c/0xc4
[  365.145166]  acpi_processor_add+0x468/0x590
[  365.145594]  acpi_bus_attach+0x1ac/0x2dc
[  365.146002]  acpi_dev_for_one_check+0x34/0x40
[  365.146449]  device_for_each_child+0x5c/0xb0
[  365.146887]  acpi_dev_for_each_child+0x3c/0x64
[  365.147341]  acpi_bus_attach+0x78/0x2dc
[  365.147734]  acpi_bus_scan+0x68/0x208
[  365.148110]  acpi_scan_rescan_bus+0x4c/0x78
[  365.148537]  acpi_device_hotplug+0x1f8/0x460
[  365.148975]  acpi_hotplug_work_fn+0x24/0x3c
[  365.149402]  process_one_work+0x150/0x294
[  365.149817]  worker_thread+0x2e4/0x3ec
[  365.150199]  kthread+0x118/0x11c
[  365.150536]  ret_from_fork+0x10/0x20
[  365.150903] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
[  365.151527] ---[ end trace 0000000000000000 ]---
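
The "Mem abort info" lines above can be reproduced from the raw ESR value by
hand. The sketch below decodes the fields the kernel prints, using the ESR_ELx
layout from the Arm architecture (EC in bits [31:26], IL in bit 25, ISS in bits
[24:0], and for data aborts WnR in ISS bit 6 and the fault status code in ISS
bits [5:0]). The struct and helper names are made up for this illustration and
are not kernel or QEMU symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the ESR fields shown in the oops; names are illustrative. */
struct esr_fields {
    unsigned ec;    /* Exception Class, ESR[31:26] */
    bool     il;    /* Instruction Length, ESR[25]; true means 32-bit */
    uint32_t iss;   /* Instruction Specific Syndrome, ESR[24:0] */
    bool     wnr;   /* Write-not-Read, ISS[6] (data aborts only) */
    unsigned fsc;   /* Fault Status Code, ISS[5:0] (data aborts only) */
};

static inline struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = ((esr >> 25) & 1) != 0;
    f.iss = (uint32_t)(esr & 0x1ffffff);
    f.wnr = ((f.iss >> 6) & 1) != 0;
    f.fsc = f.iss & 0x3f;
    return f;
}

/* EC 0x24/0x25 is a data abort (lower/current EL); FSC 0b0011LL is a
 * permission fault at translation level LL. */
static inline bool esr_is_dabt_permission_fault(struct esr_fields f)
{
    return (f.ec == 0x24 || f.ec == 0x25) && (f.fsc & 0x3c) == 0x0c;
}

static inline unsigned esr_fault_level(struct esr_fields f)
{
    assert(esr_is_dabt_permission_fault(f));
    return f.fsc & 0x3;
}
```

For ESR = 0x9600004e this gives EC = 0x25, IL set, ISS = 0x4e, WnR = 1 and
FSC = 0x0e, i.e. a write that took a level 2 permission fault, consistent with
the "write to read-only memory" message.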


Do let me know how the Qemu with Arch specific patches goes.

Thanks
Salil.

>  From: Salil Mehta
>  Sent: Wednesday, August 7, 2024 2:27 PM
>  To: 'Gavin Shan' <gshan@redhat.com>; qemu-devel@nongnu.org; qemu-
>  arm@nongnu.org; mst@redhat.com
>  
>  Hi Gavin,
>  
>  Let me figure this out. Have you also included the patch below along with
>  the architecture agnostic patch-set accepted in this Qemu cycle?
>  
>  https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>  
>  
>  Thanks
>  Salil.
>  

Gavin Shan Aug. 7, 2024, 11:41 p.m. UTC | #8

Hi Salil,

On 8/7/24 11:27 PM, Salil Mehta wrote:
> 
> Let me figure this out. Have you also included the patch below along with the
> architecture agnostic patch-set accepted in this Qemu cycle?
> 
> https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>  

There is no vCPU fd to be parked or unparked when the core dump happens. I
tried it, but it didn't help. I added more debugging messages and the core dump is
triggered in the following path. It seems 'cpu->sve_vq.map' isn't correct since
it's populated in the CPU realization path, and those non-cold-booted CPUs aren't
realized in the booting stage.

# dmesg | grep "Scalable Vector Extension"
[    0.117121] CPU features: detected: Scalable Vector Extension

# start_vm
===> machvirt_init: create CPU object (idx=0, type=[host-arm-cpu])
cpu_common_initfn
arm_cpu_initfn
aarch64_cpu_initfn
aarch64_cpu_instance_init
aarch64_host_initfn
arm_cpu_post_init
===> machvirt_init: realize CPU object (idx=0)
virt_cpu_pre_plug
arm_cpu_realizefn
cpu_common_realizefn
virt_cpu_plug
===> machvirt_init: create CPU object (idx=1, type=[host-arm-cpu])
cpu_common_initfn
arm_cpu_initfn
aarch64_cpu_initfn
aarch64_cpu_instance_init
aarch64_host_initfn
arm_cpu_post_init
kvm_arch_init_vcpu: Error -22 from kvm_arm_sve_set_vls()
qemu-system-aarch64: Failed to initialize host vcpu 1
Aborted (core dumped)

Thanks,
Gavin


Salil Mehta Aug. 7, 2024, 11:48 p.m. UTC | #9

Hi Gavin,

Thanks for further information.

>  From: Gavin Shan <gshan@redhat.com>
>  Sent: Thursday, August 8, 2024 12:41 AM
>  To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
>  qemu-arm@nongnu.org; mst@redhat.com
>  
>  Hi Salil,
>  
>  On 8/7/24 11:27 PM, Salil Mehta wrote:
>  >
>  > Let me figure this out. Have you also included the patch below along
>  > with the architecture agnostic patch-set accepted in this Qemu cycle?
>  >
>  > https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>  >
>  
>  There is no vCPU fd to be parked or unparked when the core dump
>  happens. I tried it, but it didn't help. I added more debugging messages and
>  the core dump is triggered in the following path. It seems
>  'cpu->sve_vq.map' isn't correct since it's populated in the CPU realization
>  path, and those non-cold-booted CPUs aren't realized in the booting stage.


Ah, I have to fix the SVE support. I'm already working on it, and the fix will
be part of RFC V4.

Have you tried booting the VM with SVE support disabled?


>  
>  # dmesg | grep "Scalable Vector Extension"
>  [    0.117121] CPU features: detected: Scalable Vector Extension
>  
>  # start_vm
>  ===> machvirt_init: create CPU object (idx=0, type=[host-arm-cpu])
>  cpu_common_initfn
>  arm_cpu_initfn
>  aarch64_cpu_initfn
>  aarch64_cpu_instance_init
>  aarch64_host_initfn
>  arm_cpu_post_init
>  ===> machvirt_init: realize CPU object (idx=0)
>  virt_cpu_pre_plug
>  arm_cpu_realizefn
>  cpu_common_realizefn
>  virt_cpu_plug
>  ===> machvirt_init: create CPU object (idx=1, type=[host-arm-cpu])
>  cpu_common_initfn
>  arm_cpu_initfn
>  aarch64_cpu_initfn
>  aarch64_cpu_instance_init
>  aarch64_host_initfn
>  arm_cpu_post_init
>  kvm_arch_init_vcpu: Error -22 from kvm_arm_sve_set_vls()
>  qemu-system-aarch64: Failed to initialize host vcpu 1
>  Aborted (core dumped)

Yes, sure. 

Thanks
Salil.



Gavin Shan Aug. 8, 2024, 12:29 a.m. UTC | #10

Hi Salil,

On 8/8/24 9:48 AM, Salil Mehta wrote:
>>   On 8/7/24 11:27 PM, Salil Mehta wrote:
>>   >
>>   > Let me figure this out. Have you also included the patch below along
>>   > with the architecture agnostic patch-set accepted in this Qemu cycle?
>>   >
>>   > https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>>   >
>>   
>>   There is no vCPU fd to be parked or unparked when the core dump
>>   happens. I tried it, but it didn't help. I added more debugging messages and
>>   the core dump is triggered in the following path. It seems
>>   'cpu->sve_vq.map' isn't correct since it's populated in the CPU realization
>>   path, and those non-cold-booted CPUs aren't realized in the booting stage.
> 
> 
> Ah, I have to fix the SVE support. I'm already working on it, and the fix will
> be part of RFC V4.
> 
> Have you tried booting the VM with SVE support disabled?
> 

I'm able to boot the guest after SVE is disabled by clearing the corresponding
bits in ID_AA64PFR0, as below.

static bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
{
     :

     /*
      * SVE is explicitly disabled. Otherwise, the non-cold-booted
      * CPUs can't be initialized in the vCPU hotplug scenario.
      */
     err = read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr0,
                          ARM64_SYS_REG(3, 0, 0, 4, 0));
     ahcf->isar.id_aa64pfr0 &= ~R_ID_AA64PFR0_SVE_MASK;
}
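
What the masking line in the workaround does can be shown in isolation. The
sketch below assumes the architectural position of the SVE field in
ID_AA64PFR0_EL1 (bits [35:32]); the SKETCH_* macros and helper names are local
to this example and stand in for QEMU's generated R_ID_AA64PFR0_SVE_MASK.
Clearing the field makes the register read as "SVE not implemented".

```c
#include <assert.h>
#include <stdint.h>

/* SVE field of ID_AA64PFR0_EL1, bits [35:32]. Local names for this sketch. */
#define SKETCH_PFR0_SVE_SHIFT 32
#define SKETCH_PFR0_SVE_WIDTH 4

/* Build a mask of 'width' bits starting at bit 'shift'. */
static inline uint64_t field_mask(unsigned shift, unsigned width)
{
    assert(width >= 1 && width < 64 && shift + width <= 64);
    return ((UINT64_C(1) << width) - 1) << shift;
}

/* Extract the SVE field; 0 means SVE is not advertised. */
static inline unsigned pfr0_sve(uint64_t pfr0)
{
    return (unsigned)((pfr0 >> SKETCH_PFR0_SVE_SHIFT) &
                      ((UINT64_C(1) << SKETCH_PFR0_SVE_WIDTH) - 1));
}

/* Same effect as "id_aa64pfr0 &= ~R_ID_AA64PFR0_SVE_MASK" in the hack above. */
static inline uint64_t pfr0_clear_sve(uint64_t pfr0)
{
    return pfr0 & ~field_mask(SKETCH_PFR0_SVE_SHIFT, SKETCH_PFR0_SVE_WIDTH);
}
```

QEMU also documents an sve=off CPU property (for example -cpu host,sve=off);
whether that avoids the failure with this series applied was not tested in
this thread.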

However, I'm unable to hot-add a vCPU and haven't had a chance to look
at it closely yet.

(qemu) device_add host-arm-cpu,id=cpu,socket-id=1
(qemu) [  258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
[  258.901686] Mem abort info:
[  258.901889]   ESR = 0x000000009600004e
[  258.902160]   EC = 0x25: DABT (current EL), IL = 32 bits
[  258.902543]   SET = 0, FnV = 0
[  258.902763]   EA = 0, S1PTW = 0
[  258.902991]   FSC = 0x0e: level 2 permission fault
[  258.903338] Data abort info:
[  258.903547]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
[  258.903943]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  258.904304]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  258.904687] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b8e24000
[  258.905258] [ffff800080fa7190] pgd=10000000b95b0003, p4d=10000000b95b0003, pud=10000000b95b1003, pmd=00600000b8c00781
[  258.906026] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
[  258.906474] Modules linked in:
[  258.906705] CPU: 0 UID: 0 PID: 29 Comm: kworker/u8:1 Not tainted 6.11.0-rc2-gavin-gb446a2dae984 #7
[  258.907338] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-stable202402-prebuilt.qemu.org 02/14/2024
[  258.908009] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  258.908401] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  258.908899] pc : register_cpu+0x140/0x290
[  258.909195] lr : register_cpu+0x128/0x290
[  258.909487] sp : ffff8000817fba10
[  258.909727] x29: ffff8000817fba10 x28: 0000000000000000 x27: ffff0000011f9098
[  258.910246] x26: ffff80008167b1b0 x25: 0000000000000001 x24: ffff80008153dad0
[  258.910762] x23: 0000000000000001 x22: ffff0000ff7de210 x21: ffff8000811b9a00
[  258.911279] x20: 0000000000000000 x19: ffff800080fa7190 x18: ffffffffffffffff
[  258.911798] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000005a46a1c
[  258.912326] x14: ffffffffffffffff x13: ffff000005a4632b x12: 0000000000000000
[  258.912854] x11: 0000000000000040 x10: 0000000000000000 x9 : ffff8000808a6cd4
[  258.913382] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
[  258.913906] x5 : ffff0000053fab40 x4 : ffff0000053fa920 x3 : ffff0000053fabb0
[  258.914439] x2 : ffff000000de1100 x1 : ffff800080fa7190 x0 : 0000000000000002
[  258.914968] Call trace:
[  258.915154]  register_cpu+0x140/0x290
[  258.915429]  arch_register_cpu+0x84/0xd8
[  258.915726]  acpi_processor_add+0x480/0x5b0
[  258.916042]  acpi_bus_attach+0x1c4/0x300
[  258.916334]  acpi_dev_for_one_check+0x3c/0x50
[  258.916689]  device_for_each_child+0x68/0xc8
[  258.917012]  acpi_dev_for_each_child+0x48/0x80
[  258.917344]  acpi_bus_attach+0x84/0x300
[  258.917629]  acpi_bus_scan+0x74/0x220
[  258.917902]  acpi_scan_rescan_bus+0x54/0x88
[  258.918211]  acpi_device_hotplug+0x208/0x478
[  258.918529]  acpi_hotplug_work_fn+0x2c/0x50
[  258.918839]  process_one_work+0x15c/0x3c0
[  258.919139]  worker_thread+0x2ec/0x400
[  258.919417]  kthread+0x120/0x130
[  258.919658]  ret_from_fork+0x10/0x20
[  258.919924] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
[  258.920373] ---[ end trace 0000000000000000 ]---
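
The ESR value in the oops can be decoded by hand, which confirms that this is a write (WnR = 1) hitting a level 2 permission fault. A small sketch of that decoding, assuming the ESR_ELx layout for data aborts (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with WnR at ISS bit 6 and the fault status code in ISS bits [5:0]). The helper names are illustrative, not kernel functions.

```c
#include <stdint.h>

/* Exception class, ESR_ELx[31:26]; 0x25 is a data abort from the current EL. */
unsigned esr_ec(uint64_t esr)   { return (unsigned)((esr >> 26) & 0x3f); }

/* Instruction length bit, ESR_ELx[25]; 1 means a 32-bit instruction. */
unsigned esr_il(uint64_t esr)   { return (unsigned)((esr >> 25) & 0x1); }

/* Instruction specific syndrome, ESR_ELx[24:0]. */
uint32_t esr_iss(uint64_t esr)  { return (uint32_t)(esr & 0x1ffffff); }

/* Write-not-read, ISS bit 6; 1 means the faulting access was a write. */
unsigned esr_wnr(uint64_t esr)  { return (unsigned)((esr >> 6) & 0x1); }

/* Data fault status code, ISS[5:0]; 0x0e is a level 2 permission fault. */
unsigned esr_dfsc(uint64_t esr) { return (unsigned)(esr & 0x3f); }
```

Feeding in 0x9600004e gives EC = 0x25, IL = 1, ISS = 0x4e, WnR = 1 and a fault status code of 0x0e, matching the "Mem abort info" and "Data abort info" lines above.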

Thanks,
Gavin

Gavin Shan Aug. 8, 2024, 4:15 a.m. UTC | #11

Hi Salil,

On 8/8/24 10:29 AM, Gavin Shan wrote:
> On 8/8/24 9:48 AM, Salil Mehta wrote:
> 
> However, I'm unable to hot-add a vCPU and haven't had a chance to look
> at it closely.
> 
> (qemu) device_add host-arm-cpu,id=cpu,socket-id=1
> (qemu) [  258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
> [  258.901686] Mem abort info:
> [  258.901889]   ESR = 0x000000009600004e
> [  258.902160]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  258.902543]   SET = 0, FnV = 0
> [  258.902763]   EA = 0, S1PTW = 0
> [  258.902991]   FSC = 0x0e: level 2 permission fault
> [  258.903338] Data abort info:
> [  258.903547]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
> [  258.903943]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> [  258.904304]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  258.904687] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b8e24000
> [  258.905258] [ffff800080fa7190] pgd=10000000b95b0003, p4d=10000000b95b0003, pud=10000000b95b1003, pmd=00600000b8c00781
> [  258.906026] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
> [  258.906474] Modules linked in:
> [  258.906705] CPU: 0 UID: 0 PID: 29 Comm: kworker/u8:1 Not tainted 6.11.0-rc2-gavin-gb446a2dae984 #7
> [  258.907338] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-stable202402-prebuilt.qemu.org 02/14/2024
> [  258.908009] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  258.908401] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> [  258.908899] pc : register_cpu+0x140/0x290
> [  258.909195] lr : register_cpu+0x128/0x290
> [  258.909487] sp : ffff8000817fba10
> [  258.909727] x29: ffff8000817fba10 x28: 0000000000000000 x27: ffff0000011f9098
> [  258.910246] x26: ffff80008167b1b0 x25: 0000000000000001 x24: ffff80008153dad0
> [  258.910762] x23: 0000000000000001 x22: ffff0000ff7de210 x21: ffff8000811b9a00
> [  258.911279] x20: 0000000000000000 x19: ffff800080fa7190 x18: ffffffffffffffff
> [  258.911798] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000005a46a1c
> [  258.912326] x14: ffffffffffffffff x13: ffff000005a4632b x12: 0000000000000000
> [  258.912854] x11: 0000000000000040 x10: 0000000000000000 x9 : ffff8000808a6cd4
> [  258.913382] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
> [  258.913906] x5 : ffff0000053fab40 x4 : ffff0000053fa920 x3 : ffff0000053fabb0
> [  258.914439] x2 : ffff000000de1100 x1 : ffff800080fa7190 x0 : 0000000000000002
> [  258.914968] Call trace:
> [  258.915154]  register_cpu+0x140/0x290
> [  258.915429]  arch_register_cpu+0x84/0xd8
> [  258.915726]  acpi_processor_add+0x480/0x5b0
> [  258.916042]  acpi_bus_attach+0x1c4/0x300
> [  258.916334]  acpi_dev_for_one_check+0x3c/0x50
> [  258.916689]  device_for_each_child+0x68/0xc8
> [  258.917012]  acpi_dev_for_each_child+0x48/0x80
> [  258.917344]  acpi_bus_attach+0x84/0x300
> [  258.917629]  acpi_bus_scan+0x74/0x220
> [  258.917902]  acpi_scan_rescan_bus+0x54/0x88
> [  258.918211]  acpi_device_hotplug+0x208/0x478
> [  258.918529]  acpi_hotplug_work_fn+0x2c/0x50
> [  258.918839]  process_one_work+0x15c/0x3c0
> [  258.919139]  worker_thread+0x2ec/0x400
> [  258.919417]  kthread+0x120/0x130
> [  258.919658]  ret_from_fork+0x10/0x20
> [  258.919924] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
> [  258.920373] ---[ end trace 0000000000000000 ]---
> 

The fix [1] is needed by the guest kernel. With it applied, I'm able to
hot-add and hot-remove vCPUs successfully.

[1] https://lkml.org/lkml/2024/8/8/155

Thanks,
Gavin

Gavin Shan Aug. 8, 2024, 5 a.m. UTC | #12

Hi Salil,

On 8/8/24 2:07 AM, Salil Mehta wrote:
> I tested the ARM arch-specific patches with the latest Qemu, which contains the below-mentioned
> fix, and I cannot reproduce the crash. I used kernel linux-6.11-rc2 and it booted successfully.
> However, I did see a kernel crash when attempting to hotplug the first vCPU.
> 
> (qemu) device_add host-arm-cpu,id=core4,core-id=4
> (qemu) [  365.125477] Unable to handle kernel write to read-only memory at virtual address ffff800081ba4190
> [  365.126366] Mem abort info:
> [  365.126640]   ESR = 0x000000009600004e
> [  365.127010]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  365.127524]   SET = 0, FnV = 0
> [  365.127822]   EA = 0, S1PTW = 0
> [  365.128130]   FSC = 0x0e: level 2 permission fault
> [  365.128598] Data abort info:
> [  365.128881]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
> [  365.129447]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> [  365.129943]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  365.130442] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000045830000
> [  365.131068] [ffff800081ba4190] pgd=0000000000000000, p4d=10000000467df003, pud=10000000467e0003, pmd=0060000045600781
> [  365.132069] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
> [  365.132661] Modules linked in:
> [  365.132952] CPU: 0 UID: 0 PID: 11 Comm: kworker/u24:0 Not tainted 6.11.0-rc2 #228
> [  365.133699] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [  365.134415] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  365.134969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  365.135679] pc : register_cpu+0x138/0x250
> [  365.136093] lr : register_cpu+0x120/0x250
> [  365.136506] sp : ffff800082cbba10
> [  365.136847] x29: ffff800082cbba10 x28: ffff8000826479c0 x27: ffff000000a7e098
> [  365.137575] x26: ffff8000827c2838 x25: 0000000000000004 x24: ffff80008264d9b0
> [  365.138311] x23: 0000000000000004 x22: ffff000012a482d0 x21: ffff800081e30a00
> [  365.139037] x20: 0000000000000000 x19: ffff800081ba4190 x18: ffffffffffffffff
> [  365.139764] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000001adaa1c
> [  365.140490] x14: ffffffffffffffff x13: ffff000001ada2e0 x12: 0000000000000000
> [  365.141216] x11: ffff800081e32780 x10: 0000000000000000 x9 : 0000000000000001
> [  365.141945] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 6f7274726e737460
> [  365.142668] x5 : ffff0000027b1920 x4 : ffff0000027b1b40 x3 : ffff0000027b1880
> [  365.143400] x2 : ffff0000001933c0 x1 : ffff800081ba4190 x0 : 0000000000000010
> [  365.144129] Call trace:
> [  365.144382]  register_cpu+0x138/0x250
> [  365.144759]  arch_register_cpu+0x7c/0xc4
> [  365.145166]  acpi_processor_add+0x468/0x590
> [  365.145594]  acpi_bus_attach+0x1ac/0x2dc
> [  365.146002]  acpi_dev_for_one_check+0x34/0x40
> [  365.146449]  device_for_each_child+0x5c/0xb0
> [  365.146887]  acpi_dev_for_each_child+0x3c/0x64
> [  365.147341]  acpi_bus_attach+0x78/0x2dc
> [  365.147734]  acpi_bus_scan+0x68/0x208
> [  365.148110]  acpi_scan_rescan_bus+0x4c/0x78
> [  365.148537]  acpi_device_hotplug+0x1f8/0x460
> [  365.148975]  acpi_hotplug_work_fn+0x24/0x3c
> [  365.149402]  process_one_work+0x150/0x294
> [  365.149817]  worker_thread+0x2e4/0x3ec
> [  365.150199]  kthread+0x118/0x11c
> [  365.150536]  ret_from_fork+0x10/0x20
> [  365.150903] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
> [  365.151527] ---[ end trace 0000000000000000 ]---
> 

Should be fixed by: https://lkml.org/lkml/2024/8/8/155

Thanks,
Gavin

Salil Mehta Aug. 8, 2024, 8:36 a.m. UTC | #13

Hi Gavin,

>  From: Gavin Shan <gshan@redhat.com>
>  Sent: Thursday, August 8, 2024 1:29 AM
>  To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
>  qemu-arm@nongnu.org; mst@redhat.com
>  
>  Hi Salil,
>  
>  On 8/8/24 9:48 AM, Salil Mehta wrote:
>  >>   On 8/7/24 11:27 PM, Salil Mehta wrote:
>  >>   >
>  >>   > Let me figure out this. Have you also included the below patch along
>  >>   > with the architecture agnostic patch-set accepted in this Qemu cycle?
>  >>   >
>  >>   > https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>  >>   >
>  >>
>  >>   There are no vCPU fds to be parked and unparked when the core dump
>  >>   happens. I tried it, but it didn't help. I added more debugging messages
>  >>   and the core dump is triggered in the following path. It seems
>  >>   'cpu->sve_vq.map' isn't correct since it's populated in the CPU
>  >>   realization path, and those non-cold-booted CPUs aren't realized in the
>  >>   booting stage.
>  >
>  >
>  > Ah, I have to fix the SVE support. I'm already working on it, and the fix
>  > will be part of RFC V4.
>  >
>  > Have you tried booting VM by disabling the SVE support?
>  >
>  
>  I'm able to boot the guest after SVE is disabled by clearing the
>  corresponding bits in ID_AA64PFR0, as below.
>  
>  static bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
>  {
>       :
>  
>       /*
>        * SVE is explicitly disabled. Otherwise, the non-cold-booted
>        * CPUs can't be initialized in the vCPU hotplug scenario.
>        */
>       err = read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr0,
>                            ARM64_SYS_REG(3, 0, 0, 4, 0));
>       ahcf->isar.id_aa64pfr0 &= ~R_ID_AA64PFR0_SVE_MASK;
>  }
>  
>  However, I'm unable to hot-add a vCPU and haven't had a chance to look at
>  it closely.
>  
>  (qemu) device_add host-arm-cpu,id=cpu,socket-id=1
>  (qemu) [  258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
>  [  258.901686] Mem abort info:
>  [  258.901889]   ESR = 0x000000009600004e
>  [  258.902160]   EC = 0x25: DABT (current EL), IL = 32 bits
>  [  258.902543]   SET = 0, FnV = 0
>  [  258.902763]   EA = 0, S1PTW = 0
>  [  258.902991]   FSC = 0x0e: level 2 permission fault
>  [  258.903338] Data abort info:
>  [  258.903547]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
>  [  258.903943]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>  [  258.904304]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>  [  258.904687] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b8e24000
>  [  258.905258] [ffff800080fa7190] pgd=10000000b95b0003, p4d=10000000b95b0003, pud=10000000b95b1003, pmd=00600000b8c00781
>  [  258.906026] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
>  [  258.906474] Modules linked in:
>  [  258.906705] CPU: 0 UID: 0 PID: 29 Comm: kworker/u8:1 Not tainted 6.11.0-rc2-gavin-gb446a2dae984 #7
>  [  258.907338] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-stable202402-prebuilt.qemu.org 02/14/2024
>  [  258.908009] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>  [  258.908401] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
>  [  258.908899] pc : register_cpu+0x140/0x290
>  [  258.909195] lr : register_cpu+0x128/0x290
>  [  258.909487] sp : ffff8000817fba10
>  [  258.909727] x29: ffff8000817fba10 x28: 0000000000000000 x27: ffff0000011f9098
>  [  258.910246] x26: ffff80008167b1b0 x25: 0000000000000001 x24: ffff80008153dad0
>  [  258.910762] x23: 0000000000000001 x22: ffff0000ff7de210 x21: ffff8000811b9a00
>  [  258.911279] x20: 0000000000000000 x19: ffff800080fa7190 x18: ffffffffffffffff
>  [  258.911798] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000005a46a1c
>  [  258.912326] x14: ffffffffffffffff x13: ffff000005a4632b x12: 0000000000000000
>  [  258.912854] x11: 0000000000000040 x10: 0000000000000000 x9 : ffff8000808a6cd4
>  [  258.913382] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
>  [  258.913906] x5 : ffff0000053fab40 x4 : ffff0000053fa920 x3 : ffff0000053fabb0
>  [  258.914439] x2 : ffff000000de1100 x1 : ffff800080fa7190 x0 : 0000000000000002
>  [  258.914968] Call trace:
>  [  258.915154]  register_cpu+0x140/0x290
>  [  258.915429]  arch_register_cpu+0x84/0xd8
>  [  258.915726]  acpi_processor_add+0x480/0x5b0
>  [  258.916042]  acpi_bus_attach+0x1c4/0x300
>  [  258.916334]  acpi_dev_for_one_check+0x3c/0x50
>  [  258.916689]  device_for_each_child+0x68/0xc8
>  [  258.917012]  acpi_dev_for_each_child+0x48/0x80
>  [  258.917344]  acpi_bus_attach+0x84/0x300
>  [  258.917629]  acpi_bus_scan+0x74/0x220
>  [  258.917902]  acpi_scan_rescan_bus+0x54/0x88
>  [  258.918211]  acpi_device_hotplug+0x208/0x478
>  [  258.918529]  acpi_hotplug_work_fn+0x2c/0x50
>  [  258.918839]  process_one_work+0x15c/0x3c0
>  [  258.919139]  worker_thread+0x2ec/0x400
>  [  258.919417]  kthread+0x120/0x130
>  [  258.919658]  ret_from_fork+0x10/0x20
>  [  258.919924] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
>  [  258.920373] ---[ end trace 0000000000000000 ]---


Yes, this is the same crash. Thanks for confirming!


>  
>  Thanks,
>  Gavin
>  
>

Salil Mehta Aug. 8, 2024, 8:39 a.m. UTC | #14

Hi Gavin,

>  From: Gavin Shan <gshan@redhat.com>
>  Sent: Thursday, August 8, 2024 5:15 AM
>  To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
>  qemu-arm@nongnu.org; mst@redhat.com
>  
>  Hi Salil,
>  
>  On 8/8/24 10:29 AM, Gavin Shan wrote:
>  > On 8/8/24 9:48 AM, Salil Mehta wrote:
>  >
>  > However, I'm unable to hot-add a vCPU and haven't had a chance to look
>  > at it closely.
>  >
>  > (qemu) device_add host-arm-cpu,id=cpu,socket-id=1
>  > (qemu) [  258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
>  > [  258.901686] Mem abort info:
>  > [  258.901889]   ESR = 0x000000009600004e
>  > [  258.902160]   EC = 0x25: DABT (current EL), IL = 32 bits
>  > [  258.902543]   SET = 0, FnV = 0
>  > [  258.902763]   EA = 0, S1PTW = 0
>  > [  258.902991]   FSC = 0x0e: level 2 permission fault
>  > [  258.903338] Data abort info:
>  > [  258.903547]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
>  > [  258.903943]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>  > [  258.904304]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>  > [  258.904687] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b8e24000
>  > [  258.905258] [ffff800080fa7190] pgd=10000000b95b0003, p4d=10000000b95b0003, pud=10000000b95b1003, pmd=00600000b8c00781
>  > [  258.906026] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
>  > [  258.906474] Modules linked in:
>  > [  258.906705] CPU: 0 UID: 0 PID: 29 Comm: kworker/u8:1 Not tainted 6.11.0-rc2-gavin-gb446a2dae984 #7
>  > [  258.907338] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-stable202402-prebuilt.qemu.org 02/14/2024
>  > [  258.908009] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>  > [  258.908401] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
>  > [  258.908899] pc : register_cpu+0x140/0x290
>  > [  258.909195] lr : register_cpu+0x128/0x290
>  > [  258.909487] sp : ffff8000817fba10
>  > [  258.909727] x29: ffff8000817fba10 x28: 0000000000000000 x27: ffff0000011f9098
>  > [  258.910246] x26: ffff80008167b1b0 x25: 0000000000000001 x24: ffff80008153dad0
>  > [  258.910762] x23: 0000000000000001 x22: ffff0000ff7de210 x21: ffff8000811b9a00
>  > [  258.911279] x20: 0000000000000000 x19: ffff800080fa7190 x18: ffffffffffffffff
>  > [  258.911798] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000005a46a1c
>  > [  258.912326] x14: ffffffffffffffff x13: ffff000005a4632b x12: 0000000000000000
>  > [  258.912854] x11: 0000000000000040 x10: 0000000000000000 x9 : ffff8000808a6cd4
>  > [  258.913382] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
>  > [  258.913906] x5 : ffff0000053fab40 x4 : ffff0000053fa920 x3 : ffff0000053fabb0
>  > [  258.914439] x2 : ffff000000de1100 x1 : ffff800080fa7190 x0 : 0000000000000002
>  > [  258.914968] Call trace:
>  > [  258.915154]  register_cpu+0x140/0x290
>  > [  258.915429]  arch_register_cpu+0x84/0xd8
>  > [  258.915726]  acpi_processor_add+0x480/0x5b0
>  > [  258.916042]  acpi_bus_attach+0x1c4/0x300
>  > [  258.916334]  acpi_dev_for_one_check+0x3c/0x50
>  > [  258.916689]  device_for_each_child+0x68/0xc8
>  > [  258.917012]  acpi_dev_for_each_child+0x48/0x80
>  > [  258.917344]  acpi_bus_attach+0x84/0x300
>  > [  258.917629]  acpi_bus_scan+0x74/0x220
>  > [  258.917902]  acpi_scan_rescan_bus+0x54/0x88
>  > [  258.918211]  acpi_device_hotplug+0x208/0x478
>  > [  258.918529]  acpi_hotplug_work_fn+0x2c/0x50
>  > [  258.918839]  process_one_work+0x15c/0x3c0
>  > [  258.919139]  worker_thread+0x2ec/0x400
>  > [  258.919417]  kthread+0x120/0x130
>  > [  258.919658]  ret_from_fork+0x10/0x20
>  > [  258.919924] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
>  > [  258.920373] ---[ end trace 0000000000000000 ]---
>  >
>  
>  The fix [1] is needed by the guest kernel. With it applied, I'm able to
>  hot-add and hot-remove vCPUs successfully.
>  
>  [1] https://lkml.org/lkml/2024/8/8/155


Good catch in the kernel, and many thanks for fixing it as well.


>  
>  Thanks,
>  Gavin
>

Gustavo Romero Aug. 28, 2024, 8:35 p.m. UTC | #15

Hi Salil,

On 6/13/24 8:36 PM, Salil Mehta via wrote:
> PROLOGUE
> ========
> 
> To assist in review and set the right expectations from this RFC, please first
> read the sections *APPENDED AT THE END* of this cover letter:
> 
> 1. Important *DISCLAIMER* [Section (X)]
> 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
> 3. Organization of patches [Section (XI)]
> 4. References [Section (XII)]
> 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
> 
> There has been interest shown by other organizations in adapting this series
> for their architecture. Hence, RFC V2 [21] has been split into architecture
> *agnostic* [22] and *specific* patch sets.
> 
> This is an ARM architecture-specific patch set carved out of RFC V2. Please
> check section (XI)B for details of architecture agnostic patches.
> 
> SECTIONS [I - XIV] are as follows:
> 
> (I) Key Changes [details in last section (XIV)]
> ==============================================
> 
> RFC V2 -> RFC V3
> 
> 1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) patch sets.
> 2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat), Philippe Mathieu-Daudé (Linaro),
>     Jonathan Cameron (Huawei), Zhao Liu (Intel).
> 
> RFC V1 -> RFC V2
> 
> RFC V1: https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> 
> 1. ACPI MADT Table GIC CPU Interface can now be presented [6] as ACPI
>     *online-capable* or *enabled* to the Guest OS at boot time. This means
>     associated CPUs can have ACPI _STA as *enabled* or *disabled* even after boot.
>     See UEFI ACPI 6.5 Spec, Section 05, Table 5.37 GICC CPU Interface Flags[20].
> 2. SMCC/HVC Hypercall exit handling in userspace/Qemu for PSCI CPU_{ON,OFF}
>     request. This is required to {dis}allow online'ing a vCPU.
> 3. Unplugged vCPUs are always presented to the Guest OS as ACPI _STA.Present
>     in the CPUs ACPI AML code. ACPI _STA.Enabled is toggled to give the effect
>     of hot{un}plug.
> 4. Live Migration works (some issues remain).
> 5. TCG/HVF/qtest do not support Hotplug and fall back to the default.
> 6. Code for TCG support exists in this release (it is a work-in-progress).
> 7. ACPI _OSC method can now be used by OSPM to negotiate Qemu VM platform
>     hotplug capability (_OSC Query support still pending).
> 8. Misc. Bug fixes.
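
To make item 1 of the V1 -> V2 list concrete, here is a minimal sketch of how the two GICC flag bits could be chosen per vCPU. It assumes the bit positions from ACPI 6.5 Table 5.37 (Enabled is bit 0, Online Capable is bit 3). The macro and function names are illustrative only and are not taken from the QEMU patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICC flag bits, assuming ACPI 6.5 Table 5.37; names are illustrative. */
#define GICC_FLAG_ENABLED        (UINT32_C(1) << 0)
#define GICC_FLAG_ONLINE_CAPABLE (UINT32_C(1) << 3)

/*
 * Pick the MADT GICC flags for one vCPU. The two bits are mutually
 * exclusive: a hot(un)pluggable vCPU is advertised as Online Capable
 * with Enabled clear, any other vCPU is advertised as Enabled.
 */
uint32_t gicc_flags(bool hotpluggable)
{
    return hotpluggable ? GICC_FLAG_ONLINE_CAPABLE : GICC_FLAG_ENABLED;
}
```

Returning exactly one of the two bits keeps the "mutually exclusive" rule impossible to violate by construction.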
> 
> (II) Summary
> ============
> 
> This patch set introduces virtual CPU hotplug support for the ARMv8 architecture
> in QEMU. The idea is to be able to hotplug and hot-unplug vCPUs while the guest VM
> is running, without requiring a reboot. This does *not* make any assumptions about
> the physical CPU hotplug availability within the host system but rather tries to
> solve the problem at the virtualizer/QEMU layer. It introduces ACPI CPU hotplug hooks
> and event handling to interface with the guest kernel, and code to initialize, plug,
> and unplug CPUs. No changes are required within the host kernel/KVM except the
> support of hypercall exit handling in the user-space/Qemu, which has recently
> been added to the kernel. Corresponding guest kernel changes have been
> posted on the mailing list [3] [4] by James Morse.
> 
> (III) Motivation
> ================
> 
> This allows scaling the guest VM compute capacity on-demand, which would be
> useful for the following example scenarios:
> 
> 1. Vertical Pod Autoscaling [9][10] in the cloud: Part of the orchestration
>     framework that could adjust resource requests (CPU and Mem requests) for
>     the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure providers could allocate and
>     restrict the total number of compute resources available to the guest VM
>     according to the SLA (Service Level Agreement). VM owners could request more
>     compute to be hot-plugged for some cost.
> 
> For example, Kata Container VM starts with a minimum amount of resources (i.e.,
> hotplug everything approach). Why?
> 
> 1. Allowing faster *boot time* and
> 2. Reduction in *memory footprint*
> 
> Kata Container VM can boot with just 1 vCPU, and then later more vCPUs can be
> hot-plugged as needed.
> 
> (IV) Terminology
> ================
> 
> (*) Possible CPUs: Total vCPUs that could ever exist in the VM. This includes
>                     any cold-booted CPUs plus any CPUs that could be later
>                     hot-plugged.
>                     - Qemu parameter (-smp maxcpus=N)
> (*) Present CPUs:  Possible CPUs that are ACPI 'present'. These might or might
>                     not be ACPI 'enabled'.
>                     - Present vCPUs = Possible vCPUs (Always on ARM Arch)
> (*) Enabled CPUs:  Possible CPUs that are ACPI 'present' and 'enabled' and can
>                     now be 'onlined' (PSCI) for use by the Guest Kernel. All
>                     cold-booted vCPUs are ACPI 'enabled' at boot. Later, using
>                     device_add, more vCPUs can be hotplugged and made ACPI
>                     'enabled'.
>                     - Qemu parameter (-smp cpus=N). Can be used to specify some
>                       cold-booted vCPUs during VM init. Some can be added using
>                       the '-device' option.
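
The relationship between the three sets can be sketched with a toy model. This is a hypothetical illustration, not a QEMU data structure: the struct, its fields and the helpers are invented here to show that the present count always equals the possible count on ARM, while the enabled count moves between the boot-time value and the maximum.

```c
#include <stdbool.h>

/* Hypothetical model of the vCPU sets; not a QEMU structure. */
struct vcpu_counts {
    unsigned possible; /* -smp maxcpus=N, fixed for the VM lifetime */
    unsigned enabled;  /* -smp cpus=N at boot, changes with hot(un)plug */
};

/* On ARM every possible vCPU is always ACPI 'present'. */
unsigned present_cpus(const struct vcpu_counts *c)
{
    return c->possible;
}

/* Hot-add succeeds only while a possible-but-not-enabled vCPU remains. */
bool hot_add(struct vcpu_counts *c)
{
    if (c->enabled >= c->possible) {
        return false;
    }
    c->enabled++;
    return true;
}
```

With `-smp cpus=4,maxcpus=6` this model starts at possible = 6 and enabled = 4, so two hot-adds succeed and a third is refused.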
> 
> (V) Constraints Due to ARMv8 CPU Architecture [+] Other Impediments
> ===================================================================
> 
> A. Physical Limitation to Support CPU Hotplug: (Architectural Constraint)
>     1. ARMv8 CPU architecture does not support the concept of the physical CPU
>        hotplug.
>        a. There are many per-CPU components like PMU, SVE, MTE, Arch timers, etc.,
>           whose behavior needs to be clearly defined when the CPU is hot(un)plugged.
>           There is no specification for this.
> 
>     2. Other ARM components like GIC, etc., have not been designed to realize
>        physical CPU hotplug capability as of now. For example,
>        a. Every physical CPU has a unique GICC (GIC CPU Interface) by construction.
>           The architecture does not specify what CPU hot(un)plug would mean in
>           the context of any of these.
>        b. CPUs/GICC are physically connected to unique GICR (GIC Redistributor).
>           GIC Redistributors are always part of the always-on power domain. Hence,
>           they cannot be powered off as per specification.
> 
> B. Impediments in Firmware/ACPI (Architectural Constraint)
> 
>     1. Firmware has to expose GICC, GICR, and other per-CPU features like PMU,
>        SVE, MTE, Arch Timers, etc., to the OS. Due to the architectural constraint
>        stated in section A1(a), all interrupt controller structures of
>        MADT describing GIC CPU Interfaces and the GIC Redistributors MUST be
>        presented by firmware to the OSPM during boot time.
>     2. Architectures that support CPU hotplug can evaluate the ACPI _MAT method to
>        get this kind of information from the firmware even after boot, and the
>        OSPM has the capability to process these. The ARM kernel uses information in
>        MADT interrupt controller structures to identify the number of present CPUs
>        during boot and hence does not allow these to be changed after boot. The
>        number of present CPUs cannot be changed. It is an architectural constraint!
> 
> C. Impediments in KVM to Support Virtual CPU Hotplug (Architectural Constraint)
> 
>     1. KVM VGIC:
>        a. Sizing of various VGIC resources like memory regions, etc., related to
>           the redistributor happens only once and is fixed at the VM init time
>           and cannot be changed later after initialization has happened.
>           KVM statically configures these resources based on the number of vCPUs
>           and the number/size of redistributor ranges.
>        b. Association between a vCPU and its VGIC redistributor is fixed at
>           VM init time within KVM, i.e., when the redistributor iodevs get
>           registered. VGIC does not allow this association to be set up or
>           changed after VM initialization has happened. Physically, every
>           CPU/GICC is uniquely connected with its redistributor, and there is
>           no architectural way to set this up.
>     2. KVM vCPUs:
>        a. Lack of specification means destruction of KVM vCPUs does not exist as
>           there is no reference to tell what to do with other per-vCPU
>           components like redistributors, arch timer, etc.
>        b. In fact, KVM does not implement the destruction of vCPUs for any
>           architecture. This is independent of whether the architecture
>           actually supports CPU Hotplug feature. For example, even for x86 KVM
>           does not implement the destruction of vCPUs.
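
The sizing constraint in C.1.a can be illustrated with simple arithmetic. This sketch assumes the GICv3 layout in which each redistributor occupies two 64 KiB frames (RD_base and SGI_base), i.e. 128 KiB per vCPU without GICv4. The constants and function names are made up for the example and do not come from KVM or QEMU.

```c
#include <stdint.h>

/* Assumption: GICv3 without GICv4, two 64 KiB frames per redistributor. */
#define REDIST_FRAME_SIZE UINT64_C(0x10000)
#define REDIST_FRAMES     2

/* Bytes of redistributor space needed to cover the given number of vCPUs. */
uint64_t redist_region_size(unsigned vcpus)
{
    return (uint64_t)vcpus * REDIST_FRAMES * REDIST_FRAME_SIZE;
}

/* Number of vCPUs whose redistributors fit in a region of the given size. */
unsigned redist_region_capacity(uint64_t region_bytes)
{
    return (unsigned)(region_bytes / (REDIST_FRAMES * REDIST_FRAME_SIZE));
}
```

A region sized for 4 cold-booted vCPUs (0x80000 bytes) has no room for a fifth redistributor, which is why the sizing has to be done over all possible vCPUs (0xC0000 bytes for 6) at VM init.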
> 
> D. Impediments in Qemu to Support Virtual CPU Hotplug (KVM Constraints->Arch)
> 
>     1. Qemu CPU Objects MUST be created to initialize all the Host KVM vCPUs to
>        overcome the KVM constraint. KVM vCPUs are created and initialized when Qemu
>        CPU Objects are realized. But keeping the QOM CPU objects realized for
>        'yet-to-be-plugged' vCPUs can create problems when these new vCPUs shall
>        be plugged using device_add and a new QOM CPU object shall be created.
>     2. GICV3State and GICV3CPUState objects MUST be sized over *possible vCPUs*
>        during VM init time while QOM GICV3 Object is realized. This is because
>        KVM VGIC can only be initialized once during init time. But every
>        GICV3CPUState has an associated QOM CPU Object. The latter might correspond
>        to vCPUs which are 'yet-to-be-plugged' (unplugged at init).
>     3. How should new QOM CPU objects be connected back to the GICV3CPUState
>        objects and disconnected from it in case the CPU is being hot(un)plugged?
>     4. How should 'unplugged' or 'yet-to-be-plugged' vCPUs be represented in the
>        QOM for which KVM vCPU already exists? For example, whether to keep,
>         a. No QOM CPU objects Or
>         b. Unrealized CPU Objects
>     5. How should vCPU state be exposed via ACPI to the Guest? Especially for
>        the unplugged/yet-to-be-plugged vCPUs whose CPU objects might not exist
>        within the QOM but the Guest always expects all possible vCPUs to be
>        identified as ACPI *present* during boot.
>     6. How should Qemu expose GIC CPU interfaces for the unplugged or
>        yet-to-be-plugged vCPUs using ACPI MADT Table to the Guest?
> 
> E. Summary of Approach ([+] Workarounds to problems in sections A, B, C & D)
> 
>     1. At VM Init, pre-create all the possible vCPUs in the Host KVM i.e., even
>        for the vCPUs which are yet-to-be-plugged in Qemu but keep them in the
>        powered-off state.
>     2. After the KVM vCPUs have been initialized in the Host, the KVM vCPU
>        objects corresponding to the unplugged/yet-to-be-plugged vCPUs are parked
>        at the existing per-VM "kvm_parked_vcpus" list in Qemu. (similar to x86)
>     3. GICV3State and GICV3CPUState objects are sized over possible vCPUs during
>        VM init time i.e., when Qemu GIC is realized. This, in turn, sizes KVM VGIC
>        resources like memory regions, etc., related to the redistributors with the
>        number of possible KVM vCPUs. This never changes after VM has initialized.
>     4. Qemu CPU objects corresponding to unplugged/yet-to-be-plugged vCPUs are
>        released post Host KVM CPU and GIC/VGIC initialization.
>     5. Build ACPI MADT Table with the following updates:
>        a. Number of GIC CPU interface entries (=possible vCPUs)
>        b. Present Boot vCPU as MADT.GICC.Enabled=1 (Not hot[un]pluggable)
>        c. Present hot(un)pluggable vCPUs as MADT.GICC.online-capable=1
>           - MADT.GICC.Enabled=0 (Mutually exclusive) [6][7]
>           - vCPU can be ACPI enabled+onlined after Guest boots (Firmware Policy)
>           - Some issues with above (details in later sections)
>     6. Expose below ACPI Status to Guest kernel:
>        a. Always _STA.Present=1 (all possible vCPUs)
>        b. _STA.Enabled=1 (plugged vCPUs)
>        c. _STA.Enabled=0 (unplugged vCPUs)
>     7. vCPU hotplug *realizes* new QOM CPU object. The following happens:
>        a. Realizes, initializes QOM CPU Object & spawns Qemu vCPU thread.
>        b. Unparks the existing KVM vCPU ("kvm_parked_vcpus" list).
>           - Attaches to QOM CPU object.
>        c. Reinitializes KVM vCPU in the Host.
>           - Resets the core and sys regs, sets defaults, etc.
>        d. Runs KVM vCPU (created with "start-powered-off").
>           - vCPU thread sleeps (waits for vCPU reset via PSCI).
>        e. Updates Qemu GIC.
>           - Wires back IRQs related to this vCPU.
>           - GICV3CPUState association with QOM CPU Object.
>        f. Updates [6] ACPI _STA.Enabled=1.
>        g. Notifies Guest about the new vCPU (via ACPI GED interface).
>           - Guest checks _STA.Enabled=1.
>           - Guest adds processor (registers CPU with LDM) [3].
>        h. Plugs the QOM CPU object in the slot.
>           - slot-number = cpu-index {socket, cluster, core, thread}.
>        i. Guest onlines the vCPU (CPU_ON PSCI call over HVC/SMC).
>           - KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
>           - Qemu powers-on KVM vCPU in the Host.
>     8. vCPU hot-unplug *unrealizes* QOM CPU Object. The following happens:
>        a. Notifies Guest (via ACPI GED interface) vCPU hot-unplug event.
>           - Guest offlines vCPU (CPU_OFF PSCI call over HVC/SMC).
>        b. KVM exits HVC/SMC Hypercall [5] to Qemu (Policy Check).
>           - Qemu powers-off the KVM vCPU in the Host.
>        c. Guest signals *Eject* vCPU to Qemu.
>        d. Qemu updates [6] ACPI _STA.Enabled=0.
>        e. Updates GIC.
>           - Un-wires IRQs related to this vCPU.
>           - GICV3CPUState association with new QOM CPU Object is updated.
>        f. Unplugs the vCPU.
>           - Removes from slot.
>           - Parks KVM vCPU ("kvm_parked_vcpus" list).
>           - Unrealizes QOM CPU Object & joins back Qemu vCPU thread.
>           - Destroys QOM CPU object.
>        g. Guest checks ACPI _STA.Enabled=0.
>           - Removes processor (unregisters CPU with LDM) [3].
> 
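The _STA policy in step 6, together with the transitions in steps 7f and 8d, can be modelled in a few lines of C. This is only an illustrative sketch and not code from the series: VcpuSlot, vcpu_sta(), vcpu_hotplug() and vcpu_hotunplug() are hypothetical names. Only the _STA bit positions (bit 0 = present, bit 1 = enabled) are taken from the ACPI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACPI _STA return bits: bit 0 = device present, bit 1 = device enabled. */
#define STA_PRESENT (1u << 0)
#define STA_ENABLED (1u << 1)

/* Hypothetical per-vCPU bookkeeping; not a QEMU structure. */
typedef struct {
    bool possible; /* slot lies within maxcpus */
    bool plugged;  /* cold-booted or hot-plugged */
} VcpuSlot;

/* Step 6: every possible vCPU is always present; only plugged ones are enabled. */
static uint32_t vcpu_sta(const VcpuSlot *s)
{
    assert(s);
    if (!s->possible) {
        return 0;
    }
    return STA_PRESENT | (s->plugged ? STA_ENABLED : 0);
}

/* Step 7f: hotplug flips _STA.Enabled to 1 (ignored for impossible slots). */
static void vcpu_hotplug(VcpuSlot *s)
{
    assert(s);
    if (s->possible) {
        s->plugged = true;
    }
}

/* Step 8d: hot-unplug flips _STA.Enabled back to 0; Present stays 1. */
static void vcpu_hotunplug(VcpuSlot *s)
{
    assert(s);
    s->plugged = false;
}
```

The point of the model is that Present never changes across plug and unplug; only Enabled toggles, which is what the guest checks after the GED notification.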
> F. Work Presented at KVM Forum Conferences:
> ==========================================
> 
> Details of the above work have been presented at KVMForum2020 and KVMForum2023
> conferences. Slides & video are available at the links below:
> a. KVMForum 2023
>     - Challenges Revisited in Supporting Virt CPU Hotplug on architectures that don't Support CPU Hotplug (like ARM64).
>       https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>       https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
>       https://www.youtube.com/watch?v=hyrw4j2D6I0&t=23970s
>       https://kvm-forum.qemu.org/2023/talk/9SMPDQ/
> b. KVMForum 2020
>     - Challenges in Supporting Virtual CPU Hotplug on SoC Based Systems (like ARM64) - Salil Mehta, Huawei.
>       https://sched.co/eE4m
> 
> (VI) Commands Used
> ==================
> 
> A. Qemu launch commands to init the machine:
> 
>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>        -cpu host -smp cpus=4,maxcpus=6 \
>        -m 300M \
>        -kernel Image \
>        -initrd rootfs.cpio.gz \
>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>        -nographic \
>        -bios QEMU_EFI.fd \
> 
> B. Hot-(un)plug related commands:
> 
>    # Hotplug a host vCPU (accel=kvm):
>      $ device_add host-arm-cpu,id=core4,core-id=4
> 
>    # Hotplug a vCPU (accel=tcg):
>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4

Since hotplug support is disabled on TCG, should these
two lines be removed from the v4 cover letter?


Cheers,
Gustavo

>    # Delete the vCPU:
>      $ device_del core4
> 
> Sample output on guest after boot:
> 
>      $ cat /sys/devices/system/cpu/possible
>      0-5
>      $ cat /sys/devices/system/cpu/present
>      0-5
>      $ cat /sys/devices/system/cpu/enabled
>      0-3
>      $ cat /sys/devices/system/cpu/online
>      0-1
>      $ cat /sys/devices/system/cpu/offline
>      2-5
> 
> Sample output on guest after hotplug of vCPU=4:
> 
>      $ cat /sys/devices/system/cpu/possible
>      0-5
>      $ cat /sys/devices/system/cpu/present
>      0-5
>      $ cat /sys/devices/system/cpu/enabled
>      0-4
>      $ cat /sys/devices/system/cpu/online
>      0-1,4
>      $ cat /sys/devices/system/cpu/offline
>      2-3,5
> 
>      Note: vCPU=4 was explicitly 'onlined' after hot-plug
>      $ echo 1 > /sys/devices/system/cpu/cpu4/online
> 
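The cpulist strings printed above (for example 0-1,4) can be cross-checked mechanically. Below is a minimal sketch, assuming at most 64 CPUs so that a single 64-bit mask is enough. parse_cpulist() is a hypothetical helper written for this illustration, not a kernel or QEMU API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Parse a sysfs cpulist such as "0-1,4" into a bitmask (bit n = CPU n).
 * Returns 0 on success, -1 on malformed input or a CPU number >= 64.
 * An empty list (or a lone newline) yields an empty mask.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
    assert(s && mask);
    *mask = 0;
    while (*s && *s != '\n') {
        char *end;
        unsigned long lo, hi;

        if (!is_digit(*s)) {
            return -1;
        }
        lo = strtoul(s, &end, 10);
        hi = lo;
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            if (!is_digit(*s)) {
                return -1;
            }
            hi = strtoul(s, &end, 10);
        }
        if (hi < lo || hi >= 64) {
            return -1;
        }
        for (unsigned long c = lo; c <= hi; c++) {
            *mask |= UINT64_C(1) << c;
        }
        s = end;
        if (*s == ',') {                /* another entry must follow */
            s++;
            if (!is_digit(*s)) {
                return -1;
            }
        } else if (*s && *s != '\n') {
            return -1;
        }
    }
    return 0;
}
```

With the post-hotplug sample output, the online mask (0-1,4) and the offline mask (2-3,5) together cover exactly the possible mask (0-5), and bit 4 moves from offline to online once cpu4 is onlined.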
> (VII) Latest Repository
> =======================
> 
> (*) Latest Qemu RFC V3 (Architecture Specific) patch set:
>      https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3
> (*) Latest Qemu V13 (Architecture Agnostic) patch set:
>      https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3.arch.agnostic.v13
> (*) QEMU changes for vCPU hotplug can be cloned from below site:
>      https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2
> (*) Guest Kernel changes (by James Morse, ARM) are available here:
>      https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git virtual_cpu_hotplug/rfc/v2
> (*) Leftover patches of the kernel are available here:
>      https://lore.kernel.org/lkml/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
>      https://github.com/salil-mehta/linux/commits/virtual_cpu_hotplug/rfc/v6.jic/ (not latest)
> 
> (VIII) KNOWN ISSUES
> ===================
> 
> 1. Migration has been lightly tested and found to be working.
> 2. TCG is broken.
> 3. HVF and qtest are not supported yet.
> 4. ACPI MADT Table flags [7] MADT.GICC.Enabled and MADT.GICC.online-capable are
>     mutually exclusive, i.e., as per the change [6], a vCPU cannot be both
>     GICC.Enabled and GICC.online-capable. This means:
>        [ Link: https://bugzilla.tianocore.org/show_bug.cgi?id=3706 ]
>     a. If we have to support hot-unplug of the cold-booted vCPUs, then these MUST
>        be specified as GICC.online-capable in the MADT Table during boot by the
>        firmware/Qemu. But this requirement conflicts with the requirement to
>        support new Qemu changes with legacy OS that don't understand
>        MADT.GICC.online-capable Bit. Legacy OS during boot time will ignore this
>        bit, and hence these vCPUs will not appear on such OS. This is unexpected
>        behavior.
>     b. In case we decide to specify vCPUs as MADT.GICC.Enabled and try to unplug
>        these cold-booted vCPUs from OS (which in actuality should be blocked by
>        returning error at Qemu), then features like 'kexec' will break.
>     c. As I understand, removal of the cold-booted vCPUs is a required feature
>        and x86 world allows it.
>     d. Hence, either we need a specification change to make the MADT.GICC.Enabled
>        and MADT.GICC.online-capable Bits NOT mutually exclusive or NOT support
>        the removal of cold-booted vCPUs. In the latter case, a check can be introduced
>        to bar the users from unplugging vCPUs, which were cold-booted, using QMP
>        commands. (Needs discussion!)
>        Please check the patch part of this patch set:
>        [hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled].
>     
>        NOTE: This is definitely not a blocker!
> 5. Code related to the notification to GICV3 about the hot(un)plug of a vCPU event
>     might need further discussion.
> 
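The constraint in known issue 4 can be written down as a small check. This is a sketch under stated assumptions: the bit positions (Enabled = bit 0, Online Capable = bit 3) are those of the GICC CPU Interface Flags in ACPI 6.5 [20], while gicc_flags_valid(), gicc_flags_for_vcpu() and legacy_os_sees_cpu() are hypothetical helpers that do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MADT GICC CPU Interface Flags (ACPI 6.5): bit 0 and bit 3. */
#define GICC_FLAG_ENABLED        (1u << 0)
#define GICC_FLAG_ONLINE_CAPABLE (1u << 3)

/* Per [6][7], Enabled and Online Capable must not both be set. */
static bool gicc_flags_valid(uint32_t flags)
{
    return !((flags & GICC_FLAG_ENABLED) &&
             (flags & GICC_FLAG_ONLINE_CAPABLE));
}

/*
 * Hypothetical policy helper: a vCPU is advertised either as Enabled
 * (usable at boot, not unpluggable) or as Online Capable (can be enabled
 * and onlined later), never both.
 */
static uint32_t gicc_flags_for_vcpu(bool expose_as_enabled)
{
    uint32_t flags = expose_as_enabled ? GICC_FLAG_ENABLED
                                       : GICC_FLAG_ONLINE_CAPABLE;
    assert(gicc_flags_valid(flags));
    return flags;
}

/* Issue 4a: an OS that ignores Online Capable only sees Enabled vCPUs. */
static bool legacy_os_sees_cpu(uint32_t flags)
{
    return (flags & GICC_FLAG_ENABLED) != 0;
}
```

This makes the trade-off in 4a and 4b concrete: marking cold-booted vCPUs as Online Capable hides them from a legacy OS, while marking them Enabled rules out advertising them as unpluggable.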
> 
> (IX) THINGS TO DO
> =================
> 
> 1. Fix issues related to TCG/Emulation support. (Not a blocker)
> 2. Comprehensive Testing is in progress. (Positive feedback from Oracle & Ampere)
> 3. Qemu Documentation (.rst) needs to be updated.
> 4. Fix qtest, HVF Support (Future).
> 5. Update the design issue related to ACPI MADT.GICC flags discussed in known
>     issues. This might require UEFI ACPI specification change (Not a blocker).
> 6. Add ACPI _OSC 'Query' support. Only part of _OSC support exists now. (Not a blocker).
> 
> The above is *not* a complete list. Will update later!
> 
> Best regards,
> Salil.
> 
> (X) DISCLAIMER
> ==============
> 
> This work is an attempt to present a proof-of-concept of the ARM64 vCPU hotplug
> implementation to the community. This is *not* production-level code and might
> have bugs. Comprehensive testing is being done on HiSilicon Kunpeng920 SoC,
> Oracle, and Ampere servers. We are nearing stable code and a non-RFC
> version shall be floated soon.
> 
> This work is *mostly* along the lines of the discussions that have happened in the
> previous years [see refs below] across different channels like the mailing list,
> Linaro Open Discussions platform, and various conferences like KVMForum, etc. This
> RFC is being used as a way to verify the idea mentioned in this cover letter and
> to get community views. Once this has been agreed upon, a formal patch shall be
> posted to the mailing list for review.
> 
> [The concept being presented has been found to work!]
> 
> (XI) ORGANIZATION OF PATCHES
> ============================
>   
> A. Architecture *specific* patches:
> 
>     [Patch 1-8, 17, 27, 29] logic required during machine init.
>      (*) Some validation checks.
>      (*) Introduces core-id property and some util functions required later.
>      (*) Logic to pre-create vCPUs.
>      (*) GIC initialization pre-sized with possible vCPUs.
>      (*) Some refactoring to have common hot and cold plug logic together.
>      (*) Release of disabled QOM CPU objects in post_cpu_init().
>      (*) Support of ACPI _OSC method to negotiate platform hotplug capabilities.
>     [Patch 9-16] logic related to ACPI at machine init time.
>      (*) Changes required to Enable ACPI for CPU hotplug.
>      (*) Initialization of ACPI GED framework to cater to CPU Hotplug Events.
>      (*) ACPI MADT/MAT changes.
>     [Patch 18-26] logic required during vCPU hot-(un)plug.
>      (*) Basic framework changes to support vCPU hot-(un)plug.
>      (*) ACPI GED changes for hot-(un)plug hooks.
>      (*) Wire-unwire the IRQs.
>      (*) GIC notification logic.
>      (*) ARMCPU unrealize logic.
>      (*) Handling of SMCCC Hypercall Exits by KVM to Qemu.
>     
> B. Architecture *agnostic* patches:
> 
>     [PATCH V13 0/8] Add architecture agnostic code to support vCPU Hotplug.
>     https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
>      (*) Refactors vCPU create, Parking, unparking logic of vCPUs, and addition of traces.
>      (*) Build ACPI AML related to CPU control dev.
>      (*) Changes related to the destruction of CPU Address Space.
>      (*) Changes related to the uninitialization of GDB Stub.
>      (*) Updating of Docs.
> 
> (XII) REFERENCES
> ================
> 
> [1] https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> [2] https://lore.kernel.org/linux-arm-kernel/20200625133757.22332-1-salil.mehta@huawei.com/
> [3] https://lore.kernel.org/lkml/20230203135043.409192-1-james.morse@arm.com/
> [4] https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/
> [5] https://lore.kernel.org/all/20230404154050.2270077-1-oliver.upton@linux.dev/
> [6] https://bugzilla.tianocore.org/show_bug.cgi?id=3706
> [7] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gic-cpu-interface-gicc-structure
> [8] https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> [9] https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
> [10] https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html
> [11] https://lkml.org/lkml/2019/7/10/235
> [12] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
> [13] https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06517.html
> [14] https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/thread/7CGL6JTACPUZEYQC34CZ2ZBWJGSR74WE/
> [15] http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01168.html
> [16] https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00131.html
> [17] https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
> [18] https://lore.kernel.org/lkml/20210608154805.216869-1-jean-philippe@linaro.org/
> [19] https://lore.kernel.org/all/20230913163823.7880-1-james.morse@arm.com/
> [20] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gicc-cpu-interface-flags
> [21] https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
> [22] https://lore.kernel.org/qemu-devel/20240607115649.214622-1-salil.mehta@huawei.com/T/#md0887eb07976bc76606a8204614ccc7d9a01c1f7
> 
> (XIII) ACKNOWLEDGEMENTS
> =======================
> 
> I would like to take this opportunity to thank below people for various
> discussions with me over different channels during the development:
> 
> Marc Zyngier (Google),              Catalin Marinas (ARM),
> James Morse (ARM),                  Will Deacon (Google),
> Jean-Philippe Brucker (Linaro),     Sudeep Holla (ARM),
> Lorenzo Pieralisi (Linaro),         Gavin Shan (Redhat),
> Jonathan Cameron (Huawei),          Darren Hart (Ampere),
> Igor Mammedov (Redhat),             Ilkka Koskinen (Ampere),
> Andrew Jones (Redhat),              Karl Heubaum (Oracle),
> Keqian Zhu (Huawei),                Miguel Luis (Oracle),
> Xiongfeng Wang (Huawei),            Vishnu Pajjuri (Ampere),
> Shameerali Kolothum (Huawei),       Russell King (Oracle),
> Xuwei/Joy (Huawei),                 Peter Maydell (Linaro),
> Zengtao/Prime (Huawei),             And all those whom I have missed!
> 
> Many thanks to the following people for their current or past contributions:
> 
> 1. James Morse (ARM)
>     (Current Kernel part of vCPU Hotplug Support on AARCH64)
> 2. Jean-Philippe Brucker (Linaro)
>     (Prototyped one of the earlier PSCI-based POC [17][18] based on RFC V1)
> 3. Keqian Zhu (Huawei)
>     (Co-developed Qemu prototype)
> 4. Xiongfeng Wang (Huawei)
>     (Co-developed an earlier kernel prototype with me)
> 5. Vishnu Pajjuri (Ampere)
>     (Verification on Ampere ARM64 Platforms + fixes)
> 6. Miguel Luis (Oracle)
>     (Verification on Oracle ARM64 Platforms + fixes)
> 7. Russell King (Oracle) & Jonathan Cameron (Huawei)
>     (Helping in upstreaming James Morse's Kernel patches).
> 
> (XIV) Change Log:
> =================
> 
> RFC V2 -> RFC V3:
> -----------------
> 1. Miscellaneous:
>     - Split the RFC V2 into arch-agnostic and arch-specific patch sets.
> 2. Addressed Gavin Shan's (RedHat) comments:
>     - Made CPU property accessors inline.
>       https://lore.kernel.org/qemu-devel/6cd28639-2cfa-f233-c6d9-d5d2ec5b1c58@redhat.com/
>     - Collected Reviewed-bys [PATCH RFC V2 4/37, 14/37, 22/37].
>     - Dropped the patch as it was not required after init logic was refactored.
>       https://lore.kernel.org/qemu-devel/4fb2eef9-6742-1eeb-721a-b3db04b1be97@redhat.com/
>     - Fixed the range check for the core during vCPU Plug.
>       https://lore.kernel.org/qemu-devel/1c5fa24c-6bf3-750f-4f22-087e4a9311af@redhat.com/
>     - Added has_hotpluggable_vcpus check to make build_cpus_aml() conditional.
>       https://lore.kernel.org/qemu-devel/832342cb-74bc-58dd-c5d7-6f995baeb0f2@redhat.com/
>     - Fixed the states initialization in cpu_hotplug_hw_init() to accommodate previous refactoring.
>       https://lore.kernel.org/qemu-devel/da5e5609-1883-8650-c7d8-6868c7b74f1c@redhat.com/
>     - Fixed typos.
>       https://lore.kernel.org/qemu-devel/eb1ac571-7844-55e6-15e7-3dd7df21366b@redhat.com/
>     - Removed the unnecessary 'goto fail'.
>       https://lore.kernel.org/qemu-devel/4d8980ac-f402-60d4-fe52-787815af8a7d@redhat.com/#t
>     - Added check for hotpluggable vCPUs in the _OSC method.
>       https://lore.kernel.org/qemu-devel/20231017001326.FUBqQ1PTowF2GxQpnL3kIW0AhmSqbspazwixAHVSi6c@z/
> 3. Addressed Shaoqin Huang's (RedHat) comments:
>     - Fixed the compilation break caused by the missing call to virt_cpu_properties()
>       and its missing definition.
>       https://lore.kernel.org/qemu-devel/3632ee24-47f7-ae68-8790-26eb2cf9950b@redhat.com/
> 4. Addressed Jonathan Cameron's (Huawei) comments:
>     - Gated the 'disabled vcpu message' for GIC version < 3.
>       https://lore.kernel.org/qemu-devel/20240116155911.00004fe1@Huawei.com/
> 
> RFC V1 -> RFC V2:
> -----------------
> 1. Addressed James Morse's (ARM) requirement as per Linaro Open Discussion:
>     - Exposed all possible vCPUs as always ACPI _STA.present and available during boot time.
>     - Added the _OSC handling as required by James's patches.
>     - Introduction of 'online-capable' bit handling in the flag of MADT GICC.
>     - SMCCC Hypercall Exit handling in Qemu.
> 2. Addressed Marc Zyngier's comment:
>     - Fixed the note about GIC CPU Interface in the cover letter.
> 3. Addressed issues raised by Vishnu Pajjuri (Ampere) & Miguel Luis (Oracle) during testing:
>     - Live/Pseudo Migration crashes.
> 4. Others:
>     - Introduced the concept of persistent vCPU at QOM.
>     - Introduced wrapper APIs of present, possible, and persistent.
>     - Changed the ACPI hotplug H/W init leg to initialize the is_present and is_enabled states.
>     - Check to avoid unplugging cold-booted vCPUs.
>     - Disabled hotplugging with TCG/HVF/QTEST.
>     - Introduced CPU Topology, {socket, cluster, core, thread}-id property.
>     - Extracted virt CPU properties as a common virt_vcpu_properties() function.
> 
> Author Salil Mehta (1):
>    target/arm/kvm,tcg: Register/Handle SMCCC hypercall exits to VMM/Qemu
> 
> Jean-Philippe Brucker (2):
>    hw/acpi: Make _MAT method optional
>    target/arm/kvm: Write CPU state back to KVM on reset
> 
> Miguel Luis (1):
>    tcg/mttcg: enable threads to unregister in tcg_ctxs[]
> 
> Salil Mehta (25):
>    arm/virt,target/arm: Add new ARMCPU {socket,cluster,core,thread}-id
>      property
>    cpu-common: Add common CPU utility for possible vCPUs
>    hw/arm/virt: Limit number of possible vCPUs for unsupported Accel or
>      GIC Type
>    hw/arm/virt: Move setting of common CPU properties in a function
>    arm/virt,target/arm: Machine init time change common to vCPU
>      {cold|hot}-plug
>    arm/virt,kvm: Pre-create disabled possible vCPUs @machine init
>    arm/virt,gicv3: Changes to pre-size GIC with possible vcpus @machine
>      init
>    arm/virt: Init PMU at host for all possible vcpus
>    arm/acpi: Enable ACPI support for vcpu hotplug
>    arm/virt: Add cpu hotplug events to GED during creation
>    arm/virt: Create GED dev before *disabled* CPU Objs are destroyed
>    arm/virt/acpi: Build CPUs AML with CPU Hotplug support
>    arm/virt: Make ARM vCPU *present* status ACPI *persistent*
>    hw/acpi: ACPI/AML Changes to reflect the correct _STA.{PRES,ENA} Bits
>      to Guest
>    hw/arm: MADT Tbl change to size the guest with possible vCPUs
>    arm/virt: Release objects for *disabled* possible vCPUs after init
>    arm/virt: Add/update basic hot-(un)plug framework
>    arm/virt: Changes to (un)wire GICC<->vCPU IRQs during hot-(un)plug
>    hw/arm,gicv3: Changes to update GIC with vCPU hot-plug notification
>    hw/intc/arm-gicv3*: Changes required to (re)init the vCPU register
>      info
>    arm/virt: Update the guest(via GED) about CPU hot-(un)plug events
>    hw/arm: Changes required for reset and to support next boot
>    target/arm: Add support of *unrealize* ARMCPU during vCPU Hot-unplug
>    hw/arm: Support hotplug capability check using _OSC method
>    hw/arm/virt: Expose cold-booted CPUs as MADT GICC Enabled
> 
>   accel/tcg/tcg-accel-ops-mttcg.c    |   1 +
>   cpu-common.c                       |  37 ++
>   hw/acpi/cpu.c                      |  62 +-
>   hw/acpi/generic_event_device.c     |  11 +
>   hw/arm/Kconfig                     |   1 +
>   hw/arm/boot.c                      |   2 +-
>   hw/arm/virt-acpi-build.c           | 113 +++-
>   hw/arm/virt.c                      | 877 +++++++++++++++++++++++------
>   hw/core/gpio.c                     |   2 +-
>   hw/intc/arm_gicv3.c                |   1 +
>   hw/intc/arm_gicv3_common.c         |  66 ++-
>   hw/intc/arm_gicv3_cpuif.c          | 269 +++++----
>   hw/intc/arm_gicv3_cpuif_common.c   |   5 +
>   hw/intc/arm_gicv3_kvm.c            |  39 +-
>   hw/intc/gicv3_internal.h           |   2 +
>   include/hw/acpi/cpu.h              |   2 +
>   include/hw/arm/boot.h              |   2 +
>   include/hw/arm/virt.h              |  38 +-
>   include/hw/core/cpu.h              |  78 +++
>   include/hw/intc/arm_gicv3_common.h |  23 +
>   include/hw/qdev-core.h             |   2 +
>   include/tcg/startup.h              |   7 +
>   target/arm/arm-powerctl.c          |  51 +-
>   target/arm/cpu-qom.h               |  18 +-
>   target/arm/cpu.c                   | 112 ++++
>   target/arm/cpu.h                   |  18 +
>   target/arm/cpu64.c                 |  15 +
>   target/arm/gdbstub.c               |   6 +
>   target/arm/helper.c                |  27 +-
>   target/arm/internals.h             |  14 +-
>   target/arm/kvm.c                   | 146 ++++-
>   target/arm/kvm_arm.h               |  25 +
>   target/arm/meson.build             |   1 +
>   target/arm/{tcg => }/psci.c        |   8 +
>   target/arm/tcg/meson.build         |   4 -
>   tcg/tcg.c                          |  24 +
>   36 files changed, 1749 insertions(+), 360 deletions(-)
>   rename target/arm/{tcg => }/psci.c (97%)
>

Alex Bennée Aug. 29, 2024, 9:59 a.m. UTC | #16

Gustavo Romero <gustavo.romero@linaro.org> writes:

> Hi Salil,
>
> On 6/13/24 8:36 PM, Salil Mehta via wrote:
<snip>
>> (VI) Commands Used
>> ==================
>> A. Qemu launch commands to init the machine:
>>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>>        -cpu host -smp cpus=4,maxcpus=6 \
>>        -m 300M \
>>        -kernel Image \
>>        -initrd rootfs.cpio.gz \
>>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>>        -nographic \
>>        -bios QEMU_EFI.fd \
>> B. Hot-(un)plug related commands:
>>    # Hotplug a host vCPU (accel=kvm):
>>      $ device_add host-arm-cpu,id=core4,core-id=4
>>    # Hotplug a vCPU (accel=tcg):
>>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>
> Since support for hotplug is disabled on TCG, remove
> these two lines in v4 cover letter?

Why is it disabled for TCG? We should aim for TCG being as close to KVM
as possible for developers even if it is not a production solution.

Salil Mehta Sept. 4, 2024, 2:03 p.m. UTC | #17

Hi Gustavo,

>  From: Gustavo Romero <gustavo.romero@linaro.org>
>  Sent: Wednesday, August 28, 2024 9:36 PM
>  To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
>  qemu-arm@nongnu.org; mst@redhat.com
>  
>  Hi Salil,
>  
>  On 6/13/24 8:36 PM, Salil Mehta via wrote:
>  > PROLOGUE
>  > ========
>  >
>  > To assist in review and set the right expectations from this RFC,
>  > please first read the sections *APPENDED AT THE END* of this cover letter:
>  >
>  > 1. Important *DISCLAIMER* [Section (X)]
>  > 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
>  > 3. Organization of patches [Section (XI)]
>  > 4. References [Section (XII)]
>  > 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
>  >
>  > There has been interest shown by other organizations in adapting this
>  > series for their architecture. Hence, RFC V2 [21] has been split into
>  > architecture *agnostic* [22] and *specific* patch sets.
>  >
>  > This is an ARM architecture-specific patch set carved out of RFC V2.
>  > Please check section (XI)B for details of architecture agnostic patches.
>  >
>  > SECTIONS [I - XIII] are as follows:
>  >
>  > (I) Key Changes [details in last section (XIV)]
>  > ==============================================
>  >
>  > RFC V2 -> RFC V3
>  >

[...]

>  >
>  > (VI) Commands Used
>  > ==================
>  >
>  > A. Qemu launch commands to init the machine:
>  >
>  >      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>  >        -cpu host -smp cpus=4,maxcpus=6 \
>  >        -m 300M \
>  >        -kernel Image \
>  >        -initrd rootfs.cpio.gz \
>  >        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>  >        -nographic \
>  >        -bios QEMU_EFI.fd \
>  >
>  > B. Hot-(un)plug related commands:
>  >
>  >    # Hotplug a host vCPU (accel=kvm):
>  >      $ device_add host-arm-cpu,id=core4,core-id=4
>  >
>  >    # Hotplug a vCPU (accel=tcg):
>  >      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>  
>  Since support for hotplug is disabled on TCG, remove these two lines in v4
>  cover letter?


We are fixing that and it should be part of RFC V4.


Thanks
Salil.


>  
>  
>  Cheers,
>  Gustavo
>  
>  >    # Delete the vCPU:
>  >      $ device_del core4
>  >

[...]

Salil Mehta Sept. 4, 2024, 2:24 p.m. UTC | #18

Hi Alex,

>  -----Original Message-----
>  From: Alex Bennée <alex.bennee@linaro.org>
>  Sent: Thursday, August 29, 2024 11:00 AM
>  To: Gustavo Romero <gustavo.romero@linaro.org>
>  
>  Gustavo Romero <gustavo.romero@linaro.org> writes:
>  
>  > Hi Salil,
>  >
>  > On 6/13/24 8:36 PM, Salil Mehta via wrote:
>  <snip>
>  >> (VI) Commands Used
>  >> ==================
>  >> A. Qemu launch commands to init the machine:
>  >>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>  >>        -cpu host -smp cpus=4,maxcpus=6 \
>  >>        -m 300M \
>  >>        -kernel Image \
>  >>        -initrd rootfs.cpio.gz \
>  >>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>  >>        -nographic \
>  >>        -bios QEMU_EFI.fd \
>  >> B. Hot-(un)plug related commands:
>  >>    # Hotplug a host vCPU (accel=kvm):
>  >>      $ device_add host-arm-cpu,id=core4,core-id=4
>  >>    # Hotplug a vCPU (accel=tcg):
>  >>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>  >
>  > Since support for hotplug is disabled on TCG, remove these two lines
>  > in v4 cover letter?
>  
>  Why is it disabled for TCG? We should aim for TCG being as close to KVM as
>  possible for developers even if it is not a production solution.

Agreed in principle. Yes, that would be of help.


Context on why it was disabled, although most of the code to support TCG exists:

I had reported a crash in RFC V1 (June 2020): a TCGContext counter overflow
assertion during repeated hot(un)plug operations. Miguel from Oracle was able
to reproduce this problem in February last year and also suggested a fix, but
he later found in his testing that there was a problem during migration.

RFC V1 June 2020:
https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
Scroll to below:
[...]
THINGS TO DO:
 (*) Migration support 
 (*) TCG/Emulation support is not proper right now. Works to a certain extent
     but is not complete. especially the unrealize part in which there is a
     overflow of tcg contexts. The last is due to the fact tcg maintains a 
     count on number of context(per thread instance) so as we hotplug the vcpus
     this counter keeps on incrementing. But during hot-unplug the counter is
     not decremented.

@ Feb 2023, [Linaro-open-discussions] Re: Qemu TCG support for virtual-cpuhotplug/online-policy 

https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/GMDFTEZE6WUUI7LZAYOWLXFHAPXLCND5/

The last status reported by Miguel was that there was a problem with TCG and that he
intended to fix it. He was on paternity leave, so I will try to gather the exact status
of the TCG work today.

Thanks
Salil


>  
>  --
>  Alex Bennée
>  Virtualisation Tech Lead @ Linaro

Alex Bennée Sept. 4, 2024, 3:45 p.m. UTC | #19

Salil Mehta <salil.mehta@huawei.com> writes:

> Hi Alex,
>
>>  -----Original Message-----
>>  From: Alex Bennée <alex.bennee@linaro.org>
>>  Sent: Thursday, August 29, 2024 11:00 AM
>>  To: Gustavo Romero <gustavo.romero@linaro.org>
>>  
>>  Gustavo Romero <gustavo.romero@linaro.org> writes:
>>  
>>  > Hi Salil,
>>  >
>>  > On 6/13/24 8:36 PM, Salil Mehta via wrote:
>>  <snip>
>>  >> (VI) Commands Used
>>  >> ==================
>>  >> A. Qemu launch commands to init the machine:
>>  >>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>>  >>        -cpu host -smp cpus=4,maxcpus=6 \
>>  >>        -m 300M \
>>  >>        -kernel Image \
>>  >>        -initrd rootfs.cpio.gz \
>>  >>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>>  >>        -nographic \
>>  >>        -bios QEMU_EFI.fd \
>>  >> B. Hot-(un)plug related commands:
>>  >>    # Hotplug a host vCPU (accel=kvm):
>>  >>      $ device_add host-arm-cpu,id=core4,core-id=4
>>  >>    # Hotplug a vCPU (accel=tcg):
>>  >>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>>  >
>>  > Since support for hotplug is disabled on TCG, remove these two lines
>>  > in v4 cover letter?
>>  
>>  Why is it disabled for TCG? We should aim for TCG being as close to KVM as
>>  possible for developers even if it is not a production solution.
>
> Agreed In principle. Yes, that would be of help.
>
>
> Context why it was disabled although most code to support TCG exist:
>
> I had reported a crash in the RFC V1 (June 2020) about TCGContext counter
> overflow assertion during repeated hot(un)plug operation. Miguel from Oracle
> was able to reproduce this problem last year in Feb and also suggested a fix but he
> later found out in his testing that there was a problem during migration.
>
> RFC V1 June 2020:
> https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> Scroll to below:
> [...]
> THINGS TO DO:
>  (*) Migration support 
>  (*) TCG/Emulation support is not proper right now. Works to a certain extent
>      but is not complete. especially the unrealize part in which there is a
>      overflow of tcg contexts. The last is due to the fact tcg maintains a 
>      count on number of context(per thread instance) so as we hotplug the vcpus
>      this counter keeps on incrementing. But during hot-unplug the counter is
>      not decremented.

Right, so the translation cache is segmented by vCPU to support parallel
JIT operations. The easiest solution would be to ensure we dimension for
the maximum number of vCPUs, which it should already do; see tcg_init_machine():

  unsigned max_cpus = ms->smp.max_cpus;
  ...
  tcg_init(s->tb_size * MiB, s->splitwx_enabled, max_cpus);
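
To make the counter problem from the quoted TODO note concrete, here is a toy model of it. It is only a sketch: ToyCtxPool, toy_register(), toy_unregister() and toy_hotplug_cycles() are hypothetical names and do not correspond to the real TCG data structures. The pool is dimensioned for the maximum number of vCPUs, and registration fails once the count reaches that limit (the toy returns false where, per the report above, QEMU trips an overflow assertion).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CPUS 6u /* mirrors maxcpus=6 in the launch example */

/* Hypothetical stand-in for the per-thread context accounting. */
typedef struct {
    unsigned n_ctxs;   /* contexts currently counted */
    unsigned max_ctxs; /* dimensioned for the maximum number of vCPUs */
} ToyCtxPool;

/* Called when a vCPU thread starts; fails once the pool is exhausted. */
static bool toy_register(ToyCtxPool *p)
{
    assert(p);
    if (p->n_ctxs >= p->max_ctxs) {
        return false;
    }
    p->n_ctxs++;
    return true;
}

/* Called when a vCPU thread exits; this is the step the TODO says is missing. */
static void toy_unregister(ToyCtxPool *p)
{
    assert(p && p->n_ctxs > 0);
    p->n_ctxs--;
}

/* Run plug/unplug cycles of one vCPU; returns how many plugs succeeded. */
static unsigned toy_hotplug_cycles(ToyCtxPool *p, unsigned cycles,
                                   bool unregister_on_unplug)
{
    unsigned ok = 0;
    for (unsigned i = 0; i < cycles; i++) {
        if (!toy_register(p)) {
            break;
        }
        ok++;
        if (unregister_on_unplug) {
            toy_unregister(p);
        }
    }
    return ok;
}
```

Starting from 4 cold-booted vCPUs, the model without unregistration runs out after two cycles even though the pool is sized for 6, while the variant that decrements on unplug can repeat the cycle indefinitely. So dimensioning for max_cpus is necessary, but the count also has to drop on hot-unplug.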

>
> @ Feb 2023, [Linaro-open-discussions] Re: Qemu TCG support for virtual-cpuhotplug/online-policy 
>
> https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/GMDFTEZE6WUUI7LZAYOWLXFHAPXLCND5/
>
> Last status reported by Miguel was that there was problem with the TCG and he intended
> to fix this. He was on paternity leave so I will try to gather the exact status of the TCG today.
>
> Thanks
> Salil
>
>
>>  
>>  --
>>  Alex Bennée
>>  Virtualisation Tech Lead @ Linaro

Salil Mehta Sept. 4, 2024, 3:59 p.m. UTC | #20

Hi Alex,

>  From: Alex Bennée <alex.bennee@linaro.org>
>  Sent: Wednesday, September 4, 2024 4:46 PM
>  To: Salil Mehta <salil.mehta@huawei.com>
>  
>  Salil Mehta <salil.mehta@huawei.com> writes:
>  
>  > Hi Alex,
>  >
>  >>  -----Original Message-----
>  >>  From: Alex Bennée <alex.bennee@linaro.org>
>  >>  Sent: Thursday, August 29, 2024 11:00 AM
>  >>  To: Gustavo Romero <gustavo.romero@linaro.org>
>  >>
>  >>  Gustavo Romero <gustavo.romero@linaro.org> writes:
>  >>
>  >>  > Hi Salil,
>  >>  >
>  >>  > On 6/13/24 8:36 PM, Salil Mehta via wrote:
>  >>  <snip>
>  >>  >> (VI) Commands Used
>  >>  >> ==================
>  >>  >> A. Qemu launch commands to init the machine:
>  >>  >>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
>  >>  >>        -cpu host -smp cpus=4,maxcpus=6 \
>  >>  >>        -m 300M \
>  >>  >>        -kernel Image \
>  >>  >>        -initrd rootfs.cpio.gz \
>  >>  >>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>  >>  >>        -nographic \
>  >>  >>        -bios QEMU_EFI.fd \
>  >>  >> B. Hot-(un)plug related commands:
>  >>  >>    # Hotplug a host vCPU (accel=kvm):
>  >>  >>      $ device_add host-arm-cpu,id=core4,core-id=4
>  >>  >>    # Hotplug a vCPU (accel=tcg):
>  >>  >>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>  >>  >
>  >>  > Since support for hotplug is disabled on TCG, remove these two lines
>  >>  > in v4 cover letter?
>  >>
>  >>  Why is it disabled for TCG? We should aim for TCG being as close to KVM as
>  >>  possible for developers even if it is not a production solution.
>  >
>  > Agreed In principle. Yes, that would be of help.
>  >
>  >
>  > Context why it was disabled although most code to support TCG exist:
>  >
>  > I had reported a crash in the RFC V1 (June 2020) about TCGContext
>  > counter overflow assertion during repeated hot(un)plug operation.
>  > Miguel from Oracle was able to reproduce this problem last year in Feb
>  > and also suggested a fix but he later found out in his testing that there was
>  > a problem during migration.
>  >
>  > RFC V1 June 2020:
>  > https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
>  > Scroll to below:
>  > [...]
>  > THINGS TO DO:
>  >  (*) Migration support
>  >  (*) TCG/Emulation support is not proper right now. Works to a certain extent
>  >      but is not complete. especially the unrealize part in which there is a
>  >      overflow of tcg contexts. The last is due to the fact tcg maintains a
>  >      count on number of context(per thread instance) so as we hotplug the vcpus
>  >      this counter keeps on incrementing. But during hot-unplug the counter is
>  >      not decremented.
>  
>  Right so the translation cache is segmented by vCPU to support parallel JIT
>  operations. The easiest solution would be to ensure we dimension for the
>  maximum number of vCPUs, which it should already, see
>  tcg_init_machine():
>  
>    unsigned max_cpus = ms->smp.max_cpus;
>    ...
>    tcg_init(s->tb_size * MiB, s->splitwx_enabled, max_cpus);


Agreed. We have done that and have a patch for that as well. But it is still
a work-in-progress and I've lost context a bit.

https://github.com/salil-mehta/qemu/commit/107cf5ca7cf3716bc0f8c68e98e1da3939f449ce
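
To recap the counter problem from the quoted TODO list, here is a rough toy model. It is only a sketch under my own assumptions and is not the code in the commit above: the ToyCtxPool type and every toy_* function are hypothetical names, and the real TCG code asserts on overflow rather than returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a per-thread context counter with a fixed limit. */
typedef struct {
    unsigned max_ctxs;  /* fixed at init, sized for the maximum vCPU count */
    unsigned cur_ctxs;  /* contexts currently accounted for */
} ToyCtxPool;

static void toy_pool_init(ToyCtxPool *p, unsigned max_cpus)
{
    p->max_ctxs = max_cpus;
    p->cur_ctxs = 0;
}

/* Called when a vCPU thread starts; fails where the real code would assert. */
static bool toy_ctx_register(ToyCtxPool *p)
{
    if (p->cur_ctxs >= p->max_ctxs) {
        return false;
    }
    p->cur_ctxs++;
    return true;
}

/* Called when a vCPU thread exits; this is the step missing on hot-unplug. */
static void toy_ctx_unregister(ToyCtxPool *p)
{
    assert(p->cur_ctxs > 0);
    p->cur_ctxs--;
}

/*
 * Plug and unplug one vCPU up to 'cycles' times and return how many cycles
 * succeeded before the counter hit the limit.
 */
static unsigned toy_hotplug_cycles(ToyCtxPool *p, unsigned cycles,
                                   bool release_on_unplug)
{
    unsigned ok = 0;

    for (unsigned i = 0; i < cycles; i++) {
        if (!toy_ctx_register(p)) {
            break;
        }
        if (release_on_unplug) {
            toy_ctx_unregister(p);
        }
        ok++;
    }
    return ok;
}
```

With cpus=4,maxcpus=6 as in the cover letter, the model runs out of contexts after two plug/unplug cycles unless the unplug path gives its context back, so sizing for max_cpus alone does not cover repeated hot-(un)plug.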

For now, I've quickly tried to enable and run TCG to regain the context.
I've now hit a different problem during the TCG vCPU unrealization phase: while
pthread_join() waits on the halt condition variable for the MTTCG vCPU thread
to exit, there is a crash somewhere. It looks like a race condition. I will
dig into this further.
 

Best regards
Salil.

>  > @ Feb 2023, [Linaro-open-discussions] Re: Qemu TCG support for
>  > virtual-cpuhotplug/online-policy
>  >
>  > https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/GMDFTEZE6WUUI7LZAYOWLXFHAPXLCND5/
>  >
>  > Last status reported by Miguel was that there was problem with the TCG
>  > and he intended to fix this. He was on paternity leave so I will try to gather
>  > the exact status of the TCG today.
>  >
>  > Thanks
>  > Salil
>  >
>  >
>  >>
>  >>  --
>  >>  Alex Bennée
>  >>  Virtualisation Tech Lead @ Linaro
>  
>  --
>  Alex Bennée
>  Virtualisation Tech Lead @ Linaro

Salil Mehta Sept. 6, 2024, 3:06 p.m. UTC | #21

Hi Alex,

>  From: qemu-arm-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
>  arm-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
>  Mehta via
>  Sent: Wednesday, September 4, 2024 5:00 PM
>  To: Alex Bennée <alex.bennee@linaro.org>
>  
>  Hi Alex,
>  
>  >  From: Alex Bennée <alex.bennee@linaro.org>
>  >  Sent: Wednesday, September 4, 2024 4:46 PM
>  >  To: Salil Mehta <salil.mehta@huawei.com>
>  >
>  >  Salil Mehta <salil.mehta@huawei.com> writes:
>  >
>  >  > Hi Alex,
>  >  >
>  >  >>  -----Original Message-----
>  >  >>  From: Alex Bennée <alex.bennee@linaro.org>
>  >  >>  Sent: Thursday, August 29, 2024 11:00 AM
>  >  >>  To: Gustavo Romero <gustavo.romero@linaro.org>
>  >  >>
>  >  >>  Gustavo Romero <gustavo.romero@linaro.org> writes:
>  >  >>
>  >  >>  > Hi Salil,
>  >  >>  >
>  >  >>  > On 6/13/24 8:36 PM, Salil Mehta via wrote:
>  >  >>  <snip>
>  >  >>  >> (VI) Commands Used
>  >  >>  >> ==================
>  >  >>  >> A. Qemu launch commands to init the machine:
>  >  >>  >>      $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3   \
>  >  >>  >>        -cpu host -smp cpus=4,maxcpus=6 \
>  >  >>  >>        -m 300M \
>  >  >>  >>        -kernel Image \
>  >  >>  >>        -initrd rootfs.cpio.gz \
>  >  >>  >>        -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
>  >  >>  >>        -nographic \
>  >  >>  >>        -bios QEMU_EFI.fd \
>  >  >>  >> B. Hot-(un)plug related commands:
>  >  >>  >>    # Hotplug a host vCPU (accel=kvm):
>  >  >>  >>      $ device_add host-arm-cpu,id=core4,core-id=4
>  >  >>  >>    # Hotplug a vCPU (accel=tcg):
>  >  >>  >>      $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
>  >  >>  >
>  >  >>  > Since support for hotplug is disabled on TCG, remove these two
>  >  >>  > lines in v4 cover letter?
>  >  >>
>  >  >>  Why is it disabled for TCG? We should aim for TCG being as close to
>  >  >>  KVM as possible for developers even if it is not a production solution.
>  >  >
>  >  > Agreed In principle. Yes, that would be of help.
>  >  >
>  >  >
>  >  > Context why it was disabled although most code to support TCG exist:
>  >  >
>  >  > I had reported a crash in the RFC V1 (June 2020) about TCGContext
>  >  > counter overflow assertion during repeated hot(un)plug operation.
>  >  > Miguel from Oracle was able to reproduce this problem last year in Feb
>  >  > and also suggested a fix but he later found out in his testing that
>  >  > there was a problem during migration.
>  >  >
>  >  > RFC V1 June 2020:
>  >  > https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
>  >  > Scroll to below:
>  >  > [...]
>  >  > THINGS TO DO:
>  >  >  (*) Migration support
>  >  >  (*) TCG/Emulation support is not proper right now. Works to a certain  extent
>  >  >      but is not complete. especially the unrealize part in which there is a
>  >  >      overflow of tcg contexts. The last is due to the fact tcg maintains a
>  >  >      count on number of context(per thread instance) so as we hotplug the vcpus
>  >  >      this counter keeps on incrementing. But during hot-unplug the counter is
>  >  >      not decremented.
>  >
>  >  Right so the translation cache is segmented by vCPU to support parallel JIT
>  >  operations. The easiest solution would be to ensure we dimension for the
>  >  maximum number of vCPUs, which it should already, see
>  >  tcg_init_machine():
>  >
>  >    unsigned max_cpus = ms->smp.max_cpus;
>  >    ...
>  >    tcg_init(s->tb_size * MiB, s->splitwx_enabled, max_cpus);
>  
>  
>  Agreed. We have done that and have a patch for that as well. But it is still a
>  work-in-progress and I've lost context a bit.
>  
>  https://github.com/salil-mehta/qemu/commit/107cf5ca7cf3716bc0f8c68e98e1da3939f449ce
>  
>  For now, I've very quickly tried to enable and run the TCG to gain back the
>  context.
>  I've now hit a different problem during TCG vCPU unrealization phase, while
>  pthread_join() waits on halt condition variable for MTTCG vCPU thread to
>  exit, there is a crash somewhere. Look like some race condition. Will dig this
>  further.


It appears that there was a race condition between the destruction of the
CPU Address Space and the delayed processing of the tcg_commit_cpu() function.
The latter is primarily responsible for:

1. Updating the memory dispatch pointer.
2. Performing the tlb_flush() operation.

This process involves calling the CPU Address Space memory listener's
tcg_commit(), which queues this work item for the vCPU to execute at the
earliest opportunity. During ARM vCPU unrealization, we were destroying the
Address Space first and then calling cpu_remove_sync(). This resulted in the
vCPU thread being kicked out of the I/O wait state, which led to processing of
the vCPU work queue items. Since the CPU Address Space had already been
destroyed, this caused the segmentation fault.

I've resolved this issue by delaying the destruction of the CPU Address Space
until the cpu_remove_sync() operation has completed, but before the parent is
unrealized. This has resolved the crash. The vCPU hotplug operation seems to
be working on TCG now. I still need to test the migration process, which I
plan to do in the next couple of days. Please have a look at the patches
below and the repository.

https://github.com/salil-mehta/qemu/commit/9fbb8ecbc61c6405db342cc243b2be17b1c97e03
https://github.com/salil-mehta/qemu/commit/1900893449c1b6a10e1534635f29bfb545b825d0


Please check the below branch:
https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v4-rc5
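
For anyone skimming, the ordering problem described above can be reduced to a small single-threaded toy model. This is only a sketch under my own assumptions, not the code in the commits above: the ToyCPU and ToyAddressSpace types and all toy_* functions are hypothetical, and the point where the real code would touch freed memory is replaced by a counter so the model stays safe to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CPU address space. */
typedef struct {
    int dispatch_generation;  /* bumped each time a commit is processed */
} ToyAddressSpace;

typedef struct {
    ToyAddressSpace *as;      /* NULL once the address space is destroyed */
    bool commit_pending;      /* models a queued tcg_commit_cpu()-style item */
    int commits_done;         /* commits processed on a live address space */
    int commits_on_dead_as;   /* commits that found the address space gone */
} ToyCPU;

static void toy_cpu_init(ToyCPU *cpu)
{
    cpu->as = malloc(sizeof(*cpu->as));
    assert(cpu->as != NULL);
    cpu->as->dispatch_generation = 0;
    cpu->commit_pending = false;
    cpu->commits_done = 0;
    cpu->commits_on_dead_as = 0;
}

/* Models the memory listener commit: it only queues work for the vCPU. */
static void toy_queue_commit(ToyCPU *cpu)
{
    cpu->commit_pending = true;
}

static void toy_destroy_address_space(ToyCPU *cpu)
{
    free(cpu->as);
    cpu->as = NULL;
}

/* Models the synchronous removal: the vCPU is kicked and drains its queue. */
static void toy_remove_sync(ToyCPU *cpu)
{
    if (!cpu->commit_pending) {
        return;
    }
    cpu->commit_pending = false;
    if (cpu->as == NULL) {
        /* The real code would dereference freed memory at this point. */
        cpu->commits_on_dead_as++;
    } else {
        cpu->as->dispatch_generation++;
        cpu->commits_done++;
    }
}

/* Old order: destroy the address space, then remove the vCPU. */
static int toy_unrealize_old(ToyCPU *cpu)
{
    toy_destroy_address_space(cpu);
    toy_remove_sync(cpu);
    return cpu->commits_on_dead_as;
}

/* Fixed order: remove the vCPU first, then destroy the address space. */
static int toy_unrealize_fixed(ToyCPU *cpu)
{
    toy_remove_sync(cpu);
    toy_destroy_address_space(cpu);
    return cpu->commits_on_dead_as;
}
```

Queue one commit and then call toy_unrealize_old(): the queued item runs after the address space is gone. With toy_unrealize_fixed() the same item is processed while the address space is still alive, which is the ordering the fix restores.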


Best regards
Salil.


>  >  > @ Feb 2023, [Linaro-open-discussions] Re: Qemu TCG support for  >
>  > virtual-cpuhotplug/online-policy  >  >
>  > https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-l
>  >  > ists.linaro.org/message/GMDFTEZE6WUUI7LZAYOWLXFHAPXLCND5/
>  >  >
>  >  > Last status reported by Miguel was that there was problem with the
>  > TCG  > and he intended to fix this. He was on paternity leave so I
>  > will try to gather  the exact status of the TCG today.
>  >  >
>  >  > Thanks
>  >  > Salil
>  >  >
>  >  >
>  >  >>
>  >  >>  --
>  >  >>  Alex Bennée
>  >  >>  Virtualisation Tech Lead @ Linaro
>  >
>  >  --
>  >  Alex Bennée
>  >  Virtualisation Tech Lead @ Linaro
