[RFC,00/22] Support of Virtual CPU Hotplug for ARMv8 Arch

Message ID	20200613213629.21984-1-salil.mehta@huawei.com (mailing list archive)
Headers	show Return-Path: <SRS0=A7cj=72=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ED05520789 From: Salil Mehta <salil.mehta@huawei.com> To: <qemu-devel@nongnu.org>, <qemu-arm@nongnu.org> Subject: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch Date: Sat, 13 Jun 2020 22:36:07 +0100 Message-ID: <20200613213629.21984-1-salil.mehta@huawei.com> MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=45.249.212.32; envelope-from=salil.mehta@huawei.com; helo=huawei.com Precedence: list Cc: peter.maydell@linaro.org, drjones@redhat.com, sudeep.holla@arm.com, gshan@redhat.com, mst@redhat.com, jiakernel2@gmail.com, maz@kernel.org, zhukeqian1@huawei.com, david@redhat.com, richard.henderson@linaro.org, linuxarm@huawei.com, eric.auger@redhat.com, james.morse@arm.com, catalin.marinas@arm.com, imammedo@redhat.com, Salil Mehta <salil.mehta@huawei.com>, pbonzini@redhat.com, mehta.salil.lnk@gmail.com, maran.wilson@oracle.com, will@kernel.org, wangxiongfeng2@huawei.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	Support of Virtual CPU Hotplug for ARMv8 Arch \| expand [RFC,00/22] Support of Virtual CPU Hotplug for ARMv8 Arch [RFC,01/22] arm/cpuhp: Add QMP vcpu params validation support [RFC,02/22] arm/cpuhp: Add new ARMCPU core-id property [RFC,03/22] arm/cpuhp: Add common cpu utility for possible vcpus [RFC,04/22] arm/cpuhp: Machine init time change common to vcpu {cold\|hot}-plug [RFC,05/22] arm/cpuhp: Pre-create disabled possible vcpus @machine init [RFC,06/22] arm/cpuhp: Changes to pre-size GIC with possible vcpus @machine init [RFC,07/22] arm/cpuhp: Init PMU at host for all possible vcpus [RFC,08/22] arm/cpuhp: Enable ACPI support for vcpu hotplug [RFC,09/22] arm/cpuhp: Init GED framework with cpu hotplug events [RFC,10/22] arm/cpuhp: Update CPUs AML with cpu-(ctrl)dev change [RFC,11/22] arm/cpuhp: Update GED _EVT method AML with cpu scan [RFC,12/22] arm/cpuhp: MADT Tbl change to size the guest with possible vcpus [RFC,13/22] arm/cpuhp: Add ACPI _MAT entry for Processor object [RFC,14/22] arm/cpuhp: Release objects for disabled possible vcpus after init [RFC,15/22] arm/cpuhp: Update ACPI GED framework to support vcpu hotplug [RFC,16/22] arm/cpuhp: Add/update basic hot-(un)plug framework [RFC,17/22] arm/cpuhp: Changes to (un)wire GICC<->VCPU IRQs during hot-(un)plug [RFC,18/22] arm/cpuhp: Changes to update GIC with vcpu hot-plug notification [RFC,19/22] arm/cpuhp: Changes required to (re)init the vcpu register info [RFC,20/22] arm/cpuhp: Update the guest(via GED) about cpu hot-(un)plug events [RFC,21/22] arm/cpuhp: Changes required for reset and to support next boot [RFC,22/22] arm/cpuhp: Add support of unrealize ARMCPU during vcpu hot-unplug

Salil Mehta June 13, 2020, 9:36 p.m. UTC

This patch-set introduces the virtual cpu hotplug support for ARMv8
architecture in QEMU. Idea is to be able to hotplug and hot-unplug the vcpus
while guest VM is running and no reboot is required. This does *not* makes any
assumption of the physical cpu hotplug availability within the host system but
rather tries to solve the problem at virtualizer/QEMU layer and by introducing
cpu hotplug hooks and event handling within the guest kernel. No changes are
required within the host kernel/KVM.

Motivation:
This allows scaling the guest VM compute capacity on-demand which would be
useful for the following example scenarios,
1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
   framework which could adjust resource requests (CPU and Mem requests) for
   the containers in a pod, based on usage.
2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
   restrict the total number of compute resources available to the guest VM
   according to the SLA(Service Level Agreement). VM owner could request for
   more compute to be hot-plugged for some cost.

Terminology:

(*) Present cpus: Total cpus with which guest has/will boot and are available
                  to guest for use and can be onlined. Qemu parameter(-smp)
(*) Disabled cpus: Possible cpus which will not be available for the guest to
                   use. These can be hotplugged and made present. These can be
		   thought of as un-plugged vcpus. These will be included as
		   part of sizing.
(*) Posssible cpus: Total vcpus which could ever exist in VM. This includes
                    booted cpus plus any cpus which could be later plugged.
		    - Qemu parameter(-maxcpus)
	            - Possible vcpus = Present vcpus (+) Disabled vcpus


Limitations of ARMv8 Architecture:

A. Physical Limitation to CPU Hotplug:
1. ARMv8 architecture does not support the concept of the physical cpu hotplug.
   The closest thing which is recomended to achieve the cpu hotplug on ARM is
   to bring down power state of the cpu using PSCI.
2. Other ARM components like GIC etc. have not been designed to realize
   physical cpu hotplug capability as of now. 

B. Limitations of GIC to Support Virtual CPU Hotplug:
1. GIC requires various resources(related to GICR/redistributor, GICC/cpu
   interface etc) like memory regions to be fixed at the VM init time and these
   could not be changed later on after VM has inited.
2. Associations between GICC(GIC cpu interface) and vcpu get fixed at the VM
   init time and GIC does not allows to change this association once GIC has
   initialized.

C. Known Limitation of the KVM:
1. As of now KVM allows to create VCPUs but does not allows to delete the
   already created vcpus. QEMU already provides an interface to manage created
   vcpus at KVM level and then to re-use them.
2. Inconsistency in interpretation of the MPIDR generated by KVM for vcpus
   vis-a-vis SMT/threads. This does not looks to be compliant to the MPIDR
   format(SMT is present) as mentioned in the ARMv8 spec. (Please correct my
   understanding if I am wrong here?)
   

Workaround to the problems mentioned in Section B & C1:
1. We pre-size the GIC with possible vcpus at VM init time
2. Pre-create all possible vcpus at KVM and associate them with GICC 
3. Park the unplugged vcpus (similar to x86)


(*) For all of above please refer to Marc's suggestion here[1]


Overview of the Approach:
At the time of machvirt_init() we pre-create all of the possible ARMCPU
objects along with the corresponding KVM vcpus at the host. Disabled KVM vcpu
(which are *not* "present" vcpus but are part of "possible" vcpu list) are
parked at per VM list "kvm_parked_vcpus" after their initialization.

We create the ARMCPU objects(but these are not *realized* in QOM sense) even
for the disabled vcpus to facilitate the GIC initialization (pre-sized with
possible vcpus). After Initialization of the machine is complete we release
the ARMCPU Objects for the disabled vcpus. These ARMCPU object shall be
re-created at the time when vcpu is hot plugged. This new object is then
re-attached with the earlier parked KVM vcpu which also gets unparked. The
ARMCPU object gets now "realized" in QEMU, which means creation of the
corresponding threads, pre_plug/plug phases, and event notification to the
guest using ACPI GED etc. Similarly, hot-unplug leg will lead to the
"unrealization" of the vcpus and will lead to similar ACPI GED events to the
guest for unplug and cleanup and eventually ARMCPU object shall be released and
KVM vcpus shall be parked again.

During machine init, ACPI MADT Table is sized with *possible* vcpus GICC
entries. The unplugged/disabled vcpus are presented as MADT GICC DISABLED
entries to the guest. This means the guest will have its resources pre-sized
with possible vcpus(=present+disabled)

Other approaches to deal with ARMCPU object release(after machine init):
1. The ARMCPU objects for the disabled vcpus are released in context to the
   virt_machine_done() notifier(approach adopted in this patch-set). 
2. Defer the release of current ARMCPU object till the new vcpu object is
   hot plugged.
3. Never release and keep on reusing them and release once at VM exit. This
   solves many problems with above 2 approaches but requires change in the way
   qdev_device_add() fetches/creates the ARMCPU object for the new vcpus being
   hotplugged. For the arm cpu hotplug case we need to figure out way how to
   get access to old object and use it to "re-realize" instead of the new
   ARMCPU object.

Concerns/Questions:
1. In ARM arch a cpu is uniquely represented in hierarchy using various
   affinity levels which could represent thread, core, cluster, package. This
   is generally represented by a value in MPIDR register as per the format
   mentioned in specification. Now, the way MPIDR value is derived for vcpus is
   done using vcpu-index. The concept of thread is not quite as same and rather
   gets lost in the derivation of MPIDR for vcpus.
2. The topology info used to specify the vcpu while hot-plugging might not
   match with the MPIDR value given back by the KVM for the vcpu at the time of
   init. Concept of SMT bit in MPIDR gets lost as per the derivation being done
   in the KVM. Hence, concept of thread-id, core-id, socket-id if used as a
   topology info to derive MPIDR value as per ARM specification will not match
   with MPIDR actually assigned by the KVM? 
   Perhaps need to carry forward work of Andrew? please check here[2]
3. Further if this info is supplied to the guest using PPTT(once introduced in
   QEMU) or even derived using MPIDR shall be inconsistent with the host vcpu. 
4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
   *pending* state for the cpus which have been hot-unplugged? IMHO it looks
   okay but will need Marc's confirmation on this. 
5. If the ARMCPU object is released after the machine init, UEFI could call
   back virt_update_table() to re-build the ACPI tables which might need an
   ARMCPU object. Please check the discussion here[5]


Commands Used:

A. Qemu launch commands to init the machine

$ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
-cpu host -smp cpus=4,maxcpus=6 \
-m 300M \
-kernel Image \
-initrd rootfs.cpio.gz \
-append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
-nographic \
-bios  QEMU_EFI.fd \

B. Hot-(un)plug related commands

# Hotplug a host vcpu(accel=kvm)
$ device_add host-arm-cpu,id=core4,core-id=4

# Hotplug a vcpu(accel=tcg)
$ device_add cortex-a57-arm-cpu,id=core4,core-id=4

# Delete the vcpu
$ device_del core4

NOTE: I have not tested the current solution with '-device' interface. The use
      is suggested by Igor here[6]. I will test this in coming times but looks
      it should work with existing changes. 


Sample output on guest after boot:

$ cat /sys/devices/system/cpu/possible
0-5
$ cat /sys/devices/system/cpu/present
0-3
$ cat /sys/devices/system/cpu/online
0-1
$ cat /sys/devices/system/cpu/offline
2-5


Sample output on guest after hotplug of vcpu=4:

$ cat /sys/devices/system/cpu/possible
0-5
$ cat /sys/devices/system/cpu/present
0-4
$ cat /sys/devices/system/cpu/online
0-1,4
$ cat /sys/devices/system/cpu/offline
2-3,5

Note: vcpu=4 was explicitly 'onlined' after hot-plug
$ echo 1 > /sys/devices/system/cpu/cpu4/online


Repository:
 (*) QEMU changes for vcpu hotplug could be cloned from below site,
     https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1

 (*) Guest Kernel changes required to co-work with the QEMU shall be posted soon
     and repo made available at above site. 


THINGS TO DO:
 (*) Migration support 
 (*) TCG/Emulation support is not proper right now. Works to a certain extent
     but is not complete. especially the unrealize part in which there is a
     overflow of tcg contexts. The last is due to the fact tcg maintains a 
     count on number of context(per thread instance) so as we hotplug the vcpus
     this counter keeps on incrementing. But during hot-unplug the counter is
     not decremented.
 (*) Support of hotplug with NUMA is not proper
 (*) CPU Topology right now is not specified using thread/core/socket but 
     rather flatly indexed using core-id. This needs consideration[2].
 (*) Do we need PPTT Support for to specify right topology info to guest about
     hot-plugged or unplugged vcpus?
 (*) Test cases
 (*) Docs need to be updated.


DISCLAIMER:
This is not a complete work but an effort to present the arm vcpu hotplug
implementation to the community. This work is *mostly* in the lines of the
discussions which have happened in the previous years[see refs below]. This
RFC is being used as a way to verify the idea mentioned above and to further
get community views. As of now this is *not* a production level code and might
have bugs. Only a basic testing has been done on HiSilicon Kunpeng920 SoC for
Servers to verify the proof-of-concept. But the concept works!

ACKNOWLEDGEMENTS:
Although this is just a start of work but I would like to take this opportunity
to thank Marc Zyngier, Igor mammedov, James Morse, Andre, Sudeep Holla, Andrew
Jones, Jonathan Cameron, Shameer, Xuwei/Joy, Zengtao/Prime for discussions at
various points with me. Special thanks to Zhukeqian & Wangxiongfeng for their
contributions in the QEMU and the kernel part. And to the other people of
community who have been involved in the discussions in the past related to this
work and have pitched-in their ideas.

Best regards
Salil.

REFERENCES:
[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01168.html
[3] https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
[4] https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html 
[5] https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00131.html
[6] https://lkml.org/lkml/2019/7/10/235
[7] https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06517.html
 

Organization of Patches:
[Patch 1-3] Misc logic pre-cursor to machine init
 (*) some validation checks
 (*) introduces core-id property and some util functions required later.
[Patch 4-7,14] logic required during machine init
 (*) Logic to pre-create vcpus
 (*) GIC initialization pre-sized with possible vcpus.
 (*) some refactoring to have common hot and cold plug logic together.
[Patch 8-13] logic related to ACPI at machine init time
 (*) Changes required to Enable ACPI for cpu hotplug
 (*) Changes to enhance ACPI GED to cater CPU Hotplug Events
 (*) ACPI MADT/MAT changes
[Patch 15-22] Logic required during vcpu hot-(un)plug
 (*) Basic framework changes to suppport vcpu hot-(un)plug
 (*) ACPI GED changes for hot-(un)plug hooks.
 (*) wire-unwire the IRQs 
 (*) GIC notification logic
 (*) ARMCPU unrealize logic
 

Salil Mehta (22):
  arm/cpuhp: Add QMP vcpu params validation support
  arm/cpuhp: Add new ARMCPU core-id property
  arm/cpuhp: Add common cpu utility for possible vcpus
  arm/cpuhp: Machine init time change common to vcpu {cold|hot}-plug
  arm/cpuhp: Pre-create disabled possible vcpus @machine init
  arm/cpuhp: Changes to pre-size GIC with possible vcpus @machine init
  arm/cpuhp: Init PMU at host for all possible vcpus
  arm/cpuhp: Enable ACPI support for vcpu hotplug
  arm/cpuhp: Init GED framework with cpu hotplug events
  arm/cpuhp: Update CPUs AML with cpu-(ctrl)dev change
  arm/cpuhp: Update GED _EVT method AML with cpu scan
  arm/cpuhp: MADT Tbl change to size the guest with possible vcpus
  arm/cpuhp: Add ACPI _MAT entry for Processor object
  arm/cpuhp: Release objects for *disabled* possible vcpus after init
  arm/cpuhp: Update ACPI GED framework to support vcpu hotplug
  arm/cpuhp: Add/update basic hot-(un)plug framework
  arm/cpuhp: Changes to (un)wire GICC<->VCPU IRQs during hot-(un)plug
  arm/cpuhp: Changes to update GIC with vcpu hot-plug notification
  arm/cpuhp: Changes required to (re)init the vcpu register info
  arm/cpuhp: Update the guest(via GED) about cpu hot-(un)plug events
  arm/cpuhp: Changes required for reset and to support next boot
  arm/cpuhp: Add support of *unrealize* ARMCPU during vcpu hot-unplug

 accel/kvm/kvm-all.c                    |  31 ++
 cpus-common.c                          |  20 +
 exec.c                                 |  24 +
 gdbstub.c                              |  13 +
 hw/acpi/cpu.c                          |  34 +-
 hw/acpi/generic_event_device.c         |  54 ++
 hw/arm/Kconfig                         |   1 +
 hw/arm/boot.c                          |   2 +-
 hw/arm/virt-acpi-build.c               |  51 +-
 hw/arm/virt.c                          | 717 +++++++++++++++++++++----
 hw/core/qdev.c                         |   2 +-
 hw/i386/acpi-build.c                   |   2 +-
 hw/intc/arm_gicv3.c                    |   1 +
 hw/intc/arm_gicv3_common.c             |  66 ++-
 hw/intc/arm_gicv3_cpuif.c              | 145 ++---
 hw/intc/arm_gicv3_kvm.c                |  34 +-
 hw/intc/gicv3_internal.h               |   2 +
 include/exec/exec-all.h                |   8 +
 include/exec/gdbstub.h                 |   1 +
 include/hw/acpi/cpu.h                  |   5 +-
 include/hw/acpi/cpu_hotplug.h          |   5 +
 include/hw/acpi/generic_event_device.h |   5 +
 include/hw/arm/boot.h                  |   2 +
 include/hw/arm/virt.h                  |   9 +-
 include/hw/core/cpu.h                  |  23 +
 include/hw/intc/arm_gicv3_common.h     |   2 +
 include/hw/qdev-core.h                 |   2 +
 include/sysemu/kvm.h                   |   2 +
 target/arm/cpu-qom.h                   |   3 +
 target/arm/cpu.c                       |  98 ++++
 target/arm/cpu.h                       |  14 +
 target/arm/cpu64.c                     |   9 +
 target/arm/helper.c                    |  31 ++
 target/arm/internals.h                 |   1 +
 target/arm/kvm.c                       |  32 ++
 target/arm/kvm64.c                     |   7 +-
 target/arm/kvm_arm.h                   |  11 +
 37 files changed, 1260 insertions(+), 209 deletions(-)

no-reply@patchew.org June 13, 2020, 10:24 p.m. UTC | #1

Patchew URL: https://patchew.org/QEMU/20200613213629.21984-1-salil.mehta@huawei.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    check-unit: tests/test-char
  TEST    check-qtest-aarch64: tests/qtest/tpm-tis-device-swtpm-test
  TEST    check-qtest-aarch64: tests/qtest/numa-test
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(8) with smp-cpus(8)
  TEST    iotest-qcow2: 003
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(8) with smp-cpus(8)
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(8) with smp-cpus(8)
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(8) with smp-cpus(8)
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(2) with smp-cpus(2)
  TEST    check-qtest-aarch64: tests/qtest/boot-serial-test
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(1) with smp-cpus(1)
  TEST    check-qtest-aarch64: tests/qtest/migration-test
  TEST    check-unit: tests/check-qnum
  TEST    iotest-qcow2: 004
---
qemu-system-aarch64: falling back to tcg
  TEST    iotest-qcow2: 060
  TEST    check-qtest-aarch64: tests/qtest/bios-tables-test
qemu-system-aarch64: warning: For GICv2 max-cpus must be equal to smp-cpus
qemu-system-aarch64: warning: Overriding specified max-cpus(1) with smp-cpus(1)
  TEST    iotest-qcow2: 061
  TEST    iotest-qcow2: 062
  TEST    iotest-qcow2: 063
---
acpi-test: Warning! DSDT binary file mismatch. Actual [aml:/tmp/aml-3PFIM0], Expected [aml:tests/data/acpi/virt/DSDT].
See source file tests/qtest/bios-tables-test.c for instructions on how to update expected files.
to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and re-run tests with V=1 environment variable set**
ERROR:/tmp/qemu-test/src/tests/qtest/bios-tables-test.c:494:test_acpi_asl: assertion failed: (all_tables_match)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/qtest/bios-tables-test.c:494:test_acpi_asl: assertion failed: (all_tables_match)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 073
  TEST    iotest-qcow2: 074
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=85b55b455d55449297469bf3a981cabd', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-i8pzxavb/src/docker-src.2020-06-13-18.08.03.29364:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=85b55b455d55449297469bf3a981cabd
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-i8pzxavb/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    16m30.871s
user    0m9.410s


The full log is available at
http://patchew.org/logs/20200613213629.21984-1-salil.mehta@huawei.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

no-reply@patchew.org June 13, 2020, 10:26 p.m. UTC | #2

Patchew URL: https://patchew.org/QEMU/20200613213629.21984-1-salil.mehta@huawei.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200613213629.21984-1-salil.mehta@huawei.com
Subject: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
dc56c53 arm/cpuhp: Add support of *unrealize* ARMCPU during vcpu hot-unplug
988141e arm/cpuhp: Changes required for reset and to support next boot
d5feac6 arm/cpuhp: Update the guest(via GED) about cpu hot-(un)plug events
185a83e arm/cpuhp: Changes required to (re)init the vcpu register info
659f80f arm/cpuhp: Changes to update GIC with vcpu hot-plug notification
bcfc8e7 arm/cpuhp: Changes to (un)wire GICC<->VCPU IRQs during hot-(un)plug
643af9e arm/cpuhp: Add/update basic hot-(un)plug framework
8e17131 arm/cpuhp: Update ACPI GED framework to support vcpu hotplug
a8d5f8e arm/cpuhp: Release objects for *disabled* possible vcpus after init
1f520fd arm/cpuhp: Add ACPI _MAT entry for Processor object
e9543c1 arm/cpuhp: MADT Tbl change to size the guest with possible vcpus
a41164f arm/cpuhp: Update GED _EVT method AML with cpu scan
72b8a98 arm/cpuhp: Update CPUs AML with cpu-(ctrl)dev change
9d81444 arm/cpuhp: Init GED framework with cpu hotplug events
96ef87b arm/cpuhp: Enable ACPI support for vcpu hotplug
be81157 arm/cpuhp: Init PMU at host for all possible vcpus
cb588d9 arm/cpuhp: Changes to pre-size GIC with possible vcpus @machine init
75fe8dd arm/cpuhp: Pre-create disabled possible vcpus @machine init
55adeca arm/cpuhp: Machine init time change common to vcpu {cold|hot}-plug
9c53779 arm/cpuhp: Add common cpu utility for possible vcpus
36a71f4 arm/cpuhp: Add new ARMCPU core-id property
ecab7ce arm/cpuhp: Add QMP vcpu params validation support

=== OUTPUT BEGIN ===
1/22 Checking commit ecab7ce0dc60 (arm/cpuhp: Add QMP vcpu params validation support)
2/22 Checking commit 36a71f45080c (arm/cpuhp: Add new ARMCPU core-id property)
3/22 Checking commit 9c53779a7878 (arm/cpuhp: Add common cpu utility for possible vcpus)
ERROR: return is not a function, parentheses are not required
#51: FILE: cpus-common.c:104:
+    return (cpu && !cpu->disabled);

total: 1 errors, 0 warnings, 65 lines checked

Patch 3/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/22 Checking commit 55adeca4f0c2 (arm/cpuhp: Machine init time change common to vcpu {cold|hot}-plug)
ERROR: space prohibited between function name and open parenthesis '('
#206: FILE: hw/arm/virt.c:2143:
+    assert (found_cpu);

ERROR: space required before the open brace '{'
#341: FILE: hw/arm/virt.c:2338:
+       (object_dynamic_cast(OBJECT(dev), TYPE_CPU))){

total: 2 errors, 0 warnings, 375 lines checked

Patch 4/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/22 Checking commit 75fe8dd750d3 (arm/cpuhp: Pre-create disabled possible vcpus @machine init)
ERROR: space required after that ',' (ctx:VxV)
#93: FILE: hw/arm/virt.c:1841:
+            qdev_set_id(DEVICE(cpuobj),core_id);
                                       ^

ERROR: suspect code indent for conditional statements (12, 15)
#109: FILE: hw/arm/virt.c:1857:
+            if (kvm_enabled()) {
+               kvm_arm_create_host_vcpu(ARM_CPU(cs));

WARNING: line over 80 characters
#113: FILE: hw/arm/virt.c:1861:
+             * Add disabled vcpu to cpu slot during the init phase of the virt machine.

WARNING: line over 80 characters
#119: FILE: hw/arm/virt.c:1867:
+             * 2. Now, after initialization of the virt machine is complete we could use

WARNING: line over 80 characters
#127: FILE: hw/arm/virt.c:1875:
+             *    We will use the (ii) approach and release the ARMCPU objects after GIC

total: 2 errors, 3 warnings, 159 lines checked

Patch 5/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

6/22 Checking commit cb588d911d62 (arm/cpuhp: Changes to pre-size GIC with possible vcpus @machine init)
ERROR: braces {} are necessary for all arms of this statement
#105: FILE: hw/intc/arm_gicv3_common.c:355:
+        if (qemu_present_cpu(cpu))
[...]
+        else
[...]

ERROR: braces {} are necessary for all arms of this statement
#121: FILE: hw/intc/arm_gicv3_cpuif.c:782:
+    if (!qemu_present_cpu(cs->cpu))
[...]

ERROR: braces {} are necessary for all arms of this statement
#161: FILE: hw/intc/arm_gicv3_kvm.c:470:
+        if (!qemu_present_cpu(c->cpu))
[...]

ERROR: braces {} are necessary for all arms of this statement
#178: FILE: hw/intc/arm_gicv3_kvm.c:699:
+    if (!qemu_present_cpu(c->cpu))
[...]

ERROR: braces {} are necessary for all arms of this statement
#184: FILE: hw/intc/arm_gicv3_kvm.c:705:
+    if (!qemu_present_cpu(c->cpu))
[...]

ERROR: braces {} are necessary for all arms of this statement
#197: FILE: hw/intc/arm_gicv3_kvm.c:815:
+        if (qemu_present_cpu(cs))
[...]

total: 6 errors, 0 warnings, 160 lines checked

Patch 6/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/22 Checking commit be81157f5799 (arm/cpuhp: Init PMU at host for all possible vcpus)
8/22 Checking commit 96ef87bd215d (arm/cpuhp: Enable ACPI support for vcpu hotplug)
ERROR: braces {} are necessary for all arms of this statement
#43: FILE: hw/acpi/cpu.c:222:
+        if (qemu_present_cpu(cpu))
[...]
+        else
[...]

total: 1 errors, 0 warnings, 48 lines checked

Patch 8/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

9/22 Checking commit 9d81444f4907 (arm/cpuhp: Init GED framework with cpu hotplug events)
10/22 Checking commit 72b8a98eb62a (arm/cpuhp: Update CPUs AML with cpu-(ctrl)dev change)
11/22 Checking commit a41164f24088 (arm/cpuhp: Update GED _EVT method AML with cpu scan)
12/22 Checking commit e9543c111d5d (arm/cpuhp: MADT Tbl change to size the guest with possible vcpus)
ERROR: space prohibited after that open parenthesis '('
#53: FILE: hw/arm/virt-acpi-build.c:632:
+        if ( i < vms->smp_cpus ) {

ERROR: space prohibited before that close parenthesis ')'
#53: FILE: hw/arm/virt-acpi-build.c:632:
+        if ( i < vms->smp_cpus ) {

total: 2 errors, 0 warnings, 39 lines checked

Patch 12/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

13/22 Checking commit 1f520fd3c10b (arm/cpuhp: Add ACPI _MAT entry for Processor object)
ERROR: space required after that ',' (ctx:VxV)
#45: FILE: hw/arm/virt-acpi-build.c:600:
+    AcpiMadtGenericCpuInterface *gicc = acpi_data_push(entry,sizeof(*gicc));
                                                             ^

total: 1 errors, 0 warnings, 50 lines checked

Patch 13/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

14/22 Checking commit a8d5f8e441b4 (arm/cpuhp: Release objects for *disabled* possible vcpus after init)
15/22 Checking commit 8e171312cdaf (arm/cpuhp: Update ACPI GED framework to support vcpu hotplug)
WARNING: line over 80 characters
#49: FILE: hw/acpi/generic_event_device.c:213:
+        error_setg(errp, "virt: device unplug request for the unsupported device"

total: 0 errors, 1 warnings, 75 lines checked

Patch 15/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
16/22 Checking commit 643af9ec7c7d (arm/cpuhp: Add/update basic hot-(un)plug framework)
WARNING: line over 80 characters
#69: FILE: hw/arm/virt.c:2396:
+        /* TODO: update acpi hotplug state and send cpu hotplug event to guest */

WARNING: line over 80 characters
#70: FILE: hw/arm/virt.c:2397:
+        /* TODO: register this cpu for reset & update F/W info for the next boot */

ERROR: that open brace { should be on the previous line
#94: FILE: hw/arm/virt.c:2421:
+    if (cs->cpu_index == first_cpu->cpu_index)
+    {

WARNING: line over 80 characters
#126: FILE: hw/arm/virt.c:2453:
+    /* TODO: unregister this cpu for reset & update F/W info for the next boot */

total: 1 errors, 3 warnings, 146 lines checked

Patch 16/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

17/22 Checking commit bcfc8e7af6d2 (arm/cpuhp: Changes to (un)wire GICC<->VCPU IRQs during hot-(un)plug)
WARNING: Block comments use a leading /* on a separate line
#33: FILE: hw/arm/virt.c:627:
+    /* Mapping from the output timer irq lines from the CPU to the

WARNING: line over 80 characters
#48: FILE: hw/arm/virt.c:642:
+        qdev_disconnect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0);

ERROR: line over 90 characters
#50: FILE: hw/arm/virt.c:644:
+        qdev_disconnect_gpio_out_named(gicdev, SYSBUS_DEVICE_GPIO_IRQ, cpu + 4 * max_cpus);

WARNING: line over 80 characters
#54: FILE: hw/arm/virt.c:648:
+     * RFC: Question: This currently does not takes care of intimating the devices

WARNING: line over 80 characters
#55: FILE: hw/arm/virt.c:649:
+     * which might be sitting on system bus. Do we need a sysbus_disconnect_irq()

WARNING: Block comments use a leading /* on a separate line
#81: FILE: hw/arm/virt.c:675:
+    /* Mapping from the output timer irq lines from the CPU to the

total: 1 errors, 5 warnings, 190 lines checked

Patch 17/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

18/22 Checking commit 659f80fdff39 (arm/cpuhp: Changes to update GIC with vcpu hot-plug notification)
ERROR: "foo * bar" should be "foo *bar"
#99: FILE: hw/intc/arm_gicv3_common.c:329:
+static void arm_gicv3_cpu_update_notifier(Notifier * notifier, void * data)

total: 1 errors, 0 warnings, 147 lines checked

Patch 18/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

19/22 Checking commit 185a83e914a7 (arm/cpuhp: Changes required to (re)init the vcpu register info)
WARNING: Block comments use a leading /* on a separate line
#65: FILE: hw/intc/arm_gicv3_cpuif.c:2613:
+    /* Note that we can't just use the GICv3CPUState as an opaque pointer

WARNING: Block comments use a leading /* on a separate line
#83: FILE: hw/intc/arm_gicv3_cpuif.c:2631:
+        /* Check against architectural constraints: getting these

WARNING: Block comments use a leading /* on a separate line
#94: FILE: hw/intc/arm_gicv3_cpuif.c:2642:
+            /* Note that the AArch64 LRs are 64-bit; the AArch32 LRs

total: 0 errors, 3 warnings, 211 lines checked

Patch 19/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
20/22 Checking commit d5feac63a862 (arm/cpuhp: Update the guest(via GED) about cpu hot-(un)plug events)
ERROR: braces {} are necessary for all arms of this statement
#44: FILE: hw/arm/virt.c:2469:
+        if (local_err)
[...]

ERROR: braces {} are necessary for all arms of this statement
#73: FILE: hw/arm/virt.c:2509:
+    if (local_err)
[...]

ERROR: braces {} are necessary for all arms of this statement
#100: FILE: hw/arm/virt.c:2537:
+    if (local_err)
[...]

total: 3 errors, 0 warnings, 84 lines checked

Patch 20/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

21/22 Checking commit 988141e42492 (arm/cpuhp: Changes required for reset and to support next boot)
22/22 Checking commit dc56c5324bae (arm/cpuhp: Add support of *unrealize* ARMCPU during vcpu hot-unplug)
ERROR: space required before the open parenthesis '('
#50: FILE: exec.c:900:
+    if(cpu->cpu_ases_ref_count == 1) {

ERROR: suspect code indent for conditional statements (4, 11)
#217: FILE: target/arm/cpu.c:1805:
+    if (cpu->sau_sregion && arm_feature(env, ARM_FEATURE_M_SECURITY)) {
+           g_free(env->sau.rbar);

ERROR: code indent should never use tabs
#272: FILE: target/arm/cpu.c:2233:
+^I^I^I^I      &acc->parent_unrealize);$

ERROR: braces {} are necessary for all arms of this statement
#394: FILE: target/arm/kvm64.c:845:
+    if (qemu_present_cpu(cs))
[...]

total: 4 errors, 0 warnings, 316 lines checked

Patch 22/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200613213629.21984-1-salil.mehta@huawei.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Marc Zyngier June 14, 2020, 11:54 a.m. UTC | #3

Hi Salil,

On 2020-06-13 22:36, Salil Mehta wrote:
> This patch-set introduces the virtual cpu hotplug support for ARMv8
> architecture in QEMU. Idea is to be able to hotplug and hot-unplug the 
> vcpus
> while guest VM is running and no reboot is required. This does *not* 
> makes any
> assumption of the physical cpu hotplug availability within the host 
> system but
> rather tries to solve the problem at virtualizer/QEMU layer and by 
> introducing
> cpu hotplug hooks and event handling within the guest kernel. No 
> changes are
> required within the host kernel/KVM.
> 
> Motivation:
> This allows scaling the guest VM compute capacity on-demand which would 
> be
> useful for the following example scenarios,
> 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the 
> orchestration
>    framework which could adjust resource requests (CPU and Mem 
> requests) for
>    the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure provider could 
> allocate and
>    restrict the total number of compute resources available to the 
> guest VM
>    according to the SLA(Service Level Agreement). VM owner could 
> request for
>    more compute to be hot-plugged for some cost.
> 
> Terminology:
> 
> (*) Present cpus: Total cpus with which guest has/will boot and are 
> available
>                   to guest for use and can be onlined. Qemu 
> parameter(-smp)
> (*) Disabled cpus: Possible cpus which will not be available for the 
> guest to
>                    use. These can be hotplugged and made present. These 
> can be
> 		   thought of as un-plugged vcpus. These will be included as
> 		   part of sizing.
> (*) Posssible cpus: Total vcpus which could ever exist in VM. This 
> includes
>                     booted cpus plus any cpus which could be later 
> plugged.
> 		    - Qemu parameter(-maxcpus)
> 	            - Possible vcpus = Present vcpus (+) Disabled vcpus
> 
> 
> Limitations of ARMv8 Architecture:
> 
> A. Physical Limitation to CPU Hotplug:
> 1. ARMv8 architecture does not support the concept of the physical cpu 
> hotplug.
>    The closest thing which is recomended to achieve the cpu hotplug on 
> ARM is
>    to bring down power state of the cpu using PSCI.

It isn't so much that the ARMv8 architecture doesn't support CPU 
hotplug. It is that CPU hotplug is largely out of the scope of the ARMv8 
architecture, which is a CPU architecture and not a system architecture. 
Yes, the lack of a comprehensive system architecture is *very* annoying, 
but let's put the blame where it belongs... ;-)

> 2. Other ARM components like GIC etc. have not been designed to realize
>    physical cpu hotplug capability as of now.
> 
> B. Limitations of GIC to Support Virtual CPU Hotplug:
> 1. GIC requires various resources(related to GICR/redistributor, 
> GICC/cpu
>    interface etc) like memory regions to be fixed at the VM init time 
> and these
>    could not be changed later on after VM has inited.
> 2. Associations between GICC(GIC cpu interface) and vcpu get fixed at 
> the VM
>    init time and GIC does not allows to change this association once 
> GIC has
>    initialized.

There isn't an association, really. the GIC CPU interface is part of the 
CPU itself, and not an external entity. KVM doesn't split the two 
either. It is the association between the CPU and its redistributor that 
is being done. There is no architectural way to set this up this, so KVM 
just statically configures these based on the number of vcpus and the 
number/size of redistributor ranges.

> 
> C. Known Limitation of the KVM:
> 1. As of now KVM allows to create VCPUs but does not allows to delete 
> the
>    already created vcpus. QEMU already provides an interface to manage 
> created
>    vcpus at KVM level and then to re-use them.
> 2. Inconsistency in interpretation of the MPIDR generated by KVM for 
> vcpus
>    vis-a-vis SMT/threads. This does not looks to be compliant to the 
> MPIDR
>    format(SMT is present) as mentioned in the ARMv8 spec. (Please 
> correct my
>    understanding if I am wrong here?)

I'm unsure of which part of the architecture KVM doesn't follow when 
assigning a MPIDR to a vcpu. By having MPIDR_EL1.MT to 0, KVM tells the 
guest OS that vcpus having the same Aff3-1 fields are mostly 
independent.

This is because:
- we make no placement guarantee whatsoever (this is userspace's 
business)
- ARMv8.2 CPUs designed by ARM (as opposed to architecture licensees) 
always set this bit to 1 as they can hypothetically be coupled to SMT 
CPUs in a big-little system. This makes MT totally meaningless.
- the one SMT implementation available in the wild (ThunderX-2) is 
broken enough that you really should consider turning SMT off (see 
CAVIUM_TX2_ERRATUM_219)
- KVM actively prevents the sharing of resources such as TLBs across 
vcpus

Given the above, I fail to see the point in setting the SMT bit to 
anything but 0.

> 
> 
> Workaround to the problems mentioned in Section B & C1:
> 1. We pre-size the GIC with possible vcpus at VM init time
> 2. Pre-create all possible vcpus at KVM and associate them with GICC

Again, I'm not sure what you mean here. The CPU interface is by 
construction associated with the CPU. Do you mean the redistributor?

> 3. Park the unplugged vcpus (similar to x86)
> 
> 
> (*) For all of above please refer to Marc's suggestion here[1]
> 
> 
> Overview of the Approach:
> At the time of machvirt_init() we pre-create all of the possible ARMCPU
> objects along with the corresponding KVM vcpus at the host. Disabled 
> KVM vcpu
> (which are *not* "present" vcpus but are part of "possible" vcpu list) 
> are
> parked at per VM list "kvm_parked_vcpus" after their initialization.
> 
> We create the ARMCPU objects(but these are not *realized* in QOM sense) 
> even
> for the disabled vcpus to facilitate the GIC initialization (pre-sized 
> with
> possible vcpus). After Initialization of the machine is complete we 
> release
> the ARMCPU Objects for the disabled vcpus. These ARMCPU object shall be
> re-created at the time when vcpu is hot plugged. This new object is 
> then
> re-attached with the earlier parked KVM vcpu which also gets unparked. 
> The
> ARMCPU object gets now "realized" in QEMU, which means creation of the
> corresponding threads, pre_plug/plug phases, and event notification to 
> the
> guest using ACPI GED etc. Similarly, hot-unplug leg will lead to the
> "unrealization" of the vcpus and will lead to similar ACPI GED events 
> to the
> guest for unplug and cleanup and eventually ARMCPU object shall be 
> released and
> KVM vcpus shall be parked again.
> 
> During machine init, ACPI MADT Table is sized with *possible* vcpus 
> GICC
> entries. The unplugged/disabled vcpus are presented as MADT GICC 
> DISABLED
> entries to the guest. This means the guest will have its resources 
> pre-sized
> with possible vcpus(=present+disabled)
> 
> Other approaches to deal with ARMCPU object release(after machine 
> init):
> 1. The ARMCPU objects for the disabled vcpus are released in context to 
> the
>    virt_machine_done() notifier(approach adopted in this patch-set).
> 2. Defer the release of current ARMCPU object till the new vcpu object 
> is
>    hot plugged.
> 3. Never release and keep on reusing them and release once at VM exit. 
> This
>    solves many problems with above 2 approaches but requires change in 
> the way
>    qdev_device_add() fetches/creates the ARMCPU object for the new 
> vcpus being
>    hotplugged. For the arm cpu hotplug case we need to figure out way 
> how to
>    get access to old object and use it to "re-realize" instead of the 
> new
>    ARMCPU object.
> 
> Concerns/Questions:
> 1. In ARM arch a cpu is uniquely represented in hierarchy using various
>    affinity levels which could represent thread, core, cluster, 
> package. This
>    is generally represented by a value in MPIDR register as per the 
> format
>    mentioned in specification. Now, the way MPIDR value is derived for 
> vcpus is
>    done using vcpu-index. The concept of thread is not quite as same 
> and rather
>    gets lost in the derivation of MPIDR for vcpus.

I think we disagree on what a thread is when MPIDR_EL1.MT==0.

> 2. The topology info used to specify the vcpu while hot-plugging might 
> not
>    match with the MPIDR value given back by the KVM for the vcpu at the 
> time of
>    init. Concept of SMT bit in MPIDR gets lost as per the derivation 
> being done
>    in the KVM. Hence, concept of thread-id, core-id, socket-id if used 
> as a
>    topology info to derive MPIDR value as per ARM specification will 
> not match
>    with MPIDR actually assigned by the KVM?

The architecture doesn't map the affinity levels on any of these 
concepts. They merely imply increased independence as you look at higher 
affinity levels. I don't see any value in exposing any meaning to them 
unless you start moving some notion of placement into the kernel, or 
allow userspace to setup MPIDR_EL1 by itself.

>    Perhaps need to carry forward work of Andrew? please check here[2]
> 3. Further if this info is supplied to the guest using PPTT(once 
> introduced in
>    QEMU) or even derived using MPIDR shall be inconsistent with the 
> host vcpu.
> 4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
>    *pending* state for the cpus which have been hot-unplugged? IMHO it 
> looks
>    okay but will need Marc's confirmation on this.

I don't see why interrupts wouldn't be pending if the storage resource 
still exists. Given that distributor and redistributors are persistent, 
it seems logical that pending bits stick around.

Slightly more selfish question: is there any kernel change associated 
with this series? I'd be perfectly happy to hear a "no!"... ;-)

Thanks,

         M.

Salil Mehta June 15, 2020, 10:19 a.m. UTC | #4

Hi Marc,
Thanks for the review.

> From: Marc Zyngier [mailto:maz@kernel.org]
> Sent: Sunday, June 14, 2020 12:55 PM
> 
> Hi Salil,
> 
> On 2020-06-13 22:36, Salil Mehta wrote:
> > This patch-set introduces the virtual cpu hotplug support for ARMv8
> > architecture in QEMU. Idea is to be able to hotplug and hot-unplug the
> > vcpus
> > while guest VM is running and no reboot is required. This does *not*
> > makes any
> > assumption of the physical cpu hotplug availability within the host
> > system but
> > rather tries to solve the problem at virtualizer/QEMU layer and by
> > introducing
> > cpu hotplug hooks and event handling within the guest kernel. No
> > changes are
> > required within the host kernel/KVM.
> >
> > Motivation:
> > This allows scaling the guest VM compute capacity on-demand which would
> > be
> > useful for the following example scenarios,
> > 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the
> > orchestration
> >    framework which could adjust resource requests (CPU and Mem
> > requests) for
> >    the containers in a pod, based on usage.
> > 2. Pay-as-you-grow Business Model: Infrastructure provider could
> > allocate and
> >    restrict the total number of compute resources available to the
> > guest VM
> >    according to the SLA(Service Level Agreement). VM owner could
> > request for
> >    more compute to be hot-plugged for some cost.
> >
> > Terminology:
> >
> > (*) Present cpus: Total cpus with which guest has/will boot and are
> > available
> >                   to guest for use and can be onlined. Qemu
> > parameter(-smp)
> > (*) Disabled cpus: Possible cpus which will not be available for the
> > guest to
> >                    use. These can be hotplugged and made present. These
> > can be
> > 		   thought of as un-plugged vcpus. These will be included as
> > 		   part of sizing.
> > (*) Posssible cpus: Total vcpus which could ever exist in VM. This
> > includes
> >                     booted cpus plus any cpus which could be later
> > plugged.
> > 		    - Qemu parameter(-maxcpus)
> > 	            - Possible vcpus = Present vcpus (+) Disabled vcpus
> >
> >
> > Limitations of ARMv8 Architecture:
> >
> > A. Physical Limitation to CPU Hotplug:
> > 1. ARMv8 architecture does not support the concept of the physical cpu
> > hotplug.
> >    The closest thing which is recomended to achieve the cpu hotplug on
> > ARM is
> >    to bring down power state of the cpu using PSCI.
> 
> It isn't so much that the ARMv8 architecture doesn't support CPU
> hotplug. It is that CPU hotplug is largely out of the scope of the ARMv8
> architecture, which is a CPU architecture and not a system architecture.
> Yes, the lack of a comprehensive system architecture is *very* annoying,
> but let's put the blame where it belongs... ;-)


Sure.

 
> > 2. Other ARM components like GIC etc. have not been designed to realize
> >    physical cpu hotplug capability as of now.
> >
> > B. Limitations of GIC to Support Virtual CPU Hotplug:
> > 1. GIC requires various resources(related to GICR/redistributor,
> > GICC/cpu
> >    interface etc) like memory regions to be fixed at the VM init time
> > and these
> >    could not be changed later on after VM has inited.
> > 2. Associations between GICC(GIC cpu interface) and vcpu get fixed at
> > the VM
> >    init time and GIC does not allows to change this association once
> > GIC has
> >    initialized.
> 
> There isn't an association, really. the GIC CPU interface is part of the
> CPU itself, and not an external entity. KVM doesn't split the two
> either. It is the association between the CPU and its redistributor that
> is being done. There is no architectural way to set this up this, so KVM
> just statically configures these based on the number of vcpus and the
> number/size of redistributor ranges.


I stand corrected. Sorry for the horrible mix up and I realized that some
how I copied the same terminology at other 2 other places as well, maybe
under sleep duress :(. I will correct it in later versions.

To be frank, I actually meant association of "mp-affinity" and the
"proc number" as given by GICR_TYPER register for the vgic. I guess reading
this register using kvm_gicr_acces/KVM_DEV_ARM_VGIC_GRP_REDIST_REGS from
QEMU lands up in vgic vgic_mmio_read_v3r_typer() which forms the reg value
using "vcpu-id" and "mpidr"(fetched from MPIDR_EL1).

Also, value of the MPIDR for vcpu is set during KVM_ARM_VCPU_INIT IOCTL
from QEMU after the creation of the vcpus(using KVM_CREATE_VCPU). I guess
this is done during reset of all of the system regs  SYS_MPIDR_EL1 value
is also reset to default within function reset_mpidr() derived using below
logic:

                   +----+----+----+----+----+----+----+----+
          MPIDR   |||  Res   |   Aff2  |   Aff1   |  Aff0    |
                   +----+----+----+----+----+----+----+----+
                                \          \          \   |     |
                                 \   8bit   \   8bit  \  |4bit|
                                  \<------->\<------->\|<-->|
                                   \          \          \|     |
                   +----+----+----+----+----+----+----+----+
         VCPU-ID  |  Byte4  |  Byte2   |  Byte1   |  Byte0  |
                   +----+----+----+----+----+----+----+----+

Perhaps once the VM is inited the value present in the MPIDR_EL1 for vcpu cannot
be changed and hence the vgic GICR_TYPER will have fixed association between
"vcpu-id" and "mpidr" as well.




> > C. Known Limitation of the KVM:
> > 1. As of now KVM allows to create VCPUs but does not allows to delete
> > the
> >    already created vcpus. QEMU already provides an interface to manage
> > created
> >    vcpus at KVM level and then to re-use them.
> > 2. Inconsistency in interpretation of the MPIDR generated by KVM for
> > vcpus
> >    vis-a-vis SMT/threads. This does not looks to be compliant to the
> > MPIDR
> >    format(SMT is present) as mentioned in the ARMv8 spec. (Please
> > correct my
> >    understanding if I am wrong here?)
> 
> I'm unsure of which part of the architecture KVM doesn't follow when
> assigning a MPIDR to a vcpu. By having MPIDR_EL1.MT to 0, KVM tells the
> guest OS that vcpus having the same Aff3-1 fields are mostly
> independent.
> 
> This is because:
> - we make no placement guarantee whatsoever (this is userspace's
> business)
> - ARMv8.2 CPUs designed by ARM (as opposed to architecture licensees)
> always set this bit to 1 as they can hypothetically be coupled to SMT
> CPUs in a big-little system. This makes MT totally meaningless.
> - the one SMT implementation available in the wild (ThunderX-2) is
> broken enough that you really should consider turning SMT off (see
> CAVIUM_TX2_ERRATUM_219)
> - KVM actively prevents the sharing of resources such as TLBs across
> vcpus
> 
> Given the above, I fail to see the point in setting the SMT bit to
> anything but 0.


ok. effectively this also means that even in the upcoming SoCs, which
might have SMT capability, the value of MT bit should continue to be 0
at the KVM level for the MPIDR of the vcpus?


> > Workaround to the problems mentioned in Section B & C1:
> > 1. We pre-size the GIC with possible vcpus at VM init time
> > 2. Pre-create all possible vcpus at KVM and associate them with GICC
> 
> Again, I'm not sure what you mean here. The CPU interface is by
> construction associated with the CPU. Do you mean the redistributor?


Sorry for this mix-up. Yes, I meant the association between the mp-affinity
and the proc number as given by GICR_TYPER for vgic as explained earlier.


> 
> > 3. Park the unplugged vcpus (similar to x86)
> >
> >
> > (*) For all of above please refer to Marc's suggestion here[1]
> >
> >
> > Overview of the Approach:
> > At the time of machvirt_init() we pre-create all of the possible ARMCPU
> > objects along with the corresponding KVM vcpus at the host. Disabled
> > KVM vcpu
> > (which are *not* "present" vcpus but are part of "possible" vcpu list)
> > are
> > parked at per VM list "kvm_parked_vcpus" after their initialization.
> >
> > We create the ARMCPU objects(but these are not *realized* in QOM sense)
> > even
> > for the disabled vcpus to facilitate the GIC initialization (pre-sized
> > with
> > possible vcpus). After Initialization of the machine is complete we
> > release
> > the ARMCPU Objects for the disabled vcpus. These ARMCPU object shall be
> > re-created at the time when vcpu is hot plugged. This new object is
> > then
> > re-attached with the earlier parked KVM vcpu which also gets unparked.
> > The
> > ARMCPU object gets now "realized" in QEMU, which means creation of the
> > corresponding threads, pre_plug/plug phases, and event notification to
> > the
> > guest using ACPI GED etc. Similarly, hot-unplug leg will lead to the
> > "unrealization" of the vcpus and will lead to similar ACPI GED events
> > to the
> > guest for unplug and cleanup and eventually ARMCPU object shall be
> > released and
> > KVM vcpus shall be parked again.
> >
> > During machine init, ACPI MADT Table is sized with *possible* vcpus
> > GICC
> > entries. The unplugged/disabled vcpus are presented as MADT GICC
> > DISABLED
> > entries to the guest. This means the guest will have its resources
> > pre-sized
> > with possible vcpus(=present+disabled)
> >
> > Other approaches to deal with ARMCPU object release(after machine
> > init):
> > 1. The ARMCPU objects for the disabled vcpus are released in context to
> > the
> >    virt_machine_done() notifier(approach adopted in this patch-set).
> > 2. Defer the release of current ARMCPU object till the new vcpu object
> > is
> >    hot plugged.
> > 3. Never release and keep on reusing them and release once at VM exit.
> > This
> >    solves many problems with above 2 approaches but requires change in
> > the way
> >    qdev_device_add() fetches/creates the ARMCPU object for the new
> > vcpus being
> >    hotplugged. For the arm cpu hotplug case we need to figure out way
> > how to
> >    get access to old object and use it to "re-realize" instead of the
> > new
> >    ARMCPU object.
> >
> > Concerns/Questions:
> > 1. In ARM arch a cpu is uniquely represented in hierarchy using various
> >    affinity levels which could represent thread, core, cluster,
> > package. This
> >    is generally represented by a value in MPIDR register as per the
> > format
> >    mentioned in specification. Now, the way MPIDR value is derived for
> > vcpus is
> >    done using vcpu-index. The concept of thread is not quite as same
> > and rather
> >    gets lost in the derivation of MPIDR for vcpus.
> 
> I think we disagree on what a thread is when MPIDR_EL1.MT==0.
> 
> > 2. The topology info used to specify the vcpu while hot-plugging might
> > not
> >    match with the MPIDR value given back by the KVM for the vcpu at the
> > time of
> >    init. Concept of SMT bit in MPIDR gets lost as per the derivation
> > being done
> >    in the KVM. Hence, concept of thread-id, core-id, socket-id if used
> > as a
> >    topology info to derive MPIDR value as per ARM specification will
> > not match
> >    with MPIDR actually assigned by the KVM?
> 
> The architecture doesn't map the affinity levels on any of these
> concepts. They merely imply increased independence as you look at higher
> affinity levels. I don't see any value in exposing any meaning to them
> unless you start moving some notion of placement into the kernel, or
> allow userspace to setup MPIDR_EL1 by itself.


MPIDR_EL1 of the vcpu is effectively being derived by KVM using the
'vcpu-id' provided by the QEMU during vcpu creation/init time. so isn't it
kind of being indirectly configured by QEMU even now?

Also. I was wondering if it was safe to present hardware threads as cores
in the virtual topology being set at the QEMU level?


> >    Perhaps need to carry forward work of Andrew? please check here[2]
> > 3. Further if this info is supplied to the guest using PPTT(once
> > introduced in
> >    QEMU) or even derived using MPIDR shall be inconsistent with the
> > host vcpu.
> > 4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
> >    *pending* state for the cpus which have been hot-unplugged? IMHO it
> > looks
> >    okay but will need Marc's confirmation on this.
> 
> I don't see why interrupts wouldn't be pending if the storage resource
> still exists. Given that distributor and redistributors are persistent,
> it seems logical that pending bits stick around.

wouldn't they be migrated to some other vcpu after current vcpu to which
they are associated gets offlined as part of the cpu hot-unplug action?

Like in case of the NIC, it gets notified about this and the affinity is
changed so the interrupts should not remain pending. Is it something
different about storage? 


> 
> Slightly more selfish question: is there any kernel change associated
> with this series? I'd be perfectly happy to hear a "no!"... ;-)


Requires minimalistic change which amounts to:
1. Allocation of the logical cpu-id and associating with acpi-id/hw-id
   coming from qemu/ACPI Tables.
2. During init time parsing the MADT Table entries to get the exact count
   of the possible and present cpus. This as you know shall be used to
   size up the guest data structures. Most of this infrastructure already
   exist in smp.c but still requires some adjustment of the code.

I will be sharing these changes this week.


Many thanks

> 
> Thanks,
> 
>          M.
> --
> Jazz is not dead. It just smells funny...

Andrew Jones June 23, 2020, 9:12 a.m. UTC | #5

On Sat, Jun 13, 2020 at 10:36:07PM +0100, Salil Mehta wrote:
> This patch-set introduces the virtual cpu hotplug support for ARMv8
> architecture in QEMU. Idea is to be able to hotplug and hot-unplug the vcpus
> while guest VM is running and no reboot is required. This does *not* makes any
> assumption of the physical cpu hotplug availability within the host system but
> rather tries to solve the problem at virtualizer/QEMU layer and by introducing
> cpu hotplug hooks and event handling within the guest kernel. No changes are
> required within the host kernel/KVM.
> 
> Motivation:
> This allows scaling the guest VM compute capacity on-demand which would be
> useful for the following example scenarios,
> 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
>    framework which could adjust resource requests (CPU and Mem requests) for
>    the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
>    restrict the total number of compute resources available to the guest VM
>    according to the SLA(Service Level Agreement). VM owner could request for
>    more compute to be hot-plugged for some cost.
> 
> Terminology:
> 
> (*) Present cpus: Total cpus with which guest has/will boot and are available
>                   to guest for use and can be onlined. Qemu parameter(-smp)
> (*) Disabled cpus: Possible cpus which will not be available for the guest to
>                    use. These can be hotplugged and made present. These can be
> 		   thought of as un-plugged vcpus. These will be included as
> 		   part of sizing.
> (*) Posssible cpus: Total vcpus which could ever exist in VM. This includes
>                     booted cpus plus any cpus which could be later plugged.
> 		    - Qemu parameter(-maxcpus)
> 	            - Possible vcpus = Present vcpus (+) Disabled vcpus
> 
> 
> Limitations of ARMv8 Architecture:
> 
> A. Physical Limitation to CPU Hotplug:
> 1. ARMv8 architecture does not support the concept of the physical cpu hotplug.
>    The closest thing which is recomended to achieve the cpu hotplug on ARM is
>    to bring down power state of the cpu using PSCI.
> 2. Other ARM components like GIC etc. have not been designed to realize
>    physical cpu hotplug capability as of now. 
> 
> B. Limitations of GIC to Support Virtual CPU Hotplug:
> 1. GIC requires various resources(related to GICR/redistributor, GICC/cpu
>    interface etc) like memory regions to be fixed at the VM init time and these
>    could not be changed later on after VM has inited.
> 2. Associations between GICC(GIC cpu interface) and vcpu get fixed at the VM
>    init time and GIC does not allows to change this association once GIC has
>    initialized.
> 
> C. Known Limitation of the KVM:
> 1. As of now KVM allows to create VCPUs but does not allows to delete the
>    already created vcpus. QEMU already provides an interface to manage created
>    vcpus at KVM level and then to re-use them.
> 2. Inconsistency in interpretation of the MPIDR generated by KVM for vcpus
>    vis-a-vis SMT/threads. This does not looks to be compliant to the MPIDR
>    format(SMT is present) as mentioned in the ARMv8 spec. (Please correct my
>    understanding if I am wrong here?)
>    
> 
> Workaround to the problems mentioned in Section B & C1:
> 1. We pre-size the GIC with possible vcpus at VM init time
> 2. Pre-create all possible vcpus at KVM and associate them with GICC 
> 3. Park the unplugged vcpus (similar to x86)
> 
> 
> (*) For all of above please refer to Marc's suggestion here[1]
> 
> 
> Overview of the Approach:
> At the time of machvirt_init() we pre-create all of the possible ARMCPU
> objects along with the corresponding KVM vcpus at the host. Disabled KVM vcpu
> (which are *not* "present" vcpus but are part of "possible" vcpu list) are
> parked at per VM list "kvm_parked_vcpus" after their initialization.
> 
> We create the ARMCPU objects(but these are not *realized* in QOM sense) even
> for the disabled vcpus to facilitate the GIC initialization (pre-sized with
> possible vcpus). After Initialization of the machine is complete we release
> the ARMCPU Objects for the disabled vcpus. These ARMCPU object shall be
> re-created at the time when vcpu is hot plugged. This new object is then
> re-attached with the earlier parked KVM vcpu which also gets unparked. The
> ARMCPU object gets now "realized" in QEMU, which means creation of the
> corresponding threads, pre_plug/plug phases, and event notification to the
> guest using ACPI GED etc. Similarly, hot-unplug leg will lead to the
> "unrealization" of the vcpus and will lead to similar ACPI GED events to the
> guest for unplug and cleanup and eventually ARMCPU object shall be released and
> KVM vcpus shall be parked again.
> 
> During machine init, ACPI MADT Table is sized with *possible* vcpus GICC
> entries. The unplugged/disabled vcpus are presented as MADT GICC DISABLED
> entries to the guest. This means the guest will have its resources pre-sized
> with possible vcpus(=present+disabled)
> 
> Other approaches to deal with ARMCPU object release(after machine init):
> 1. The ARMCPU objects for the disabled vcpus are released in context to the
>    virt_machine_done() notifier(approach adopted in this patch-set). 
> 2. Defer the release of current ARMCPU object till the new vcpu object is
>    hot plugged.
> 3. Never release and keep on reusing them and release once at VM exit. This
>    solves many problems with above 2 approaches but requires change in the way
>    qdev_device_add() fetches/creates the ARMCPU object for the new vcpus being
>    hotplugged. For the arm cpu hotplug case we need to figure out way how to
>    get access to old object and use it to "re-realize" instead of the new
>    ARMCPU object.
> 
> Concerns/Questions:
> 1. In ARM arch a cpu is uniquely represented in hierarchy using various
>    affinity levels which could represent thread, core, cluster, package. This
>    is generally represented by a value in MPIDR register as per the format
>    mentioned in specification. Now, the way MPIDR value is derived for vcpus is
>    done using vcpu-index. The concept of thread is not quite as same and rather
>    gets lost in the derivation of MPIDR for vcpus.
> 2. The topology info used to specify the vcpu while hot-plugging might not
>    match with the MPIDR value given back by the KVM for the vcpu at the time of
>    init. Concept of SMT bit in MPIDR gets lost as per the derivation being done
>    in the KVM. Hence, concept of thread-id, core-id, socket-id if used as a
>    topology info to derive MPIDR value as per ARM specification will not match
>    with MPIDR actually assigned by the KVM? 
>    Perhaps need to carry forward work of Andrew? please check here[2]
> 3. Further if this info is supplied to the guest using PPTT(once introduced in
>    QEMU) or even derived using MPIDR shall be inconsistent with the host vcpu. 
> 4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
>    *pending* state for the cpus which have been hot-unplugged? IMHO it looks
>    okay but will need Marc's confirmation on this. 
> 5. If the ARMCPU object is released after the machine init, UEFI could call
>    back virt_update_table() to re-build the ACPI tables which might need an
>    ARMCPU object. Please check the discussion here[5]
> 
> 
> Commands Used:
> 
> A. Qemu launch commands to init the machine
> 
> $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
> -cpu host -smp cpus=4,maxcpus=6 \
> -m 300M \
> -kernel Image \
> -initrd rootfs.cpio.gz \
> -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
> -nographic \
> -bios  QEMU_EFI.fd \
> 
> B. Hot-(un)plug related commands
> 
> # Hotplug a host vcpu(accel=kvm)
> $ device_add host-arm-cpu,id=core4,core-id=4
> 
> # Hotplug a vcpu(accel=tcg)
> $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
> 
> # Delete the vcpu
> $ device_del core4
> 
> NOTE: I have not tested the current solution with '-device' interface. The use
>       is suggested by Igor here[6]. I will test this in coming times but looks
>       it should work with existing changes. 
> 
> 
> Sample output on guest after boot:
> 
> $ cat /sys/devices/system/cpu/possible
> 0-5
> $ cat /sys/devices/system/cpu/present
> 0-3
> $ cat /sys/devices/system/cpu/online
> 0-1
> $ cat /sys/devices/system/cpu/offline
> 2-5
> 
> 
> Sample output on guest after hotplug of vcpu=4:
> 
> $ cat /sys/devices/system/cpu/possible
> 0-5
> $ cat /sys/devices/system/cpu/present
> 0-4
> $ cat /sys/devices/system/cpu/online
> 0-1,4
> $ cat /sys/devices/system/cpu/offline
> 2-3,5
> 
> Note: vcpu=4 was explicitly 'onlined' after hot-plug
> $ echo 1 > /sys/devices/system/cpu/cpu4/online
> 
> 
> Repository:
>  (*) QEMU changes for vcpu hotplug could be cloned from below site,
>      https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
> 
>  (*) Guest Kernel changes required to co-work with the QEMU shall be posted soon
>      and repo made available at above site. 
> 
> 
> THINGS TO DO:
>  (*) Migration support 
>  (*) TCG/Emulation support is not proper right now. Works to a certain extent
>      but is not complete. especially the unrealize part in which there is a
>      overflow of tcg contexts. The last is due to the fact tcg maintains a 
>      count on number of context(per thread instance) so as we hotplug the vcpus
>      this counter keeps on incrementing. But during hot-unplug the counter is
>      not decremented.
>  (*) Support of hotplug with NUMA is not proper
>  (*) CPU Topology right now is not specified using thread/core/socket but 
>      rather flatly indexed using core-id. This needs consideration[2].
>  (*) Do we need PPTT Support for to specify right topology info to guest about
>      hot-plugged or unplugged vcpus?
>  (*) Test cases
>  (*) Docs need to be updated.
> 
>

Hi Salil,

I realize this is just a preliminary posting and the approach hasn't been
finalized, but maybe in a future posting we can put a lot of this
information into a doc patch. I think we'll need good documentation for
this feature to ensure we get it right and keep in maintained correctly.

Thanks,
drew

Salil Mehta June 23, 2020, 9:56 a.m. UTC | #6

> From: Andrew Jones [mailto:drjones@redhat.com]
> Sent: Tuesday, June 23, 2020 10:12 AM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> sudeep.holla@arm.com; gshan@redhat.com; mst@redhat.com; jiakernel2@gmail.com;
> maz@kernel.org; zhukeqian <zhukeqian1@huawei.com>; david@redhat.com;
> richard.henderson@linaro.org; Linuxarm <linuxarm@huawei.com>;
> eric.auger@redhat.com; james.morse@arm.com; catalin.marinas@arm.com;
> imammedo@redhat.com; pbonzini@redhat.com; mehta.salil.lnk@gmail.com;
> maran.wilson@oracle.com; will@kernel.org; wangxiongfeng (C)
> <wangxiongfeng2@huawei.com>
> Subject: Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch
> 
> On Sat, Jun 13, 2020 at 10:36:07PM +0100, Salil Mehta wrote:
> > This patch-set introduces the virtual cpu hotplug support for ARMv8
> > architecture in QEMU. Idea is to be able to hotplug and hot-unplug the vcpus
> > while guest VM is running and no reboot is required. This does *not* makes
> any
> > assumption of the physical cpu hotplug availability within the host system
> but
> > rather tries to solve the problem at virtualizer/QEMU layer and by introducing
> > cpu hotplug hooks and event handling within the guest kernel. No changes are
> > required within the host kernel/KVM.
> >
> > Motivation:
> > This allows scaling the guest VM compute capacity on-demand which would be
> > useful for the following example scenarios,
> > 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
> >    framework which could adjust resource requests (CPU and Mem requests) for
> >    the containers in a pod, based on usage.
> > 2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
> >    restrict the total number of compute resources available to the guest VM
> >    according to the SLA(Service Level Agreement). VM owner could request for
> >    more compute to be hot-plugged for some cost.
> >
> > Terminology:
> >
> > (*) Present cpus: Total cpus with which guest has/will boot and are available
> >                   to guest for use and can be onlined. Qemu parameter(-smp)
> > (*) Disabled cpus: Possible cpus which will not be available for the guest
> to
> >                    use. These can be hotplugged and made present. These can be
> > 		   thought of as un-plugged vcpus. These will be included as
> > 		   part of sizing.
> > (*) Posssible cpus: Total vcpus which could ever exist in VM. This includes
> >                     booted cpus plus any cpus which could be later plugged.
> > 		    - Qemu parameter(-maxcpus)
> > 	            - Possible vcpus = Present vcpus (+) Disabled vcpus
> >
> >
> > Limitations of ARMv8 Architecture:
> >
> > A. Physical Limitation to CPU Hotplug:
> > 1. ARMv8 architecture does not support the concept of the physical cpu hotplug.
> >    The closest thing which is recomended to achieve the cpu hotplug on ARM
> is
> >    to bring down power state of the cpu using PSCI.
> > 2. Other ARM components like GIC etc. have not been designed to realize
> >    physical cpu hotplug capability as of now.
> >
> > B. Limitations of GIC to Support Virtual CPU Hotplug:
> > 1. GIC requires various resources(related to GICR/redistributor, GICC/cpu
> >    interface etc) like memory regions to be fixed at the VM init time and these
> >    could not be changed later on after VM has inited.
> > 2. Associations between GICC(GIC cpu interface) and vcpu get fixed at the VM
> >    init time and GIC does not allows to change this association once GIC has
> >    initialized.
> >
> > C. Known Limitation of the KVM:
> > 1. As of now KVM allows to create VCPUs but does not allows to delete the
> >    already created vcpus. QEMU already provides an interface to manage created
> >    vcpus at KVM level and then to re-use them.
> > 2. Inconsistency in interpretation of the MPIDR generated by KVM for vcpus
> >    vis-a-vis SMT/threads. This does not looks to be compliant to the MPIDR
> >    format(SMT is present) as mentioned in the ARMv8 spec. (Please correct my
> >    understanding if I am wrong here?)
> >
> >
> > Workaround to the problems mentioned in Section B & C1:
> > 1. We pre-size the GIC with possible vcpus at VM init time
> > 2. Pre-create all possible vcpus at KVM and associate them with GICC
> > 3. Park the unplugged vcpus (similar to x86)
> >
> >
> > (*) For all of above please refer to Marc's suggestion here[1]
> >
> >
> > Overview of the Approach:
> > At the time of machvirt_init() we pre-create all of the possible ARMCPU
> > objects along with the corresponding KVM vcpus at the host. Disabled KVM vcpu
> > (which are *not* "present" vcpus but are part of "possible" vcpu list) are
> > parked at per VM list "kvm_parked_vcpus" after their initialization.
> >
> > We create the ARMCPU objects(but these are not *realized* in QOM sense) even
> > for the disabled vcpus to facilitate the GIC initialization (pre-sized with
> > possible vcpus). After Initialization of the machine is complete we release
> > the ARMCPU Objects for the disabled vcpus. These ARMCPU object shall be
> > re-created at the time when vcpu is hot plugged. This new object is then
> > re-attached with the earlier parked KVM vcpu which also gets unparked. The
> > ARMCPU object gets now "realized" in QEMU, which means creation of the
> > corresponding threads, pre_plug/plug phases, and event notification to the
> > guest using ACPI GED etc. Similarly, hot-unplug leg will lead to the
> > "unrealization" of the vcpus and will lead to similar ACPI GED events to the
> > guest for unplug and cleanup and eventually ARMCPU object shall be released
> and
> > KVM vcpus shall be parked again.
> >
> > During machine init, ACPI MADT Table is sized with *possible* vcpus GICC
> > entries. The unplugged/disabled vcpus are presented as MADT GICC DISABLED
> > entries to the guest. This means the guest will have its resources pre-sized
> > with possible vcpus(=present+disabled)
> >
> > Other approaches to deal with ARMCPU object release(after machine init):
> > 1. The ARMCPU objects for the disabled vcpus are released in context to the
> >    virt_machine_done() notifier(approach adopted in this patch-set).
> > 2. Defer the release of current ARMCPU object till the new vcpu object is
> >    hot plugged.
> > 3. Never release and keep on reusing them and release once at VM exit. This
> >    solves many problems with above 2 approaches but requires change in the
> way
> >    qdev_device_add() fetches/creates the ARMCPU object for the new vcpus being
> >    hotplugged. For the arm cpu hotplug case we need to figure out way how to
> >    get access to old object and use it to "re-realize" instead of the new
> >    ARMCPU object.
> >
> > Concerns/Questions:
> > 1. In ARM arch a cpu is uniquely represented in hierarchy using various
> >    affinity levels which could represent thread, core, cluster, package. This
> >    is generally represented by a value in MPIDR register as per the format
> >    mentioned in specification. Now, the way MPIDR value is derived for vcpus
> is
> >    done using vcpu-index. The concept of thread is not quite as same and rather
> >    gets lost in the derivation of MPIDR for vcpus.
> > 2. The topology info used to specify the vcpu while hot-plugging might not
> >    match with the MPIDR value given back by the KVM for the vcpu at the time
> of
> >    init. Concept of SMT bit in MPIDR gets lost as per the derivation being
> done
> >    in the KVM. Hence, concept of thread-id, core-id, socket-id if used as a
> >    topology info to derive MPIDR value as per ARM specification will not match
> >    with MPIDR actually assigned by the KVM?
> >    Perhaps need to carry forward work of Andrew? please check here[2]
> > 3. Further if this info is supplied to the guest using PPTT(once introduced
> in
> >    QEMU) or even derived using MPIDR shall be inconsistent with the host vcpu.
> > 4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
> >    *pending* state for the cpus which have been hot-unplugged? IMHO it looks
> >    okay but will need Marc's confirmation on this.
> > 5. If the ARMCPU object is released after the machine init, UEFI could call
> >    back virt_update_table() to re-build the ACPI tables which might need an
> >    ARMCPU object. Please check the discussion here[5]
> >
> >
> > Commands Used:
> >
> > A. Qemu launch commands to init the machine
> >
> > $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
> > -cpu host -smp cpus=4,maxcpus=6 \
> > -m 300M \
> > -kernel Image \
> > -initrd rootfs.cpio.gz \
> > -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
> > -nographic \
> > -bios  QEMU_EFI.fd \
> >
> > B. Hot-(un)plug related commands
> >
> > # Hotplug a host vcpu(accel=kvm)
> > $ device_add host-arm-cpu,id=core4,core-id=4
> >
> > # Hotplug a vcpu(accel=tcg)
> > $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
> >
> > # Delete the vcpu
> > $ device_del core4
> >
> > NOTE: I have not tested the current solution with '-device' interface. The
> use
> >       is suggested by Igor here[6]. I will test this in coming times but looks
> >       it should work with existing changes.
> >
> >
> > Sample output on guest after boot:
> >
> > $ cat /sys/devices/system/cpu/possible
> > 0-5
> > $ cat /sys/devices/system/cpu/present
> > 0-3
> > $ cat /sys/devices/system/cpu/online
> > 0-1
> > $ cat /sys/devices/system/cpu/offline
> > 2-5
> >
> >
> > Sample output on guest after hotplug of vcpu=4:
> >
> > $ cat /sys/devices/system/cpu/possible
> > 0-5
> > $ cat /sys/devices/system/cpu/present
> > 0-4
> > $ cat /sys/devices/system/cpu/online
> > 0-1,4
> > $ cat /sys/devices/system/cpu/offline
> > 2-3,5
> >
> > Note: vcpu=4 was explicitly 'onlined' after hot-plug
> > $ echo 1 > /sys/devices/system/cpu/cpu4/online
> >
> >
> > Repository:
> >  (*) QEMU changes for vcpu hotplug could be cloned from below site,
> >      https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
> >
> >  (*) Guest Kernel changes required to co-work with the QEMU shall be posted
> soon
> >      and repo made available at above site.
> >
> >
> > THINGS TO DO:
> >  (*) Migration support
> >  (*) TCG/Emulation support is not proper right now. Works to a certain extent
> >      but is not complete. especially the unrealize part in which there is a
> >      overflow of tcg contexts. The last is due to the fact tcg maintains a
> >      count on number of context(per thread instance) so as we hotplug the vcpus
> >      this counter keeps on incrementing. But during hot-unplug the counter
> is
> >      not decremented.
> >  (*) Support of hotplug with NUMA is not proper
> >  (*) CPU Topology right now is not specified using thread/core/socket but
> >      rather flatly indexed using core-id. This needs consideration[2].
> >  (*) Do we need PPTT Support for to specify right topology info to guest about
> >      hot-plugged or unplugged vcpus?
> >  (*) Test cases
> >  (*) Docs need to be updated.
> >
> >
> 
> Hi Salil,
> 
> I realize this is just a preliminary posting and the approach hasn't been
> finalized, but maybe in a future posting we can put a lot of this
> information into a doc patch. I think we'll need good documentation for
> this feature to ensure we get it right and keep in maintained correctly.


Sure, let us do it once we converge on the concept.

Thanks
Salil.

[RFC,00/22] Support of Virtual CPU Hotplug for ARMv8 Arch

Message

Comments