mbox series

[v5,00/10] KVM: arm64: Add support for hypercall services selection

Message ID 20220407011605.1966778-1-rananta@google.com (mailing list archive)
Headers show
Series KVM: arm64: Add support for hypercall services selection | expand

Message

Raghavendra Rao Ananta April 7, 2022, 1:15 a.m. UTC
Hello,

Continuing the discussion from [1], the series tries to add support
for the userspace to elect the hypercall services that it wishes
to expose to the guest, rather than the guest discovering them
unconditionally. The idea employed by the series was taken from
[1] as suggested by Marc Z.

In a broad sense, the concept is similar to the current implementation
of PSCI interface- create a 'firmware psuedo-register' to handle the
firmware revisions. The series extends this idea to all the other
hypercalls such as TRNG (True Random Number Generator), PV_TIME
(Paravirtualized Time), and PTP (Precision Time protocol).

For better categorization and future scaling, these firmware registers
are categorized based on the service call owners. Also, unlike the
existing firmware psuedo-registers, they hold the features supported
in the form of a bitmap.

During the VM initialization, the registers holds an upper-limit of
the features supported by each one of them. It's expected that the
userspace discover the features provided by each register via GET_ONE_REG,
and writeback the desired values using SET_ONE_REG. KVM allows this
modification only until the VM has started.

Some of the standard function-ids, such as ARM_SMCCC_VERSION_FUNC_ID,
need not be associated with a feature bit. For such ids, the series
introduced an allowed-list, hvc_func_default_allowed_list[], that holds
all such ids. As a result, the functions that are not elected by userspace,
or if they are not a part of this allowed-list, will be denied for when
the guests invoke them.

Older VMMs can simply ignore this interface and the hypercall services
will be exposed unconditionally to the guests, thus ensuring backward
compatibility.

The patches are based off of mainline kernel 5.18-rc1, with the selftest
patches from [2] applied.

Patch-1 factors out the non-PSCI related interface from psci.c to
hypercalls.c, as the series would extend the list in the upcoming
patches.

Patch-2 sets up the framework for the bitmap firmware psuedo-registers.
It includes read/write support for the registers, and a helper to check
if a particular hypercall service is supported for the guest.
It also adds the register KVM_REG_ARM_STD_HYP_BMAP to support ARM's
standard secure services.

Patch-3 introduces the firmware register, KVM_REG_ARM_STD_HYP_BMAP,
which holds the standard hypervisor services (such as PV_TIME).

Patch-4 introduces the firmware register, KVM_REG_ARM_VENDOR_HYP_BMAP,
which holds the vendor specific hypercall services.

Patch-5,6 Add the necessary documentation for the newly added firmware
registers.

Patch-7 imports the SMCCC definitions from linux/arm-smccc.h into tools/
for further use in selftests.

Patch-8 adds the selftest to test the guest (using 'hvc') and userspace
interfaces (SET/GET_ONE_REG).

Patch-9 adds these firmware registers into the get-reg-list selftest.

Patch-10 is unrelated to the series, but adds KVM_REG_ARM_FW_REG(3)
to base_regs[] of get-regs-list selftest for the sake of completion.

[1]: https://lore.kernel.org/kvmarm/874kbcpmlq.wl-maz@kernel.org/T/
[2]: https://lore.kernel.org/kvmarm/YUzgdbYk8BeCnHyW@google.com/

Regards,
Raghavendra

v4 -> v5:

Addressed comments by Oliver (thank you!):

- Rebased the series to accommodate ARM_SMCCC_ARCH_WORKAROUND_3
  and PSCI 1.1 changes, and capturing VM's first run.
- Removed the patches related to register scoping (v4 02/13 and
  03/13). I plan to re-introduce them in its own series.
- Dropped the patch that captures VM's first run.
- Moved the bitmap feature firmware registers to its own CORPOC
  space (0x0016).
- Move the KVM_REG_ARM_*_BIT_MAX definitions from uapi header
  to internal header (arm_hypercalls.h).
- Renamed the hypercall descriptor to 'struct kvm_smccc_features',
  and kvm_hvc_call_supported() to kvm_hvc_call_allowed().
- Introduced an allowed-list to hold the function-ids that aren't
  represented by feature-bits.
- Introduced kvm_psci_func_id_is_valid() to check if a given
  function-id is a valid PSCI id, which is used in
  kvm_hvc_call_allowed().
- Introduced KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT as bit-0 of
  KVM_REG_ARM_VENDOR_HYP_BMAP register and
  KVM_REG_ARM_VENDOR_HYP_BIT_PTP is moved to bit-1.
- Updated the arm-smccc.h import to include the definition of
  ARM_SMCCC_ARCH_WORKAROUND_3.
- Introduced the KVM_REG_ARM_FW_FEAT_BMAP COPROC definition to
  get-reg-list selftest.
- Created a new patch to include KVM_REG_ARM_FW_REG(3) in
  get-reg-list.


v3 -> v4

Addressed comments and took suggestions by Reiji, Oliver, Marc,
Sean and Jim:

- Renamed and moved the VM has run once check to arm64.
- Introduced the capability to dynamically modify the register
  encodings to include the scope information.
- Replaced mutex_lock with READ_ONCE and WRITE_ONCE when the
  bitmaps are accessed.
- The hypercalls selftest re-runs with KVM_CAP_ARM_REG_SCOPE
  enabled.

v2 -> v3

Addressed comments by Marc and Andrew:

- Dropped kvm_vcpu_has_run_once() implementation.
- Redifined kvm_vm_has_run_once() as kvm_vm_has_started() in the core
  KVM code that introduces a new field, 'vm_started', to track this.
- KVM_CAP_ARM_HVC_FW_REG_BMAP returns the number of psuedo-firmware
  bitmap registers upon a 'read'. Support for 'write' removed.
- Removed redundant spinlock, 'fw_reg_bmap_enabled' fields from the
  hypercall descriptor structure.
- A separate sub-struct to hold the bitmap info is removed. The bitmap
  info is directly stored in the hypercall descriptor structure
  (struct kvm_hvc_desc).

v1 -> v2

Addressed comments by Oliver (thanks!):

- Introduced kvm_vcpu_has_run_once() and kvm_vm_has_run_once() in the
  core kvm code, rather than relying on ARM specific
  vcpu->arch.has_run_once.
- Writing to KVM_REG_ARM_PSCI_VERSION is done in hypercalls.c itself,
  rather than separating out to psci.c.
- Introduced KVM_CAP_ARM_HVC_FW_REG_BMAP to enable the extension.
- Tracks the register accesses from VMM to decide whether to sanitize
  a register or not, as opposed to sanitizing upon the first 'write'
  in v1.
- kvm_hvc_call_supported() is implemented using a direct switch-case
  statement, instead of looping over all the registers to pick the
  register for the function-id.
- Replaced the register bit definitions with #defines, instead of enums.
- Removed the patch v1-06/08 that imports the firmware register
  definitions as it's not needed.
- Separated out the documentations in its own patch, and the renaming
  of hypercalls.rst to psci.rst into another patch.
- Add the new firmware registers to get-reg-list KVM selftest.

v1: https://lore.kernel.org/kvmarm/20211102002203.1046069-1-rananta@google.com/
v2: https://lore.kernel.org/kvmarm/20211113012234.1443009-1-rananta@google.com/
v3: https://lore.kernel.org/linux-arm-kernel/20220104194918.373612-1-rananta@google.com/
v4: https://lore.kernel.org/lkml/20220224172559.4170192-1-rananta@google.com/

Raghavendra Rao Ananta (10):
  KVM: arm64: Factor out firmware register handling from psci.c
  KVM: arm64: Setup a framework for hypercall bitmap firmware registers
  KVM: arm64: Add standard hypervisor firmware register
  KVM: arm64: Add vendor hypervisor firmware register
  Docs: KVM: Rename psci.rst to hypercalls.rst
  Docs: KVM: Add doc for the bitmap firmware registers
  tools: Import ARM SMCCC definitions
  selftests: KVM: aarch64: Introduce hypercall ABI test
  selftests: KVM: aarch64: Add the bitmap firmware registers to
    get-reg-list
  selftests: KVM: aarch64: Add KVM_REG_ARM_FW_REG(3) to get-reg-list

 Documentation/virt/kvm/api.rst                |  17 +
 Documentation/virt/kvm/arm/hypercalls.rst     | 136 +++++++
 Documentation/virt/kvm/arm/psci.rst           |  77 ----
 arch/arm64/include/asm/kvm_host.h             |  16 +
 arch/arm64/include/uapi/asm/kvm.h             |  16 +
 arch/arm64/kvm/arm.c                          |   1 +
 arch/arm64/kvm/guest.c                        |  10 +-
 arch/arm64/kvm/hypercalls.c                   | 321 +++++++++++++++-
 arch/arm64/kvm/psci.c                         | 183 ----------
 include/kvm/arm_hypercalls.h                  |  22 ++
 include/kvm/arm_psci.h                        |  17 +-
 tools/include/linux/arm-smccc.h               | 193 ++++++++++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/aarch64/get-reg-list.c      |   9 +
 .../selftests/kvm/aarch64/hypercalls.c        | 344 ++++++++++++++++++
 16 files changed, 1092 insertions(+), 272 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/hypercalls.rst
 delete mode 100644 Documentation/virt/kvm/arm/psci.rst
 create mode 100644 tools/include/linux/arm-smccc.h
 create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c

Comments

Gavin Shan April 15, 2022, 6:44 a.m. UTC | #1
Hi Raghavendra,

On 4/7/22 9:15 AM, Raghavendra Rao Ananta wrote:
> Continuing the discussion from [1], the series tries to add support
> for the userspace to elect the hypercall services that it wishes
> to expose to the guest, rather than the guest discovering them
> unconditionally. The idea employed by the series was taken from
> [1] as suggested by Marc Z.
> 
> In a broad sense, the concept is similar to the current implementation
> of PSCI interface- create a 'firmware psuedo-register' to handle the
> firmware revisions. The series extends this idea to all the other
> hypercalls such as TRNG (True Random Number Generator), PV_TIME
> (Paravirtualized Time), and PTP (Precision Time protocol).
> 
> For better categorization and future scaling, these firmware registers
> are categorized based on the service call owners. Also, unlike the
> existing firmware psuedo-registers, they hold the features supported
> in the form of a bitmap.
> 
> During the VM initialization, the registers holds an upper-limit of
> the features supported by each one of them. It's expected that the
> userspace discover the features provided by each register via GET_ONE_REG,
> and writeback the desired values using SET_ONE_REG. KVM allows this
> modification only until the VM has started.
> 
> Some of the standard function-ids, such as ARM_SMCCC_VERSION_FUNC_ID,
> need not be associated with a feature bit. For such ids, the series
> introduced an allowed-list, hvc_func_default_allowed_list[], that holds
> all such ids. As a result, the functions that are not elected by userspace,
> or if they are not a part of this allowed-list, will be denied for when
> the guests invoke them.
> 
> Older VMMs can simply ignore this interface and the hypercall services
> will be exposed unconditionally to the guests, thus ensuring backward
> compatibility.
> 

[...]

I rethinking about the design again and just get one question. Hopefully,
someone have the answer for us. The newly added 3 pseudo registers and
the existing ones like KVM_REG_ARM_PSCI_VERSION are all tied up with
vcpu, instead of VM. I don't think it's correct. I'm not sure if VM-scoped
pseudo registers aren't allowed by ARM architecture or the effort isn't
worthy to support it.

These pseudo registers are introduced to present the available hypercalls,
and then they can be disabled from userspace. In the implementation, these 3
registers are vcpu scoped. It means that multiple vcpus can be asymmetric
in terms of usable hypercalls. For example, ARM_SMCCC_TRNG hypercalls
can be enabled on vcpu0, but disabled on vcpu1. I don't think it's expected.

On the other hand, the information stored in these 3 registers needs to
be migrated through {GET,SET}_ONE_REG by VMM (QEMU). all the information
stored in these 3 registers are all same on all vcpus, which is exactly
as we expect. In migration circumstance, we're transporting identical
information for all vcpus and it's unnecessary.

Thanks,
Gavin
Marc Zyngier April 15, 2022, 8:58 a.m. UTC | #2
On Fri, 15 Apr 2022 07:44:55 +0100,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Raghavendra,
> 
> On 4/7/22 9:15 AM, Raghavendra Rao Ananta wrote:
> > Continuing the discussion from [1], the series tries to add support
> > for the userspace to elect the hypercall services that it wishes
> > to expose to the guest, rather than the guest discovering them
> > unconditionally. The idea employed by the series was taken from
> > [1] as suggested by Marc Z.
> > 
> > In a broad sense, the concept is similar to the current implementation
> > of PSCI interface- create a 'firmware psuedo-register' to handle the
> > firmware revisions. The series extends this idea to all the other
> > hypercalls such as TRNG (True Random Number Generator), PV_TIME
> > (Paravirtualized Time), and PTP (Precision Time protocol).
> > 
> > For better categorization and future scaling, these firmware registers
> > are categorized based on the service call owners. Also, unlike the
> > existing firmware psuedo-registers, they hold the features supported
> > in the form of a bitmap.
> > 
> > During the VM initialization, the registers holds an upper-limit of
> > the features supported by each one of them. It's expected that the
> > userspace discover the features provided by each register via GET_ONE_REG,
> > and writeback the desired values using SET_ONE_REG. KVM allows this
> > modification only until the VM has started.
> > 
> > Some of the standard function-ids, such as ARM_SMCCC_VERSION_FUNC_ID,
> > need not be associated with a feature bit. For such ids, the series
> > introduced an allowed-list, hvc_func_default_allowed_list[], that holds
> > all such ids. As a result, the functions that are not elected by userspace,
> > or if they are not a part of this allowed-list, will be denied for when
> > the guests invoke them.
> > 
> > Older VMMs can simply ignore this interface and the hypercall services
> > will be exposed unconditionally to the guests, thus ensuring backward
> > compatibility.
> > 
> 
> [...]
> 
> I rethinking about the design again and just get one question. Hopefully,
> someone have the answer for us. The newly added 3 pseudo registers and
> the existing ones like KVM_REG_ARM_PSCI_VERSION are all tied up with
> vcpu, instead of VM. I don't think it's correct. I'm not sure if VM-scoped
> pseudo registers aren't allowed by ARM architecture or the effort isn't
> worthy to support it.

We have had that discussion before (around version 2 of this series,
if I remember well).

> 
> These pseudo registers are introduced to present the available hypercalls,
> and then they can be disabled from userspace. In the implementation, these 3
> registers are vcpu scoped. It means that multiple vcpus can be asymmetric
> in terms of usable hypercalls. For example, ARM_SMCCC_TRNG hypercalls
> can be enabled on vcpu0, but disabled on vcpu1. I don't think it's expected.

No, that's not the way this is supposed to work. These hypercalls are
of course global, even if the accessor is per-vcpu. This is similar to
tons of other things, such as some of the PMU data, the timer virtual
offset... the list goes on. If that's not what this code does, then it
is a bug and it needs to be fixed.

> On the other hand, the information stored in these 3 registers needs to
> be migrated through {GET,SET}_ONE_REG by VMM (QEMU). all the information
> stored in these 3 registers are all same on all vcpus, which is exactly
> as we expect. In migration circumstance, we're transporting identical
> information for all vcpus and it's unnecessary.

Yes, we all understand that. My response to that was (and still is):

- There is no need to invent a new userspace interface. The one we
  have is terrible enough, and we don't need another square wheel that
  would need to be maintained beside the existing one.

- Let's say we have 1024 new pseudo-registers, 1024 vcpus, 64bit regs:
  that's 8MB worth of extra data. This is not insignificant, but also
  not really a problem given that such a large VM is probably attached
  to a proportionally large amount of memory. In practice, we're
  talking of less than 10 registers, and less than 100 vcpus. A crazy
  8kB at most. Who cares?

- If this is eventually deemed to be a *real* scalability problem, we
  can always expose a map of registers that are global, and let
  userspace know that it can elide the rest. Problem solved, backward
  compatibility preserved. And I'm willing to bet that we won't need
  it in my lifetime.

Thanks,

	M.
Gavin Shan April 18, 2022, 2:53 a.m. UTC | #3
Hi Marc,

On 4/15/22 4:58 PM, Marc Zyngier wrote:
> On Fri, 15 Apr 2022 07:44:55 +0100,
> Gavin Shan <gshan@redhat.com> wrote:
>> On 4/7/22 9:15 AM, Raghavendra Rao Ananta wrote:
>>> Continuing the discussion from [1], the series tries to add support
>>> for the userspace to elect the hypercall services that it wishes
>>> to expose to the guest, rather than the guest discovering them
>>> unconditionally. The idea employed by the series was taken from
>>> [1] as suggested by Marc Z.
>>>
>>> In a broad sense, the concept is similar to the current implementation
>>> of PSCI interface- create a 'firmware psuedo-register' to handle the
>>> firmware revisions. The series extends this idea to all the other
>>> hypercalls such as TRNG (True Random Number Generator), PV_TIME
>>> (Paravirtualized Time), and PTP (Precision Time protocol).
>>>
>>> For better categorization and future scaling, these firmware registers
>>> are categorized based on the service call owners. Also, unlike the
>>> existing firmware psuedo-registers, they hold the features supported
>>> in the form of a bitmap.
>>>
>>> During the VM initialization, the registers holds an upper-limit of
>>> the features supported by each one of them. It's expected that the
>>> userspace discover the features provided by each register via GET_ONE_REG,
>>> and writeback the desired values using SET_ONE_REG. KVM allows this
>>> modification only until the VM has started.
>>>
>>> Some of the standard function-ids, such as ARM_SMCCC_VERSION_FUNC_ID,
>>> need not be associated with a feature bit. For such ids, the series
>>> introduced an allowed-list, hvc_func_default_allowed_list[], that holds
>>> all such ids. As a result, the functions that are not elected by userspace,
>>> or if they are not a part of this allowed-list, will be denied for when
>>> the guests invoke them.
>>>
>>> Older VMMs can simply ignore this interface and the hypercall services
>>> will be exposed unconditionally to the guests, thus ensuring backward
>>> compatibility.
>>>
>>
>> [...]
>>
>> I rethinking about the design again and just get one question. Hopefully,
>> someone have the answer for us. The newly added 3 pseudo registers and
>> the existing ones like KVM_REG_ARM_PSCI_VERSION are all tied up with
>> vcpu, instead of VM. I don't think it's correct. I'm not sure if VM-scoped
>> pseudo registers aren't allowed by ARM architecture or the effort isn't
>> worthy to support it.
> 
> We have had that discussion before (around version 2 of this series,
> if I remember well).
> 

Yeah, I'm chime-in this series lately. There must be some discussions,
including this topic, I missed :)

>>
>> These pseudo registers are introduced to present the available hypercalls,
>> and then they can be disabled from userspace. In the implementation, these 3
>> registers are vcpu scoped. It means that multiple vcpus can be asymmetric
>> in terms of usable hypercalls. For example, ARM_SMCCC_TRNG hypercalls
>> can be enabled on vcpu0, but disabled on vcpu1. I don't think it's expected.
> 
> No, that's not the way this is supposed to work. These hypercalls are
> of course global, even if the accessor is per-vcpu. This is similar to
> tons of other things, such as some of the PMU data, the timer virtual
> offset... the list goes on. If that's not what this code does, then it
> is a bug and it needs to be fixed.
> 

Ok.

>> On the other hand, the information stored in these 3 registers needs to
>> be migrated through {GET,SET}_ONE_REG by VMM (QEMU). all the information
>> stored in these 3 registers are all same on all vcpus, which is exactly
>> as we expect. In migration circumstance, we're transporting identical
>> information for all vcpus and it's unnecessary.
> 
> Yes, we all understand that. My response to that was (and still is):
> 
> - There is no need to invent a new userspace interface. The one we
>    have is terrible enough, and we don't need another square wheel that
>    would need to be maintained beside the existing one.
> 
> - Let's say we have 1024 new pseudo-registers, 1024 vcpus, 64bit regs:
>    that's 8MB worth of extra data. This is not insignificant, but also
>    not really a problem given that such a large VM is probably attached
>    to a proportionally large amount of memory. In practice, we're
>    talking of less than 10 registers, and less than 100 vcpus. A crazy
>    8kB at most. Who cares?
> 
> - If this is eventually deemed to be a *real* scalability problem, we
>    can always expose a map of registers that are global, and let
>    userspace know that it can elide the rest. Problem solved, backward
>    compatibility preserved. And I'm willing to bet that we won't need
>    it in my lifetime.
> 

The reason why I raised question is just to check if it's a missed
point in the design. As I said, I obviously missed the previous
discussions and glad that this has been discussed through.

Thanks for the details. Yes, it's totally fine to migrate 8KB data.
Besides, VMM (QEMU) can choose to do migration on one single vcpu,
instead of all of them, as you said.

Thanks,
Gavin