mbox series

[v6,00/21] KVM: Add idempotent controls for migrating system counter state

Message ID 20210804085819.846610-1-oupton@google.com (mailing list archive)
Headers show
Series KVM: Add idempotent controls for migrating system counter state | expand

Message

Oliver Upton Aug. 4, 2021, 8:57 a.m. UTC
KVM's current means of saving/restoring system counters is plagued with
temporal issues. At least on ARM64 and x86, we migrate the guest's
system counter by-value through the respective guest system register
values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
brittle as the state is not idempotent: the host system counter is still
oscillating between the attempted save and restore. Furthermore, VMMs
may wish to transparently live migrate guest VMs, meaning that they
include the elapsed time due to live migration blackout in the guest
system counter view. The VMM thread could be preempted for any number of
reasons (scheduler, L0 hypervisor under nested) between the time that
it calculates the desired guest counter value and when KVM actually sets
this counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of system counter offsets.
We can avoid all of the issues associated with a value-based interface
by abstracting these offset controls in new ioctls. This series
introduces new vCPU device attributes to provide userspace access to the
vCPU's system counter offset.

Patch 1 addresses a possible race in KVM_GET_CLOCK where
use_master_clock is read outside of the pvclock_gtod_sync_lock.

Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
ioctls to provide userspace with a (host_tsc, realtime) instant. This is
essential for a VMM to perform precise migration of the guest's system
counters.

Patches 3-4 are some preparatory changes for exposing the TSC offset to
userspace. Patch 5 provides a vCPU attribute to provide userspace access
to the TSC offset.

Patches 6-7 implement a test for the new additions to
KVM_{GET,SET}_CLOCK.

Patch 8 fixes some assertions in the kvm device attribute helpers.

Patches 9-10 implement at test for the tsc offset attribute introduced in
patch 5.

Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
through the ONE_REG interface.

Patches 14-15 add test cases for userspace manipulation of the virtual
counter-timer.

Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
ARM vCPU, but only implements support for ECV hosts. Patches 18-19 add
support for non-ECV hosts by emulating physical counter offsetting.

Patch 20 adds test cases for adjusting the host-guest offset, and
finally patch 21 adds a test to measure the emulation overhead of
CNTPCT_EL2.

This series was tested on both an Ampere Mt. Jade and Haswell systems.
Unfortunately, the ECV portions of this series are untested, as there is
no ECV-capable hardware and the ARM fast models only partially implement
ECV.

Physical counter benchmark
--------------------------

The following data was collected by running 10000 iterations of the
benchmark test from Patch 21 on an Ampere Mt. Jade reference server, A 2S
machine with 2 80-core Ampere Altra SoCs. Measurements were collected
for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
parameter.

nVHE
----

+--------------------+--------+---------+
|       Metric       | Native | Trapped |
+--------------------+--------+---------+
| Average            | 54ns   | 148ns   |
| Standard Deviation | 124ns  | 122ns   |
| 95th Percentile    | 258ns  | 348ns   |
+--------------------+--------+---------+

VHE
---

+--------------------+--------+---------+
|       Metric       | Native | Trapped |
+--------------------+--------+---------+
| Average            | 53ns   | 152ns   |
| Standard Deviation | 92ns   | 94ns    |
| 95th Percentile    | 204ns  | 307ns   |
+--------------------+--------+---------+

This series applies cleanly to kvm/queue at the following commit:

6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")

v1 -> v2:
  - Reimplemented as vCPU device attributes instead of a distinct ioctl.
  - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
  - Changed the arm64 implementation to broadcast counter
    offset values to all vCPUs in a guest. This upholds the
    architectural expectations of a consistent counter-timer across CPUs.
  - Fixed a bug with traps in VHE mode. We now configure traps on every
    transition into a guest to handle differing VMs (trapped, emulated).

v2 -> v3:
  - Added documentation for additions to KVM_{GET,SET}_CLOCK
  - Added documentation for all new vCPU attributes
  - Added documentation for suggested algorithm to migrate a guest's
    TSC(s)
  - Bug fixes throughout series
  - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME

v3 -> v4:
  - Added patch to address incorrect device helper assertions (Drew)
  - Carried Drew's r-b tags where appropriate
  - x86 selftest cleanup
  - Removed stale kvm_timer_init_vhe() function
  - Removed unnecessary GUEST_DONE() from selftests

v4 -> v5:
  - Fix typo in TSC migration algorithm
  - Carry more of Drew's r-b tags
  - clean up run loop logic in counter emulation benchmark (missed from
    Drew's comments on v3)

v5 -> v6:
  - Add fix for race in KVM_GET_CLOCK (Sean)
  - Fix 32-bit build issues in series + use of uninitialized host tsc
    value (Sean)
  - General style cleanups
  - Rework ARM virtual counter offsetting to match guest behavior. Use
    the ONE_REG interface instead of a VM attribute (Marc)
  - Maintain a single host-guest counter offset, which applies to both
    physical and virtual counters
  - Dropped some of Drew's r-b tags due to nontrivial patch changes
    (sorry for the churn!)

v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oupton@google.com/
v2: https://lore.kernel.org/r/20210716212629.2232756-1-oupton@google.com
v3: https://lore.kernel.org/r/20210719184949.1385910-1-oupton@google.com
v4: https://lore.kernel.org/r/20210729001012.70394-1-oupton@google.com
v5: https://lore.kernel.org/r/20210729173300.181775-1-oupton@google.com

Oliver Upton (21):
  KVM: x86: Fix potential race in KVM_GET_CLOCK
  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Expose TSC offset controls to userspace
  tools: arch: x86: pull in pvclock headers
  selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
  selftests: KVM: Fix kvm device helper ioctl assertions
  selftests: KVM: Add helpers for vCPU device attributes
  selftests: KVM: Introduce system counter offset test
  KVM: arm64: Refactor update_vtimer_cntvoff()
  KVM: arm64: Separate guest/host counter offset values
  KVM: arm64: Allow userspace to configure a vCPU's virtual offset
  selftests: KVM: Add helper to check for register presence
  selftests: KVM: Add support for aarch64 to system_counter_offset_test
  arm64: cpufeature: Enumerate support for Enhanced Counter
    Virtualization
  KVM: arm64: Allow userspace to configure a guest's counter-timer
    offset
  KVM: arm64: Configure timer traps in vcpu_load() for VHE
  KVM: arm64: Emulate physical counter offsetting on non-ECV systems
  selftests: KVM: Test physical counter offsetting
  selftests: KVM: Add counter emulation benchmark

 Documentation/virt/kvm/api.rst                |  52 ++-
 Documentation/virt/kvm/devices/vcpu.rst       |  85 ++++
 Documentation/virt/kvm/locking.rst            |  11 +
 arch/arm64/include/asm/kvm_asm.h              |   2 +
 arch/arm64/include/asm/sysreg.h               |   5 +
 arch/arm64/include/uapi/asm/kvm.h             |   2 +
 arch/arm64/kernel/cpufeature.c                |  10 +
 arch/arm64/kvm/arch_timer.c                   | 224 ++++++++++-
 arch/arm64/kvm/arm.c                          |   4 +-
 arch/arm64/kvm/guest.c                        |   6 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |  29 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   6 +
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
 arch/arm64/kvm/hyp/vhe/timer-sr.c             |   5 +
 arch/arm64/tools/cpucaps                      |   1 +
 arch/x86/include/asm/kvm_host.h               |   4 +
 arch/x86/include/uapi/asm/kvm.h               |   4 +
 arch/x86/kvm/x86.c                            | 364 +++++++++++++-----
 include/clocksource/arm_arch_timer.h          |   1 +
 include/kvm/arm_arch_timer.h                  |   6 +-
 include/uapi/linux/kvm.h                      |   7 +-
 tools/arch/x86/include/asm/pvclock-abi.h      |  48 +++
 tools/arch/x86/include/asm/pvclock.h          | 103 +++++
 tools/testing/selftests/kvm/.gitignore        |   3 +
 tools/testing/selftests/kvm/Makefile          |   4 +
 .../kvm/aarch64/counter_emulation_benchmark.c | 207 ++++++++++
 .../selftests/kvm/include/aarch64/processor.h |  24 ++
 .../testing/selftests/kvm/include/kvm_util.h  |  13 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  63 ++-
 .../kvm/system_counter_offset_test.c          | 211 ++++++++++
 .../selftests/kvm/x86_64/kvm_clock_test.c     | 204 ++++++++++
 31 files changed, 1581 insertions(+), 143 deletions(-)
 create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
 create mode 100644 tools/arch/x86/include/asm/pvclock.h
 create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
 create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c

Comments

Oliver Upton Aug. 4, 2021, 11:05 a.m. UTC | #1
On Wed, Aug 4, 2021 at 1:58 AM Oliver Upton <oupton@google.com> wrote:
>
> KVM's current means of saving/restoring system counters is plagued with
> temporal issues. At least on ARM64 and x86, we migrate the guest's
> system counter by-value through the respective guest system register
> values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> brittle as the state is not idempotent: the host system counter is still
> oscillating between the attempted save and restore. Furthermore, VMMs
> may wish to transparently live migrate guest VMs, meaning that they
> include the elapsed time due to live migration blackout in the guest
> system counter view. The VMM thread could be preempted for any number of
> reasons (scheduler, L0 hypervisor under nested) between the time that
> it calculates the desired guest counter value and when KVM actually sets
> this counter state.
>
> Despite the value-based interface that we present to userspace, KVM
> actually has idempotent guest controls by way of system counter offsets.
> We can avoid all of the issues associated with a value-based interface
> by abstracting these offset controls in new ioctls. This series
> introduces new vCPU device attributes to provide userspace access to the
> vCPU's system counter offset.
>
> Patch 1 addresses a possible race in KVM_GET_CLOCK where
> use_master_clock is read outside of the pvclock_gtod_sync_lock.
>
> Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> essential for a VMM to perform precise migration of the guest's system
> counters.
>
> Patches 3-4 are some preparatory changes for exposing the TSC offset to
> userspace. Patch 5 provides a vCPU attribute to provide userspace access
> to the TSC offset.
>
> Patches 6-7 implement a test for the new additions to
> KVM_{GET,SET}_CLOCK.
>
> Patch 8 fixes some assertions in the kvm device attribute helpers.
>
> Patches 9-10 implement at test for the tsc offset attribute introduced in
> patch 5.
>
> Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
> through the ONE_REG interface.
>
> Patches 14-15 add test cases for userspace manipulation of the virtual
> counter-timer.
>
> Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
> ARM vCPU, but only implements support for ECV hosts. Patches 18-19 add
> support for non-ECV hosts by emulating physical counter offsetting.
>
> Patch 20 adds test cases for adjusting the host-guest offset, and
> finally patch 21 adds a test to measure the emulation overhead of
> CNTPCT_EL2.
>
> This series was tested on both an Ampere Mt. Jade and Haswell systems.
> Unfortunately, the ECV portions of this series are untested, as there is
> no ECV-capable hardware and the ARM fast models only partially implement
> ECV.

Small correction: I was only using the foundation model. Apparently
the AEM FVP provides full ECV support.

>
> Physical counter benchmark
> --------------------------
>
> The following data was collected by running 10000 iterations of the
> benchmark test from Patch 21 on an Ampere Mt. Jade reference server, A 2S
> machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> parameter.
>
> nVHE
> ----
>
> +--------------------+--------+---------+
> |       Metric       | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 54ns   | 148ns   |
> | Standard Deviation | 124ns  | 122ns   |
> | 95th Percentile    | 258ns  | 348ns   |
> +--------------------+--------+---------+
>
> VHE
> ---
>
> +--------------------+--------+---------+
> |       Metric       | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 53ns   | 152ns   |
> | Standard Deviation | 92ns   | 94ns    |
> | 95th Percentile    | 204ns  | 307ns   |
> +--------------------+--------+---------+
>
> This series applies cleanly to kvm/queue at the following commit:
>
> 6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")
>
> v1 -> v2:
>   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
>   - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
>   - Changed the arm64 implementation to broadcast counter
>     offset values to all vCPUs in a guest. This upholds the
>     architectural expectations of a consistent counter-timer across CPUs.
>   - Fixed a bug with traps in VHE mode. We now configure traps on every
>     transition into a guest to handle differing VMs (trapped, emulated).
>
> v2 -> v3:
>   - Added documentation for additions to KVM_{GET,SET}_CLOCK
>   - Added documentation for all new vCPU attributes
>   - Added documentation for suggested algorithm to migrate a guest's
>     TSC(s)
>   - Bug fixes throughout series
>   - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME
>
> v3 -> v4:
>   - Added patch to address incorrect device helper assertions (Drew)
>   - Carried Drew's r-b tags where appropriate
>   - x86 selftest cleanup
>   - Removed stale kvm_timer_init_vhe() function
>   - Removed unnecessary GUEST_DONE() from selftests
>
> v4 -> v5:
>   - Fix typo in TSC migration algorithm
>   - Carry more of Drew's r-b tags
>   - clean up run loop logic in counter emulation benchmark (missed from
>     Drew's comments on v3)
>
> v5 -> v6:
>   - Add fix for race in KVM_GET_CLOCK (Sean)
>   - Fix 32-bit build issues in series + use of uninitialized host tsc
>     value (Sean)
>   - General style cleanups
>   - Rework ARM virtual counter offsetting to match guest behavior. Use
>     the ONE_REG interface instead of a VM attribute (Marc)
>   - Maintain a single host-guest counter offset, which applies to both
>     physical and virtual counters
>   - Dropped some of Drew's r-b tags due to nontrivial patch changes
>     (sorry for the churn!)
>
> v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oupton@google.com/
> v2: https://lore.kernel.org/r/20210716212629.2232756-1-oupton@google.com
> v3: https://lore.kernel.org/r/20210719184949.1385910-1-oupton@google.com
> v4: https://lore.kernel.org/r/20210729001012.70394-1-oupton@google.com
> v5: https://lore.kernel.org/r/20210729173300.181775-1-oupton@google.com
>
> Oliver Upton (21):
>   KVM: x86: Fix potential race in KVM_GET_CLOCK
>   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
>   KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
>   KVM: x86: Refactor tsc synchronization code
>   KVM: x86: Expose TSC offset controls to userspace
>   tools: arch: x86: pull in pvclock headers
>   selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
>   selftests: KVM: Fix kvm device helper ioctl assertions
>   selftests: KVM: Add helpers for vCPU device attributes
>   selftests: KVM: Introduce system counter offset test
>   KVM: arm64: Refactor update_vtimer_cntvoff()
>   KVM: arm64: Separate guest/host counter offset values
>   KVM: arm64: Allow userspace to configure a vCPU's virtual offset
>   selftests: KVM: Add helper to check for register presence
>   selftests: KVM: Add support for aarch64 to system_counter_offset_test
>   arm64: cpufeature: Enumerate support for Enhanced Counter
>     Virtualization
>   KVM: arm64: Allow userspace to configure a guest's counter-timer
>     offset
>   KVM: arm64: Configure timer traps in vcpu_load() for VHE
>   KVM: arm64: Emulate physical counter offsetting on non-ECV systems
>   selftests: KVM: Test physical counter offsetting
>   selftests: KVM: Add counter emulation benchmark
>
>  Documentation/virt/kvm/api.rst                |  52 ++-
>  Documentation/virt/kvm/devices/vcpu.rst       |  85 ++++
>  Documentation/virt/kvm/locking.rst            |  11 +
>  arch/arm64/include/asm/kvm_asm.h              |   2 +
>  arch/arm64/include/asm/sysreg.h               |   5 +
>  arch/arm64/include/uapi/asm/kvm.h             |   2 +
>  arch/arm64/kernel/cpufeature.c                |  10 +
>  arch/arm64/kvm/arch_timer.c                   | 224 ++++++++++-
>  arch/arm64/kvm/arm.c                          |   4 +-
>  arch/arm64/kvm/guest.c                        |   6 +-
>  arch/arm64/kvm/hyp/include/hyp/switch.h       |  29 ++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   6 +
>  arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
>  arch/arm64/kvm/hyp/vhe/timer-sr.c             |   5 +
>  arch/arm64/tools/cpucaps                      |   1 +
>  arch/x86/include/asm/kvm_host.h               |   4 +
>  arch/x86/include/uapi/asm/kvm.h               |   4 +
>  arch/x86/kvm/x86.c                            | 364 +++++++++++++-----
>  include/clocksource/arm_arch_timer.h          |   1 +
>  include/kvm/arm_arch_timer.h                  |   6 +-
>  include/uapi/linux/kvm.h                      |   7 +-
>  tools/arch/x86/include/asm/pvclock-abi.h      |  48 +++
>  tools/arch/x86/include/asm/pvclock.h          | 103 +++++
>  tools/testing/selftests/kvm/.gitignore        |   3 +
>  tools/testing/selftests/kvm/Makefile          |   4 +
>  .../kvm/aarch64/counter_emulation_benchmark.c | 207 ++++++++++
>  .../selftests/kvm/include/aarch64/processor.h |  24 ++
>  .../testing/selftests/kvm/include/kvm_util.h  |  13 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  63 ++-
>  .../kvm/system_counter_offset_test.c          | 211 ++++++++++
>  .../selftests/kvm/x86_64/kvm_clock_test.c     | 204 ++++++++++
>  31 files changed, 1581 insertions(+), 143 deletions(-)
>  create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
>  create mode 100644 tools/arch/x86/include/asm/pvclock.h
>  create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
>  create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c
>  create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
>
> --
> 2.32.0.605.g8dce9f2422-goog
>
Oliver Upton Aug. 4, 2021, 10:03 p.m. UTC | #2
On Wed, Aug 4, 2021 at 4:05 AM Oliver Upton <oupton@google.com> wrote:
>
> On Wed, Aug 4, 2021 at 1:58 AM Oliver Upton <oupton@google.com> wrote:
> >
> > KVM's current means of saving/restoring system counters is plagued with
> > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > system counter by-value through the respective guest system register
> > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > brittle as the state is not idempotent: the host system counter is still
> > oscillating between the attempted save and restore. Furthermore, VMMs
> > may wish to transparently live migrate guest VMs, meaning that they
> > include the elapsed time due to live migration blackout in the guest
> > system counter view. The VMM thread could be preempted for any number of
> > reasons (scheduler, L0 hypervisor under nested) between the time that
> > it calculates the desired guest counter value and when KVM actually sets
> > this counter state.
> >
> > Despite the value-based interface that we present to userspace, KVM
> > actually has idempotent guest controls by way of system counter offsets.
> > We can avoid all of the issues associated with a value-based interface
> > by abstracting these offset controls in new ioctls. This series
> > introduces new vCPU device attributes to provide userspace access to the
> > vCPU's system counter offset.
> >
> > Patch 1 addresses a possible race in KVM_GET_CLOCK where
> > use_master_clock is read outside of the pvclock_gtod_sync_lock.
> >
> > Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> > ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> > essential for a VMM to perform precise migration of the guest's system
> > counters.
> >
> > Patches 3-4 are some preparatory changes for exposing the TSC offset to
> > userspace. Patch 5 provides a vCPU attribute to provide userspace access
> > to the TSC offset.
> >
> > Patches 6-7 implement a test for the new additions to
> > KVM_{GET,SET}_CLOCK.
> >
> > Patch 8 fixes some assertions in the kvm device attribute helpers.
> >
> > Patches 9-10 implement at test for the tsc offset attribute introduced in
> > patch 5.
> >
> > Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
> > through the ONE_REG interface.
> >
> > Patches 14-15 add test cases for userspace manipulation of the virtual
> > counter-timer.
> >
> > Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
> > ARM vCPU, but only implements support for ECV hosts. Patches 18-19 add
> > support for non-ECV hosts by emulating physical counter offsetting.
> >
> > Patch 20 adds test cases for adjusting the host-guest offset, and
> > finally patch 21 adds a test to measure the emulation overhead of
> > CNTPCT_EL2.
> >
> > This series was tested on both an Ampere Mt. Jade and Haswell systems.
> > Unfortunately, the ECV portions of this series are untested, as there is
> > no ECV-capable hardware and the ARM fast models only partially implement
> > ECV.
>
> Small correction: I was only using the foundation model. Apparently
> the AEM FVP provides full ECV support.

Ok. I've now tested this series on the FVP Base RevC fast model@v8.6 +
ECV=2. Passes on VHE, fails on nVHE.

I'll respin this series with the fix for nVHE+ECV soon.

--
Thanks,
Oliver

>
> >
> > Physical counter benchmark
> > --------------------------
> >
> > The following data was collected by running 10000 iterations of the
> > benchmark test from Patch 21 on an Ampere Mt. Jade reference server, A 2S
> > machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> > for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> > parameter.
> >
> > nVHE
> > ----
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 54ns   | 148ns   |
> > | Standard Deviation | 124ns  | 122ns   |
> > | 95th Percentile    | 258ns  | 348ns   |
> > +--------------------+--------+---------+
> >
> > VHE
> > ---
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 53ns   | 152ns   |
> > | Standard Deviation | 92ns   | 94ns    |
> > | 95th Percentile    | 204ns  | 307ns   |
> > +--------------------+--------+---------+
> >
> > This series applies cleanly to kvm/queue at the following commit:
> >
> > 6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")
> >
> > v1 -> v2:
> >   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
> >   - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
> >   - Changed the arm64 implementation to broadcast counter
> >     offset values to all vCPUs in a guest. This upholds the
> >     architectural expectations of a consistent counter-timer across CPUs.
> >   - Fixed a bug with traps in VHE mode. We now configure traps on every
> >     transition into a guest to handle differing VMs (trapped, emulated).
> >
> > v2 -> v3:
> >   - Added documentation for additions to KVM_{GET,SET}_CLOCK
> >   - Added documentation for all new vCPU attributes
> >   - Added documentation for suggested algorithm to migrate a guest's
> >     TSC(s)
> >   - Bug fixes throughout series
> >   - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME
> >
> > v3 -> v4:
> >   - Added patch to address incorrect device helper assertions (Drew)
> >   - Carried Drew's r-b tags where appropriate
> >   - x86 selftest cleanup
> >   - Removed stale kvm_timer_init_vhe() function
> >   - Removed unnecessary GUEST_DONE() from selftests
> >
> > v4 -> v5:
> >   - Fix typo in TSC migration algorithm
> >   - Carry more of Drew's r-b tags
> >   - clean up run loop logic in counter emulation benchmark (missed from
> >     Drew's comments on v3)
> >
> > v5 -> v6:
> >   - Add fix for race in KVM_GET_CLOCK (Sean)
> >   - Fix 32-bit build issues in series + use of uninitialized host tsc
> >     value (Sean)
> >   - General style cleanups
> >   - Rework ARM virtual counter offsetting to match guest behavior. Use
> >     the ONE_REG interface instead of a VM attribute (Marc)
> >   - Maintain a single host-guest counter offset, which applies to both
> >     physical and virtual counters
> >   - Dropped some of Drew's r-b tags due to nontrivial patch changes
> >     (sorry for the churn!)
> >
> > v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oupton@google.com/
> > v2: https://lore.kernel.org/r/20210716212629.2232756-1-oupton@google.com
> > v3: https://lore.kernel.org/r/20210719184949.1385910-1-oupton@google.com
> > v4: https://lore.kernel.org/r/20210729001012.70394-1-oupton@google.com
> > v5: https://lore.kernel.org/r/20210729173300.181775-1-oupton@google.com
> >
> > Oliver Upton (21):
> >   KVM: x86: Fix potential race in KVM_GET_CLOCK
> >   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
> >   KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
> >   KVM: x86: Refactor tsc synchronization code
> >   KVM: x86: Expose TSC offset controls to userspace
> >   tools: arch: x86: pull in pvclock headers
> >   selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
> >   selftests: KVM: Fix kvm device helper ioctl assertions
> >   selftests: KVM: Add helpers for vCPU device attributes
> >   selftests: KVM: Introduce system counter offset test
> >   KVM: arm64: Refactor update_vtimer_cntvoff()
> >   KVM: arm64: Separate guest/host counter offset values
> >   KVM: arm64: Allow userspace to configure a vCPU's virtual offset
> >   selftests: KVM: Add helper to check for register presence
> >   selftests: KVM: Add support for aarch64 to system_counter_offset_test
> >   arm64: cpufeature: Enumerate support for Enhanced Counter
> >     Virtualization
> >   KVM: arm64: Allow userspace to configure a guest's counter-timer
> >     offset
> >   KVM: arm64: Configure timer traps in vcpu_load() for VHE
> >   KVM: arm64: Emulate physical counter offsetting on non-ECV systems
> >   selftests: KVM: Test physical counter offsetting
> >   selftests: KVM: Add counter emulation benchmark
> >
> >  Documentation/virt/kvm/api.rst                |  52 ++-
> >  Documentation/virt/kvm/devices/vcpu.rst       |  85 ++++
> >  Documentation/virt/kvm/locking.rst            |  11 +
> >  arch/arm64/include/asm/kvm_asm.h              |   2 +
> >  arch/arm64/include/asm/sysreg.h               |   5 +
> >  arch/arm64/include/uapi/asm/kvm.h             |   2 +
> >  arch/arm64/kernel/cpufeature.c                |  10 +
> >  arch/arm64/kvm/arch_timer.c                   | 224 ++++++++++-
> >  arch/arm64/kvm/arm.c                          |   4 +-
> >  arch/arm64/kvm/guest.c                        |   6 +-
> >  arch/arm64/kvm/hyp/include/hyp/switch.h       |  29 ++
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   6 +
> >  arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
> >  arch/arm64/kvm/hyp/vhe/timer-sr.c             |   5 +
> >  arch/arm64/tools/cpucaps                      |   1 +
> >  arch/x86/include/asm/kvm_host.h               |   4 +
> >  arch/x86/include/uapi/asm/kvm.h               |   4 +
> >  arch/x86/kvm/x86.c                            | 364 +++++++++++++-----
> >  include/clocksource/arm_arch_timer.h          |   1 +
> >  include/kvm/arm_arch_timer.h                  |   6 +-
> >  include/uapi/linux/kvm.h                      |   7 +-
> >  tools/arch/x86/include/asm/pvclock-abi.h      |  48 +++
> >  tools/arch/x86/include/asm/pvclock.h          | 103 +++++
> >  tools/testing/selftests/kvm/.gitignore        |   3 +
> >  tools/testing/selftests/kvm/Makefile          |   4 +
> >  .../kvm/aarch64/counter_emulation_benchmark.c | 207 ++++++++++
> >  .../selftests/kvm/include/aarch64/processor.h |  24 ++
> >  .../testing/selftests/kvm/include/kvm_util.h  |  13 +
> >  tools/testing/selftests/kvm/lib/kvm_util.c    |  63 ++-
> >  .../kvm/system_counter_offset_test.c          | 211 ++++++++++
> >  .../selftests/kvm/x86_64/kvm_clock_test.c     | 204 ++++++++++
> >  31 files changed, 1581 insertions(+), 143 deletions(-)
> >  create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
> >  create mode 100644 tools/arch/x86/include/asm/pvclock.h
> >  create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
> >  create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c
> >  create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
> >
> > --
> > 2.32.0.605.g8dce9f2422-goog
> >
Oliver Upton Aug. 10, 2021, 12:04 a.m. UTC | #3
On Wed, Aug 4, 2021 at 3:03 PM Oliver Upton <oupton@google.com> wrote:
>
> On Wed, Aug 4, 2021 at 4:05 AM Oliver Upton <oupton@google.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 1:58 AM Oliver Upton <oupton@google.com> wrote:
> > >
> > > KVM's current means of saving/restoring system counters is plagued with
> > > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > > system counter by-value through the respective guest system register
> > > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > > brittle as the state is not idempotent: the host system counter is still
> > > oscillating between the attempted save and restore. Furthermore, VMMs
> > > may wish to transparently live migrate guest VMs, meaning that they
> > > include the elapsed time due to live migration blackout in the guest
> > > system counter view. The VMM thread could be preempted for any number of
> > > reasons (scheduler, L0 hypervisor under nested) between the time that
> > > it calculates the desired guest counter value and when KVM actually sets
> > > this counter state.
> > >
> > > Despite the value-based interface that we present to userspace, KVM
> > > actually has idempotent guest controls by way of system counter offsets.
> > > We can avoid all of the issues associated with a value-based interface
> > > by abstracting these offset controls in new ioctls. This series
> > > introduces new vCPU device attributes to provide userspace access to the
> > > vCPU's system counter offset.
> > >
> > > Patch 1 addresses a possible race in KVM_GET_CLOCK where
> > > use_master_clock is read outside of the pvclock_gtod_sync_lock.
> > >
> > > Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> > > ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> > > essential for a VMM to perform precise migration of the guest's system
> > > counters.
> > >
> > > Patches 3-4 are some preparatory changes for exposing the TSC offset to
> > > userspace. Patch 5 provides a vCPU attribute to provide userspace access
> > > to the TSC offset.
> > >
> > > Patches 6-7 implement a test for the new additions to
> > > KVM_{GET,SET}_CLOCK.
> > >
> > > Patch 8 fixes some assertions in the kvm device attribute helpers.
> > >
> > > Patches 9-10 implement at test for the tsc offset attribute introduced in
> > > patch 5.

Paolo,

Is there anything else you're waiting to see for the x86 portion of
this series after addressing Sean's comments? There's some work
remaining on the arm64 side, though I believe the two architectures
are now disjoint for this series.

--
Thanks,
Oliver

> > > Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
> > > through the ONE_REG interface.
> > >
> > > Patches 14-15 add test cases for userspace manipulation of the virtual
> > > counter-timer.
> > >
> > > Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
> > > ARM vCPU, but only implements support for ECV hosts. Patches 18-19 add
> > > support for non-ECV hosts by emulating physical counter offsetting.
> > >
> > > Patch 20 adds test cases for adjusting the host-guest offset, and
> > > finally patch 21 adds a test to measure the emulation overhead of
> > > CNTPCT_EL2.
> > >
> > > This series was tested on both an Ampere Mt. Jade and Haswell systems.
> > > Unfortunately, the ECV portions of this series are untested, as there is
> > > no ECV-capable hardware and the ARM fast models only partially implement
> > > ECV.
> >
> > Small correction: I was only using the foundation model. Apparently
> > the AEM FVP provides full ECV support.
>
> Ok. I've now tested this series on the FVP Base RevC fast model@v8.6 +
> ECV=2. Passes on VHE, fails on nVHE.
>
> I'll respin this series with the fix for nVHE+ECV soon.
>
> --
> Thanks,
> Oliver
>
> >
> > >
> > > Physical counter benchmark
> > > --------------------------
> > >
> > > The following data was collected by running 10000 iterations of the
> > > benchmark test from Patch 21 on an Ampere Mt. Jade reference server, A 2S
> > > machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> > > for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> > > parameter.
> > >
> > > nVHE
> > > ----
> > >
> > > +--------------------+--------+---------+
> > > |       Metric       | Native | Trapped |
> > > +--------------------+--------+---------+
> > > | Average            | 54ns   | 148ns   |
> > > | Standard Deviation | 124ns  | 122ns   |
> > > | 95th Percentile    | 258ns  | 348ns   |
> > > +--------------------+--------+---------+
> > >
> > > VHE
> > > ---
> > >
> > > +--------------------+--------+---------+
> > > |       Metric       | Native | Trapped |
> > > +--------------------+--------+---------+
> > > | Average            | 53ns   | 152ns   |
> > > | Standard Deviation | 92ns   | 94ns    |
> > > | 95th Percentile    | 204ns  | 307ns   |
> > > +--------------------+--------+---------+
> > >
> > > This series applies cleanly to kvm/queue at the following commit:
> > >
> > > 6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")
> > >
> > > v1 -> v2:
> > >   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
> > >   - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
> > >   - Changed the arm64 implementation to broadcast counter
> > >     offset values to all vCPUs in a guest. This upholds the
> > >     architectural expectations of a consistent counter-timer across CPUs.
> > >   - Fixed a bug with traps in VHE mode. We now configure traps on every
> > >     transition into a guest to handle differing VMs (trapped, emulated).
> > >
> > > v2 -> v3:
> > >   - Added documentation for additions to KVM_{GET,SET}_CLOCK
> > >   - Added documentation for all new vCPU attributes
> > >   - Added documentation for suggested algorithm to migrate a guest's
> > >     TSC(s)
> > >   - Bug fixes throughout series
> > >   - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME
> > >
> > > v3 -> v4:
> > >   - Added patch to address incorrect device helper assertions (Drew)
> > >   - Carried Drew's r-b tags where appropriate
> > >   - x86 selftest cleanup
> > >   - Removed stale kvm_timer_init_vhe() function
> > >   - Removed unnecessary GUEST_DONE() from selftests
> > >
> > > v4 -> v5:
> > >   - Fix typo in TSC migration algorithm
> > >   - Carry more of Drew's r-b tags
> > >   - clean up run loop logic in counter emulation benchmark (missed from
> > >     Drew's comments on v3)
> > >
> > > v5 -> v6:
> > >   - Add fix for race in KVM_GET_CLOCK (Sean)
> > >   - Fix 32-bit build issues in series + use of uninitialized host tsc
> > >     value (Sean)
> > >   - General style cleanups
> > >   - Rework ARM virtual counter offsetting to match guest behavior. Use
> > >     the ONE_REG interface instead of a VM attribute (Marc)
> > >   - Maintain a single host-guest counter offset, which applies to both
> > >     physical and virtual counters
> > >   - Dropped some of Drew's r-b tags due to nontrivial patch changes
> > >     (sorry for the churn!)
> > >
> > > v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oupton@google.com/
> > > v2: https://lore.kernel.org/r/20210716212629.2232756-1-oupton@google.com
> > > v3: https://lore.kernel.org/r/20210719184949.1385910-1-oupton@google.com
> > > v4: https://lore.kernel.org/r/20210729001012.70394-1-oupton@google.com
> > > v5: https://lore.kernel.org/r/20210729173300.181775-1-oupton@google.com
> > >
> > > Oliver Upton (21):
> > >   KVM: x86: Fix potential race in KVM_GET_CLOCK
> > >   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
> > >   KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
> > >   KVM: x86: Refactor tsc synchronization code
> > >   KVM: x86: Expose TSC offset controls to userspace
> > >   tools: arch: x86: pull in pvclock headers
> > >   selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
> > >   selftests: KVM: Fix kvm device helper ioctl assertions
> > >   selftests: KVM: Add helpers for vCPU device attributes
> > >   selftests: KVM: Introduce system counter offset test
> > >   KVM: arm64: Refactor update_vtimer_cntvoff()
> > >   KVM: arm64: Separate guest/host counter offset values
> > >   KVM: arm64: Allow userspace to configure a vCPU's virtual offset
> > >   selftests: KVM: Add helper to check for register presence
> > >   selftests: KVM: Add support for aarch64 to system_counter_offset_test
> > >   arm64: cpufeature: Enumerate support for Enhanced Counter
> > >     Virtualization
> > >   KVM: arm64: Allow userspace to configure a guest's counter-timer
> > >     offset
> > >   KVM: arm64: Configure timer traps in vcpu_load() for VHE
> > >   KVM: arm64: Emulate physical counter offsetting on non-ECV systems
> > >   selftests: KVM: Test physical counter offsetting
> > >   selftests: KVM: Add counter emulation benchmark
> > >
> > >  Documentation/virt/kvm/api.rst                |  52 ++-
> > >  Documentation/virt/kvm/devices/vcpu.rst       |  85 ++++
> > >  Documentation/virt/kvm/locking.rst            |  11 +
> > >  arch/arm64/include/asm/kvm_asm.h              |   2 +
> > >  arch/arm64/include/asm/sysreg.h               |   5 +
> > >  arch/arm64/include/uapi/asm/kvm.h             |   2 +
> > >  arch/arm64/kernel/cpufeature.c                |  10 +
> > >  arch/arm64/kvm/arch_timer.c                   | 224 ++++++++++-
> > >  arch/arm64/kvm/arm.c                          |   4 +-
> > >  arch/arm64/kvm/guest.c                        |   6 +-
> > >  arch/arm64/kvm/hyp/include/hyp/switch.h       |  29 ++
> > >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   6 +
> > >  arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
> > >  arch/arm64/kvm/hyp/vhe/timer-sr.c             |   5 +
> > >  arch/arm64/tools/cpucaps                      |   1 +
> > >  arch/x86/include/asm/kvm_host.h               |   4 +
> > >  arch/x86/include/uapi/asm/kvm.h               |   4 +
> > >  arch/x86/kvm/x86.c                            | 364 +++++++++++++-----
> > >  include/clocksource/arm_arch_timer.h          |   1 +
> > >  include/kvm/arm_arch_timer.h                  |   6 +-
> > >  include/uapi/linux/kvm.h                      |   7 +-
> > >  tools/arch/x86/include/asm/pvclock-abi.h      |  48 +++
> > >  tools/arch/x86/include/asm/pvclock.h          | 103 +++++
> > >  tools/testing/selftests/kvm/.gitignore        |   3 +
> > >  tools/testing/selftests/kvm/Makefile          |   4 +
> > >  .../kvm/aarch64/counter_emulation_benchmark.c | 207 ++++++++++
> > >  .../selftests/kvm/include/aarch64/processor.h |  24 ++
> > >  .../testing/selftests/kvm/include/kvm_util.h  |  13 +
> > >  tools/testing/selftests/kvm/lib/kvm_util.c    |  63 ++-
> > >  .../kvm/system_counter_offset_test.c          | 211 ++++++++++
> > >  .../selftests/kvm/x86_64/kvm_clock_test.c     | 204 ++++++++++
> > >  31 files changed, 1581 insertions(+), 143 deletions(-)
> > >  create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
> > >  create mode 100644 tools/arch/x86/include/asm/pvclock.h
> > >  create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
> > >  create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c
> > >  create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
> > >
> > > --
> > > 2.32.0.605.g8dce9f2422-goog
> > >
Marc Zyngier Aug. 10, 2021, 12:30 p.m. UTC | #4
On Tue, 10 Aug 2021 01:04:38 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 4, 2021 at 3:03 PM Oliver Upton <oupton@google.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 4:05 AM Oliver Upton <oupton@google.com> wrote:
> > >
> > > On Wed, Aug 4, 2021 at 1:58 AM Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > KVM's current means of saving/restoring system counters is plagued with
> > > > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > > > system counter by-value through the respective guest system register
> > > > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > > > brittle as the state is not idempotent: the host system counter is still
> > > > oscillating between the attempted save and restore. Furthermore, VMMs
> > > > may wish to transparently live migrate guest VMs, meaning that they
> > > > include the elapsed time due to live migration blackout in the guest
> > > > system counter view. The VMM thread could be preempted for any number of
> > > > reasons (scheduler, L0 hypervisor under nested) between the time that
> > > > it calculates the desired guest counter value and when KVM actually sets
> > > > this counter state.
> > > >
> > > > Despite the value-based interface that we present to userspace, KVM
> > > > actually has idempotent guest controls by way of system counter offsets.
> > > > We can avoid all of the issues associated with a value-based interface
> > > > by abstracting these offset controls in new ioctls. This series
> > > > introduces new vCPU device attributes to provide userspace access to the
> > > > vCPU's system counter offset.
> > > >
> > > > Patch 1 addresses a possible race in KVM_GET_CLOCK where
> > > > use_master_clock is read outside of the pvclock_gtod_sync_lock.
> > > >
> > > > Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> > > > ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> > > > essential for a VMM to perform precise migration of the guest's system
> > > > counters.
> > > >
> > > > Patches 3-4 are some preparatory changes for exposing the TSC offset to
> > > > userspace. Patch 5 provides a vCPU attribute to provide userspace access
> > > > to the TSC offset.
> > > >
> > > > Patches 6-7 implement a test for the new additions to
> > > > KVM_{GET,SET}_CLOCK.
> > > >
> > > > Patch 8 fixes some assertions in the kvm device attribute helpers.
> > > >
> > > > Patches 9-10 implement at test for the tsc offset attribute introduced in
> > > > patch 5.
> 
> Paolo,
> 
> Is there anything else you're waiting to see for the x86 portion of
> this series after addressing Sean's comments? There's some work
> remaining on the arm64 side, though I believe the two architectures
> are now disjoint for this series.

I think at this stage it would make sense to split the series in two
and drive them independently.

Thanks,

	M.
Paolo Bonzini Aug. 11, 2021, 1:05 p.m. UTC | #5
On 04/08/21 10:57, Oliver Upton wrote:
> KVM's current means of saving/restoring system counters is plagued with
> temporal issues. At least on ARM64 and x86, we migrate the guest's
> system counter by-value through the respective guest system register
> values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> brittle as the state is not idempotent: the host system counter is still
> oscillating between the attempted save and restore. Furthermore, VMMs
> may wish to transparently live migrate guest VMs, meaning that they
> include the elapsed time due to live migration blackout in the guest
> system counter view. The VMM thread could be preempted for any number of
> reasons (scheduler, L0 hypervisor under nested) between the time that
> it calculates the desired guest counter value and when KVM actually sets
> this counter state.
> 
> Despite the value-based interface that we present to userspace, KVM
> actually has idempotent guest controls by way of system counter offsets.
> We can avoid all of the issues associated with a value-based interface
> by abstracting these offset controls in new ioctls. This series
> introduces new vCPU device attributes to provide userspace access to the
> vCPU's system counter offset.
> 
> Patch 1 addresses a possible race in KVM_GET_CLOCK where
> use_master_clock is read outside of the pvclock_gtod_sync_lock.
> 
> Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> essential for a VMM to perform precise migration of the guest's system
> counters.
> 
> Patches 3-4 are some preparatory changes for exposing the TSC offset to
> userspace. Patch 5 provides a vCPU attribute to provide userspace access
> to the TSC offset.
> 
> Patches 6-7 implement a test for the new additions to
> KVM_{GET,SET}_CLOCK.
> 
> Patch 8 fixes some assertions in the kvm device attribute helpers.
> 
> Patches 9-10 implement at test for the tsc offset attribute introduced in
> patch 5.

The x86 parts look good, except that patch 3 is a bit redundant with my 
idea of altogether getting rid of the pvclock_gtod_sync_lock.  That said 
I agree that patches 1 and 2 (and extracting kvm_vm_ioctl_get_clock and 
kvm_vm_ioctl_set_clock) should be done before whatever locking changes 
have to be done.

Time is ticking for 5.15 due to my vacation, I'll see if I have some 
time to look at it further next week.

I agree that arm64 can be done separately from x86.

Paolo
Oliver Upton Aug. 11, 2021, 6:56 p.m. UTC | #6
On Wed, Aug 11, 2021 at 6:05 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 04/08/21 10:57, Oliver Upton wrote:
> > KVM's current means of saving/restoring system counters is plagued with
> > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > system counter by-value through the respective guest system register
> > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > brittle as the state is not idempotent: the host system counter is still
> > oscillating between the attempted save and restore. Furthermore, VMMs
> > may wish to transparently live migrate guest VMs, meaning that they
> > include the elapsed time due to live migration blackout in the guest
> > system counter view. The VMM thread could be preempted for any number of
> > reasons (scheduler, L0 hypervisor under nested) between the time that
> > it calculates the desired guest counter value and when KVM actually sets
> > this counter state.
> >
> > Despite the value-based interface that we present to userspace, KVM
> > actually has idempotent guest controls by way of system counter offsets.
> > We can avoid all of the issues associated with a value-based interface
> > by abstracting these offset controls in new ioctls. This series
> > introduces new vCPU device attributes to provide userspace access to the
> > vCPU's system counter offset.
> >
> > Patch 1 addresses a possible race in KVM_GET_CLOCK where
> > use_master_clock is read outside of the pvclock_gtod_sync_lock.
> >
> > Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> > ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> > essential for a VMM to perform precise migration of the guest's system
> > counters.
> >
> > Patches 3-4 are some preparatory changes for exposing the TSC offset to
> > userspace. Patch 5 provides a vCPU attribute to provide userspace access
> > to the TSC offset.
> >
> > Patches 6-7 implement a test for the new additions to
> > KVM_{GET,SET}_CLOCK.
> >
> > Patch 8 fixes some assertions in the kvm device attribute helpers.
> >
> > Patches 9-10 implement at test for the tsc offset attribute introduced in
> > patch 5.
>
> The x86 parts look good, except that patch 3 is a bit redundant with my
> idea of altogether getting rid of the pvclock_gtod_sync_lock.  That said
> I agree that patches 1 and 2 (and extracting kvm_vm_ioctl_get_clock and
> kvm_vm_ioctl_set_clock) should be done before whatever locking changes
> have to be done.

Following up on patch 3.

> Time is ticking for 5.15 due to my vacation, I'll see if I have some
> time to look at it further next week.
>
> I agree that arm64 can be done separately from x86.

Marc, just a disclaimer:

I'm going to separate these two series, although there will still
exist dependencies in the selftests changes. Otherwise, kernel changes
are disjoint.

--
Thanks,
Oliver
Marc Zyngier Aug. 11, 2021, 7:01 p.m. UTC | #7
On Wed, 11 Aug 2021 19:56:22 +0100,
Oliver Upton <oupton@google.com> wrote:

[...]

> > Time is ticking for 5.15 due to my vacation, I'll see if I have some
> > time to look at it further next week.
> >
> > I agree that arm64 can be done separately from x86.
> 
> Marc, just a disclaimer:
> 
> I'm going to separate these two series, although there will still
> exist dependencies in the selftests changes. Otherwise, kernel changes
> are disjoint.

No problem. The selftests can even sit in a third series if that makes
it easier.

Thanks,

	M.