Message ID | 20210916181538.968978-1-oupton@google.com
---|---
Series | KVM: x86: Add idempotent controls for migrating system counter state
On 16/09/21 20:15, Oliver Upton wrote:
> KVM's current means of saving/restoring system counters is plagued with
> temporal issues. On x86, we migrate the guest's system counter by value
> through the guest's IA32_TSC value. Restoring system counters by value
> is brittle because the state is not idempotent: the host system counter
> keeps advancing between the attempted save and restore. Furthermore,
> VMMs may wish to transparently live migrate guest VMs, meaning that the
> guest's view of its system counter includes the time elapsed during the
> live migration blackout. The VMM thread could be preempted for any
> number of reasons (the host scheduler, or the L0 hypervisor when running
> nested) between the time that it calculates the desired guest counter
> value and the time KVM actually sets this counter state.
>
> Despite the value-based interface that we present to userspace, KVM
> actually has idempotent guest controls by way of the TSC offset. We can
> avoid all of the issues associated with a value-based interface by
> abstracting these offset controls in a new device attribute. This series
> introduces new vCPU device attributes to provide userspace access to the
> vCPU's system counter offset.
>
> Patches 1-2 are Paolo's refactorings around locking and the
> KVM_{GET,SET}_CLOCK ioctls.
>
> Patch 3 cures a race where use_master_clock is read outside of the
> pvclock lock in the KVM_GET_CLOCK ioctl.
>
> Patch 4 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> essential for a VMM to perform precise migration of the guest's system
> counters.
>
> Patch 5 does away with the pvclock spin lock in favor of a sequence lock
> based on the tsc_write_lock. The original patch is from Paolo; I touched
> it up a bit to fix a deadlock and to remove some unused variables that
> tripped -Werror.
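Why an offset restores idempotently, while a value does not, can be shown with a small arithmetic sketch. It is a simplified model under stated assumptions, not KVM's implementation: the guest TSC is taken to be `host_tsc + offset` with no TSC scaling, CLOCK_REALTIME is assumed synchronized between source and destination hosts, and the VMM is assumed to have recorded a paired (host_tsc, realtime) instant on each side, as Patch 4 makes possible through KVM_GET_CLOCK. The names `struct clock_instant`, `ns_to_cycles()` and `dest_tsc_offset()` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical pairing of a host TSC reading with the host
 * CLOCK_REALTIME value (in ns) sampled at the same instant. */
struct clock_instant {
	uint64_t host_tsc;
	uint64_t realtime_ns;
};

/* Convert nanoseconds to TSC cycles at tsc_khz. Splitting ns into
 * whole milliseconds and a remainder avoids overflowing ns * khz:
 * ns * khz / 1e6 == (ns / 1e6) * khz + (ns % 1e6) * khz / 1e6. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t tsc_khz)
{
	return (ns / 1000000) * tsc_khz + (ns % 1000000) * tsc_khz / 1000000;
}

/* Compute a destination TSC offset such that the guest TSC at the dst
 * instant equals the guest TSC at the src instant plus the realtime
 * elapsed in between (the migration blackout). Arithmetic wraps modulo
 * 2^64, so an offset that is "negative" comes out as a large u64. */
static uint64_t dest_tsc_offset(uint64_t src_offset,
				struct clock_instant src,
				struct clock_instant dst,
				uint64_t tsc_khz)
{
	/* Guest TSC as it was at the source instant. */
	uint64_t guest_tsc_src = src.host_tsc + src_offset;
	/* Assumes dst.realtime_ns >= src.realtime_ns (synchronized clocks). */
	uint64_t elapsed = ns_to_cycles(dst.realtime_ns - src.realtime_ns, tsc_khz);
	/* Desired guest TSC at the dst instant, expressed relative to the
	 * destination host TSC at that same instant. */
	return guest_tsc_src + elapsed - dst.host_tsc;
}
```

Because the result depends only on the two recorded instants, the VMM can write it at any later point, even after being preempted, and the guest TSC still tracks real time from then on. A value computed at time t and written at t + d, by contrast, leaves the guest d cycles behind.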
>
> Patch 6 extracts the TSC synchronization tracking code so that it can be
> used for both offset-based and value-based TSC synchronization schemes.
>
> Finally, patch 7 implements a vCPU device attribute that allows VMMs to
> read and write the TSC offset of a vCPU.
>
> This series was tested with the new KVM selftests for the KVM clock and
> system counter offset controls on Haswell hardware. The kernel was built
> with CONFIG_LOCKDEP, given the new locking changes and lockdep assertions
> in this series.
>
> Note that these tests are mailed as a separate series due to
> dependencies in both x86 and arm64.
>
> Applies cleanly to 5.15-rc1.
>
> v8: http://lore.kernel.org/r/20210816001130.3059564-1-oupton@google.com
>
> v7 -> v8:
>  - Rebased to 5.15-rc1
>  - Picked up Paolo's version of the series, which includes locking
>    changes
>  - Make KVM advertise KVM_CAP_VCPU_ATTRIBUTES
>
> Oliver Upton (4):
>   KVM: x86: Fix potential race in KVM_GET_CLOCK
>   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
>   KVM: x86: Refactor tsc synchronization code
>   KVM: x86: Expose TSC offset controls to userspace
>
> Paolo Bonzini (3):
>   kvm: x86: abstract locking around pvclock_update_vm_gtod_copy
>   KVM: x86: extract KVM_GET_CLOCK/KVM_SET_CLOCK to separate functions
>   kvm: x86: protect masterclock with a seqcount
>
>  Documentation/virt/kvm/api.rst          |  42 ++-
>  Documentation/virt/kvm/devices/vcpu.rst |  57 +++
>  arch/x86/include/asm/kvm_host.h         |  12 +-
>  arch/x86/include/uapi/asm/kvm.h         |   4 +
>  arch/x86/kvm/x86.c                      | 458 ++++++++++++++++--------
>  include/uapi/linux/kvm.h                |   7 +-
>  6 files changed, 419 insertions(+), 161 deletions(-)
>

Queued, thanks.

Paolo
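For the vCPU device attribute from patch 7, a VMM goes through the generic KVM_GET_DEVICE_ATTR / KVM_SET_DEVICE_ATTR ioctls on the vCPU file descriptor, with `struct kvm_device_attr.addr` pointing at a 64-bit offset. This is a minimal sketch, assuming the x86 uapi constants KVM_VCPU_TSC_CTRL and KVM_VCPU_TSC_OFFSET are both 0 as merged; the fallback defines exist only so it builds against older headers, and the helper names are hypothetical. A VMM would normally confirm support first, for example via KVM_CAP_VCPU_ATTRIBUTES or KVM_HAS_DEVICE_ATTR.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumed values for headers that predate the TSC control group. */
#ifndef KVM_VCPU_TSC_CTRL
#define KVM_VCPU_TSC_CTRL 0
#endif
#ifndef KVM_VCPU_TSC_OFFSET
#define KVM_VCPU_TSC_OFFSET 0
#endif

/* Hypothetical helper: read one vCPU's TSC offset into *offset.
 * Returns 0 on success, or -1 with errno set (e.g. EBADF, ENXIO). */
static int vcpu_get_tsc_offset(int vcpu_fd, uint64_t *offset)
{
	struct kvm_device_attr attr = {
		.group = KVM_VCPU_TSC_CTRL,
		.attr  = KVM_VCPU_TSC_OFFSET,
		/* KVM copies the 64-bit offset to this user address. */
		.addr  = (uint64_t)(uintptr_t)offset,
	};
	return ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);
}

/* Hypothetical helper: write one vCPU's TSC offset.
 * Returns 0 on success, or -1 with errno set. */
static int vcpu_set_tsc_offset(int vcpu_fd, uint64_t offset)
{
	struct kvm_device_attr attr = {
		.group = KVM_VCPU_TSC_CTRL,
		.attr  = KVM_VCPU_TSC_OFFSET,
		/* KVM reads the 64-bit offset from this user address. */
		.addr  = (uint64_t)(uintptr_t)&offset,
	};
	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

Writing an offset rather than a TSC value is what makes the restore insensitive to when the ioctl actually runs: the guest TSC keeps advancing with the host TSC from the moment the offset takes effect.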