X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12418165
Date: Wed, 4 Aug 2021 08:57:58 +0000
Message-Id: <20210804085819.846610-1-oupton@google.com>
Subject: [PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

KVM's current means of saving/restoring system counters is plagued with
temporal issues. At least on ARM64 and x86, we migrate the guest's
system counter by-value through the respective guest system register
values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
brittle as the state is not idempotent: the host system counter is still
advancing between the attempted save and restore. Furthermore, VMMs may
wish to transparently live migrate guest VMs, meaning that they include
the elapsed time due to live migration blackout in the guest system
counter view.
The VMM thread could be preempted for any number of reasons (scheduler,
L0 hypervisor under nested virtualization) between the time that it
calculates the desired guest counter value and when KVM actually sets
this counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of system counter offsets.
We can avoid all of the issues associated with a value-based interface
by abstracting these offset controls in new ioctls. This series
introduces new vCPU device attributes to provide userspace access to the
vCPU's system counter offset.

Patch 1 addresses a possible race in KVM_GET_CLOCK where
use_master_clock is read outside of the pvclock_gtod_sync_lock.

Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
ioctls to provide userspace with a (host_tsc, realtime) instant. This is
essential for a VMM to perform precise migration of the guest's system
counters.

Patches 3-4 are some preparatory changes for exposing the TSC offset to
userspace. Patch 5 provides a vCPU attribute to provide userspace access
to the TSC offset.

Patches 6-7 implement a test for the new additions to
KVM_{GET,SET}_CLOCK.

Patch 8 fixes some assertions in the kvm device attribute helpers.

Patches 9-10 implement a test for the TSC offset attribute introduced in
patch 5.

Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
through the ONE_REG interface.

Patches 14-15 add test cases for userspace manipulation of the virtual
counter-timer.

Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
ARM vCPU, but only implement support for ECV hosts. Patches 18-19 add
support for non-ECV hosts by emulating physical counter offsetting.

Patch 20 adds test cases for adjusting the host-guest offset, and
finally patch 21 adds a test to measure the emulation overhead of
CNTPCT_EL0.

This series was tested on both Ampere Mt. Jade and Haswell systems.
Unfortunately, the ECV portions of this series are untested, as there is
no ECV-capable hardware and the ARM fast models only partially implement
ECV.

Physical counter benchmark
--------------------------

The following data was collected by running 10000 iterations of the
benchmark test from Patch 21 on an Ampere Mt. Jade reference server, a 2S
machine with 2 80-core Ampere Altra SoCs. Measurements were collected
for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
parameter.

nVHE
----

+--------------------+--------+---------+
| Metric             | Native | Trapped |
+--------------------+--------+---------+
| Average            | 54ns   | 148ns   |
| Standard Deviation | 124ns  | 122ns   |
| 95th Percentile    | 258ns  | 348ns   |
+--------------------+--------+---------+

VHE
---

+--------------------+--------+---------+
| Metric             | Native | Trapped |
+--------------------+--------+---------+
| Average            | 53ns   | 152ns   |
| Standard Deviation | 92ns   | 94ns    |
| 95th Percentile    | 204ns  | 307ns   |
+--------------------+--------+---------+

This series applies cleanly to kvm/queue at the following commit:

6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")

v1 -> v2:
  - Reimplemented as vCPU device attributes instead of a distinct ioctl.
  - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
  - Changed the arm64 implementation to broadcast counter offset values
    to all vCPUs in a guest. This upholds the architectural expectations
    of a consistent counter-timer across CPUs.
  - Fixed a bug with traps in VHE mode. We now configure traps on every
    transition into a guest to handle differing VMs (trapped, emulated).

v2 -> v3:
  - Added documentation for additions to KVM_{GET,SET}_CLOCK
  - Added documentation for all new vCPU attributes
  - Added documentation for suggested algorithm to migrate a guest's
    TSC(s)
  - Bug fixes throughout series
  - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME

v3 -> v4:
  - Added patch to address incorrect device helper assertions (Drew)
  - Carried Drew's r-b tags where appropriate
  - x86 selftest cleanup
  - Removed stale kvm_timer_init_vhe() function
  - Removed unnecessary GUEST_DONE() from selftests

v4 -> v5:
  - Fix typo in TSC migration algorithm
  - Carry more of Drew's r-b tags
  - Clean up run loop logic in counter emulation benchmark (missed from
    Drew's comments on v3)

v5 -> v6:
  - Add fix for race in KVM_GET_CLOCK (Sean)
  - Fix 32-bit build issues in series + use of uninitialized host tsc
    value (Sean)
  - General style cleanups
  - Rework ARM virtual counter offsetting to match guest behavior. Use
    the ONE_REG interface instead of a VM attribute (Marc)
  - Maintain a single host-guest counter offset, which applies to both
    physical and virtual counters
  - Dropped some of Drew's r-b tags due to nontrivial patch changes
    (sorry for the churn!)
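
For readers following along, the idempotency argument made in this cover
letter can be illustrated with a small user-space sketch. It is not taken
from the series and does not use the KVM UAPI: `struct counter_instant`,
`guest_counter()`, `ns_to_cycles()`, `dest_offset()` and
`offset_from_value()` are hypothetical names chosen for illustration. The
model assumes that the source and destination counters tick at the same
frequency and that the two hosts' realtime clocks are synchronized. It
treats the guest counter as host counter plus offset, derives a
destination offset from two (host counter, realtime) instants, and
contrasts that with a by-value restore that takes effect late.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical snapshot of one instant, modeled on the (host_tsc, realtime)
 * pair described in the cover letter. This is NOT a KVM UAPI structure.
 */
struct counter_instant {
    uint64_t host_cycles;   /* host system counter reading */
    uint64_t realtime_ns;   /* wall-clock time of that reading, in ns */
};

/* Offset model: guest = host + offset, with wraparound modulo 2^64. */
static uint64_t guest_counter(uint64_t host_cycles, uint64_t offset)
{
    return host_cycles + offset;
}

/* Convert nanoseconds to counter cycles for a counter running at freq_khz. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t freq_khz)
{
    assert(freq_khz != 0);
    /* cycles = ns * kHz / 10^6, split in two to limit overflow */
    return (ns / 1000000) * freq_khz + (ns % 1000000) * freq_khz / 1000000;
}

/*
 * Offset to program on the destination so that the guest counter continues
 * from its value at the source instant plus the cycles that elapsed in
 * realtime between the two instants. The result depends only on the two
 * snapshots, so it is the same whenever it is eventually applied.
 */
static uint64_t dest_offset(struct counter_instant src, uint64_t src_offset,
                            struct counter_instant dst, uint64_t freq_khz)
{
    uint64_t elapsed = ns_to_cycles(dst.realtime_ns - src.realtime_ns,
                                    freq_khz);
    uint64_t guest_at_dst = guest_counter(src.host_cycles, src_offset) + elapsed;

    return guest_at_dst - dst.host_cycles;
}

/*
 * By-value restore: the desired guest value was computed at one host
 * reading, but the effective offset is derived from the host reading at the
 * moment the value is actually written. Any delay in between is lost.
 */
static uint64_t offset_from_value(uint64_t desired_guest, uint64_t host_at_set)
{
    return desired_guest - host_at_set;
}
```

With a 1 GHz counter and a 2 ms blackout, for instance, the offset from
`dest_offset()` makes the guest counter advance by 2,000,000 cycles across
the migration no matter how long the VMM thread is stalled before applying
it. A value computed at one host reading and written 300 cycles later, as
in `offset_from_value()`, leaves the guest 300 cycles behind.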

v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oupton@google.com/
v2: https://lore.kernel.org/r/20210716212629.2232756-1-oupton@google.com
v3: https://lore.kernel.org/r/20210719184949.1385910-1-oupton@google.com
v4: https://lore.kernel.org/r/20210729001012.70394-1-oupton@google.com
v5: https://lore.kernel.org/r/20210729173300.181775-1-oupton@google.com

Oliver Upton (21):
  KVM: x86: Fix potential race in KVM_GET_CLOCK
  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Expose TSC offset controls to userspace
  tools: arch: x86: pull in pvclock headers
  selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
  selftests: KVM: Fix kvm device helper ioctl assertions
  selftests: KVM: Add helpers for vCPU device attributes
  selftests: KVM: Introduce system counter offset test
  KVM: arm64: Refactor update_vtimer_cntvoff()
  KVM: arm64: Separate guest/host counter offset values
  KVM: arm64: Allow userspace to configure a vCPU's virtual offset
  selftests: KVM: Add helper to check for register presence
  selftests: KVM: Add support for aarch64 to system_counter_offset_test
  arm64: cpufeature: Enumerate support for Enhanced Counter Virtualization
  KVM: arm64: Allow userspace to configure a guest's counter-timer offset
  KVM: arm64: Configure timer traps in vcpu_load() for VHE
  KVM: arm64: Emulate physical counter offsetting on non-ECV systems
  selftests: KVM: Test physical counter offsetting
  selftests: KVM: Add counter emulation benchmark

 Documentation/virt/kvm/api.rst                |  52 ++-
 Documentation/virt/kvm/devices/vcpu.rst       |  85 ++++
 Documentation/virt/kvm/locking.rst            |  11 +
 arch/arm64/include/asm/kvm_asm.h              |   2 +
 arch/arm64/include/asm/sysreg.h               |   5 +
 arch/arm64/include/uapi/asm/kvm.h             |   2 +
 arch/arm64/kernel/cpufeature.c                |  10 +
 arch/arm64/kvm/arch_timer.c                   | 224 ++++++++++-
 arch/arm64/kvm/arm.c                          |   4 +-
 arch/arm64/kvm/guest.c                        |   6 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |  29 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   6 +
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
 arch/arm64/kvm/hyp/vhe/timer-sr.c             |   5 +
 arch/arm64/tools/cpucaps                      |   1 +
 arch/x86/include/asm/kvm_host.h               |   4 +
 arch/x86/include/uapi/asm/kvm.h               |   4 +
 arch/x86/kvm/x86.c                            | 364 +++++++++++++-----
 include/clocksource/arm_arch_timer.h          |   1 +
 include/kvm/arm_arch_timer.h                  |   6 +-
 include/uapi/linux/kvm.h                      |   7 +-
 tools/arch/x86/include/asm/pvclock-abi.h      |  48 +++
 tools/arch/x86/include/asm/pvclock.h          | 103 +++++
 tools/testing/selftests/kvm/.gitignore        |   3 +
 tools/testing/selftests/kvm/Makefile          |   4 +
 .../kvm/aarch64/counter_emulation_benchmark.c | 207 ++++++++++
 .../selftests/kvm/include/aarch64/processor.h |  24 ++
 .../testing/selftests/kvm/include/kvm_util.h  |  13 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  63 ++-
 .../kvm/system_counter_offset_test.c          | 211 ++++++++++
 .../selftests/kvm/x86_64/kvm_clock_test.c     | 204 ++++++++++
 31 files changed, 1581 insertions(+), 143 deletions(-)
 create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
 create mode 100644 tools/arch/x86/include/asm/pvclock.h
 create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
 create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c