From patchwork Wed May 22 00:17:15 2024
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 13669842
From: David Woodhouse
To: kvm@vger.kernel.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    "H. Peter Anvin", Paul Durrant, Peter Zijlstra, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
    Shuah Khan, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
    Dongli Zhang, Chenyi Qiang
Subject: [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Date: Wed, 22 May 2024 01:17:15 +0100
Message-ID: <20240522001817.619072-21-dwmw2@infradead.org>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240522001817.619072-1-dwmw2@infradead.org>
References: <20240522001817.619072-1-dwmw2@infradead.org>

From: David Woodhouse

When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
time spent in the previous runstate is accounted. This is based on the
delta between the current KVM clock time, and the previous value stored
in vcpu->arch.xen.runstate_entry_time.

If the KVM clock goes backwards, that delta will be negative. Or, since
it's an unsigned 64-bit integer, very *large*. Linux guests deal with
that particularly badly, reporting 100% steal time for ever more (well,
for *centuries* at least, until the delta has been consumed).

So when a negative delta is detected, just refrain from updating the
runstates until the KVM clock catches up with runstate_entry_time again.

The userspace APIs for setting the runstate times do not allow them to
be set past the current KVM clock, but userspace can still adjust the
KVM clock *after* setting the runstate times, which would cause this
situation to occur.

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/xen.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 014048c22652..3d4111de4472 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
 	u64 now = get_kvmclock_ns(v->kvm);
-	u64 delta_ns = now - vx->runstate_entry_time;
 	u64 run_delay = current->sched_info.run_delay;
+	s64 delta_ns = now - vx->runstate_entry_time;
+	s64 steal_ns = run_delay - vx->last_steal;
 
 	if (unlikely(!vx->runstate_entry_time))
 		vx->current_runstate = RUNSTATE_offline;
 
+	vx->last_steal = run_delay;
+
+	/*
+	 * If KVM clock time went backwards, stop updating until it
+	 * catches up (or the runstates are reset by userspace).
+	 */
+	if (delta_ns < 0)
+		return;
+
 	/*
 	 * Time waiting for the scheduler isn't "stolen" if the
 	 * vCPU wasn't running anyway.
 	 */
-	if (vx->current_runstate == RUNSTATE_running) {
-		u64 steal_ns = run_delay - vx->last_steal;
+	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
+		if (steal_ns > delta_ns)
+			steal_ns = delta_ns;
 
 		delta_ns -= steal_ns;
 		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
 	}
-	vx->last_steal = run_delay;
 
 	vx->runstate_times[vx->current_runstate] += delta_ns;
 	vx->current_runstate = state;
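
As an aside, here is a minimal userspace sketch (not part of the patch;
the constants are invented for illustration) of the wraparound the
commit message describes: with u64 arithmetic, a backwards step of the
KVM clock yields a delta of nearly 2^64 ns (centuries), whereas the
same subtraction read as s64 is a small negative number that an early
check can catch, which is what the patch does in-kernel by declaring
delta_ns as s64 and returning before the runstate times are updated.

  /* Illustrative only: mimics the u64 vs s64 delta handling in userspace. */
  #include <stdio.h>
  #include <stdint.h>
  #include <inttypes.h>

  int main(void)
  {
          uint64_t entry_time = 1000000000;       /* hypothetical runstate_entry_time, ns */
          uint64_t now        =  999000000;       /* KVM clock stepped back by 1 ms */

          uint64_t u_delta = now - entry_time;            /* wraps to ~1.8e19 ns */
          int64_t  s_delta = (int64_t)(now - entry_time); /* -1000000 ns, easy to detect */

          printf("u64 delta: %" PRIu64 " ns\n", u_delta);
          printf("s64 delta: %" PRId64 " ns\n", s_delta);

          if (s_delta < 0)
                  printf("KVM clock went backwards: skip the runstate update\n");

          return 0;
  }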