Message ID | 20240522001817.619072-21-dwmw2@infradead.org
---|---
State | New, archived
Series | Cleaning up the KVM clock mess
On 22/05/2024 01:17, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
> time spent in the previous runstate is accounted. This is based on the
> delta between the current KVM clock time, and the previous value stored
> in vcpu->arch.xen.runstate_entry_time.
> 
> If the KVM clock goes backwards, that delta will be negative. Or, since
> it's an unsigned 64-bit integer, very *large*. Linux guests deal with
> that particularly badly, reporting 100% steal time for ever more (well,
> for *centuries* at least, until the delta has been consumed).
> 
> So when a negative delta is detected, just refrain from updating the
> runstates until the KVM clock catches up with runstate_entry_time again.
> 
> The userspace APIs for setting the runstate times do not allow them to
> be set past the current KVM clock, but userspace can still adjust the
> KVM clock *after* setting the runstate times, which would cause this
> situation to occur.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/xen.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)

Reviewed-by: Paul Durrant <paul@xen.org>
On Wed, May 22, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
> time spent in the previous runstate is accounted. This is based on the
> delta between the current KVM clock time, and the previous value stored
> in vcpu->arch.xen.runstate_entry_time.
> 
> If the KVM clock goes backwards, that delta will be negative. Or, since
> it's an unsigned 64-bit integer, very *large*. Linux guests deal with
> that particularly badly, reporting 100% steal time for ever more (well,
> for *centuries* at least, until the delta has been consumed).
> 
> So when a negative delta is detected, just refrain from updating the
> runstates until the KVM clock catches up with runstate_entry_time again.
> 
> The userspace APIs for setting the runstate times do not allow them to
> be set past the current KVM clock, but userspace can still adjust the
> KVM clock *after* setting the runstate times, which would cause this
> situation to occur.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/xen.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 014048c22652..3d4111de4472 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
>  {
>  	struct kvm_vcpu_xen *vx = &v->arch.xen;
>  	u64 now = get_kvmclock_ns(v->kvm);
> -	u64 delta_ns = now - vx->runstate_entry_time;
>  	u64 run_delay = current->sched_info.run_delay;
> +	s64 delta_ns = now - vx->runstate_entry_time;
> +	s64 steal_ns = run_delay - vx->last_steal;
>  
>  	if (unlikely(!vx->runstate_entry_time))
>  		vx->current_runstate = RUNSTATE_offline;
>  
> +	vx->last_steal = run_delay;
> +
> +	/*
> +	 * If KVM clock time went backwards, stop updating until it
> +	 * catches up (or the runstates are reset by userspace).
> +	 */

I take it this is a legitimate scenario where userspace sets the KVM clock and
then the runstates, and KVM needs to lend a hand because userspace can't do
those two things atomically?

> +	if (delta_ns < 0)
> +		return;
> +
>  	/*
>  	 * Time waiting for the scheduler isn't "stolen" if the
>  	 * vCPU wasn't running anyway.
>  	 */
> -	if (vx->current_runstate == RUNSTATE_running) {
> -		u64 steal_ns = run_delay - vx->last_steal;
> +	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
> +		if (steal_ns > delta_ns)
> +			steal_ns = delta_ns;
>  
>  		delta_ns -= steal_ns;
>  
>  		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
>  	}
> -	vx->last_steal = run_delay;
>  
>  	vx->runstate_times[vx->current_runstate] += delta_ns;
>  	vx->current_runstate = state;
> -- 
> 2.44.0
> 
On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > +	vx->last_steal = run_delay;
> > +
> > +	/*
> > +	 * If KVM clock time went backwards, stop updating until it
> > +	 * catches up (or the runstates are reset by userspace).
> > +	 */
> 
> I take it this is a legitimate scenario where userspace sets the KVM clock
> and then the runstates, and KVM needs to lend a hand because userspace
> can't do those two things atomically?

Indeed. Will update the comment to make that more obvious.

Thanks for the rest of the review on this series. I'll go through in
detail and update it, hopefully this week.
On Tue, 20 Aug 2024 11:22:31 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > > +	vx->last_steal = run_delay;
> > > +
> > > +	/*
> > > +	 * If KVM clock time went backwards, stop updating until it
> > > +	 * catches up (or the runstates are reset by userspace).
> > > +	 */
> > 
> > I take it this is a legitimate scenario where userspace sets the KVM clock
> > and then the runstates, and KVM needs to lend a hand because userspace
> > can't do those two things atomically?
> 
> Indeed. Will update the comment to make that more obvious.
> 
> Thanks for the rest of the review on this series. I'll go through in
> detail and update it, hopefully this week.

Hmm, is this related at all to this:

https://lore.kernel.org/all/20240806111157.1336532-1-suleiman@google.com/

-- Steve
On Tue, 2024-08-20 at 11:08 -0400, Steven Rostedt wrote:
> On Tue, 20 Aug 2024 11:22:31 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > > > +	vx->last_steal = run_delay;
> > > > +
> > > > +	/*
> > > > +	 * If KVM clock time went backwards, stop updating until it
> > > > +	 * catches up (or the runstates are reset by userspace).
> > > > +	 */
> > > 
> > > I take it this is a legitimate scenario where userspace sets the KVM
> > > clock and then the runstates, and KVM needs to lend a hand because
> > > userspace can't do those two things atomically?
> > 
> > Indeed. Will update the comment to make that more obvious.
> > 
> > Thanks for the rest of the review on this series. I'll go through in
> > detail and update it, hopefully this week.
> 
> Hmm, is this related at all to this:
> 
> https://lore.kernel.org/all/20240806111157.1336532-1-suleiman@google.com/

No, but patch 21 in this same series is.
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 014048c22652..3d4111de4472 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
 	u64 now = get_kvmclock_ns(v->kvm);
-	u64 delta_ns = now - vx->runstate_entry_time;
 	u64 run_delay = current->sched_info.run_delay;
+	s64 delta_ns = now - vx->runstate_entry_time;
+	s64 steal_ns = run_delay - vx->last_steal;
 
 	if (unlikely(!vx->runstate_entry_time))
 		vx->current_runstate = RUNSTATE_offline;
 
+	vx->last_steal = run_delay;
+
+	/*
+	 * If KVM clock time went backwards, stop updating until it
+	 * catches up (or the runstates are reset by userspace).
+	 */
+	if (delta_ns < 0)
+		return;
+
 	/*
 	 * Time waiting for the scheduler isn't "stolen" if the
 	 * vCPU wasn't running anyway.
 	 */
-	if (vx->current_runstate == RUNSTATE_running) {
-		u64 steal_ns = run_delay - vx->last_steal;
+	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
+		if (steal_ns > delta_ns)
+			steal_ns = delta_ns;
 
 		delta_ns -= steal_ns;
 
 		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
 	}
-	vx->last_steal = run_delay;
 
 	vx->runstate_times[vx->current_runstate] += delta_ns;
 	vx->current_runstate = state;