
[RFC,v3,20/21] KVM: x86/xen: Prevent runstate times from becoming negative

Message ID 20240522001817.619072-21-dwmw2@infradead.org (mailing list archive)
State New, archived
Series: Cleaning up the KVM clock mess

Commit Message

David Woodhouse May 22, 2024, 12:17 a.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
time spent in the previous runstate is accounted. This is based on the
delta between the current KVM clock time, and the previous value stored
in vcpu->arch.xen.runstate_entry_time.

If the KVM clock goes backwards, that delta will be negative. Or, since
it's an unsigned 64-bit integer, very *large*. Linux guests deal with
that particularly badly, reporting 100% steal time for ever more (well,
for *centuries* at least, until the delta has been consumed).

So when a negative delta is detected, just refrain from updating the
runstates until the KVM clock catches up with runstate_entry_time again.

The userspace APIs for setting the runstate times do not allow them to
be set past the current KVM clock, but userspace can still adjust the
KVM clock *after* setting the runstate times, which would cause this
situation to occur.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/xen.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Comments

Paul Durrant May 24, 2024, 2:21 p.m. UTC | #1
On 22/05/2024 01:17, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
> time spent in the previous runstate is accounted. This is based on the
> delta between the current KVM clock time, and the previous value stored
> in vcpu->arch.xen.runstate_entry_time.
> 
> If the KVM clock goes backwards, that delta will be negative. Or, since
> it's an unsigned 64-bit integer, very *large*. Linux guests deal with
> that particularly badly, reporting 100% steal time for ever more (well,
> for *centuries* at least, until the delta has been consumed).
> 
> So when a negative delta is detected, just refrain from updating the
> runstates until the KVM clock catches up with runstate_entry_time again.
> 
> The userspace APIs for setting the runstate times do not allow them to
> be set past the current KVM clock, but userspace can still adjust the
> KVM clock *after* setting the runstate times, which would cause this
> situation to occur.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>   arch/x86/kvm/xen.c | 18 ++++++++++++++----
>   1 file changed, 14 insertions(+), 4 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>
Sean Christopherson Aug. 16, 2024, 4:39 a.m. UTC | #2
On Wed, May 22, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
> time spent in the previous runstate is accounted. This is based on the
> delta between the current KVM clock time, and the previous value stored
> in vcpu->arch.xen.runstate_entry_time.
> 
> If the KVM clock goes backwards, that delta will be negative. Or, since
> it's an unsigned 64-bit integer, very *large*. Linux guests deal with
> that particularly badly, reporting 100% steal time for ever more (well,
> for *centuries* at least, until the delta has been consumed).
> 
> So when a negative delta is detected, just refrain from updating the
> runstates until the KVM clock catches up with runstate_entry_time again.
> 
> The userspace APIs for setting the runstate times do not allow them to
> be set past the current KVM clock, but userspace can still adjust the
> KVM clock *after* setting the runstate times, which would cause this
> situation to occur.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/xen.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 014048c22652..3d4111de4472 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
>  {
>  	struct kvm_vcpu_xen *vx = &v->arch.xen;
>  	u64 now = get_kvmclock_ns(v->kvm);
> -	u64 delta_ns = now - vx->runstate_entry_time;
>  	u64 run_delay = current->sched_info.run_delay;
> +	s64 delta_ns = now - vx->runstate_entry_time;
> +	s64 steal_ns = run_delay - vx->last_steal;
>  
>  	if (unlikely(!vx->runstate_entry_time))
>  		vx->current_runstate = RUNSTATE_offline;
>  
> +	vx->last_steal = run_delay;
> +
> +	/*
> +	 * If KVM clock time went backwards, stop updating until it
> +	 * catches up (or the runstates are reset by userspace).
> +	 */

I take it this is a legitimate scenario where userspace sets KVM clock and then
the runstates, and KVM needs to lend a hand because userspace can't do those two
things atomically?

> +	if (delta_ns < 0)
> +		return;
> +
>  	/*
>  	 * Time waiting for the scheduler isn't "stolen" if the
>  	 * vCPU wasn't running anyway.
>  	 */
> -	if (vx->current_runstate == RUNSTATE_running) {
> -		u64 steal_ns = run_delay - vx->last_steal;
> +	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
> +		if (steal_ns > delta_ns)
> +			steal_ns = delta_ns;
>  
>  		delta_ns -= steal_ns;
>  
>  		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
>  	}
> -	vx->last_steal = run_delay;
>  
>  	vx->runstate_times[vx->current_runstate] += delta_ns;
>  	vx->current_runstate = state;
> -- 
> 2.44.0
>
David Woodhouse Aug. 20, 2024, 10:22 a.m. UTC | #3
On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > +       vx->last_steal = run_delay;
> > +
> > +       /*
> > +        * If KVM clock time went backwards, stop updating until it
> > +        * catches up (or the runstates are reset by userspace).
> > +        */
> 
> I take it this is a legitimate scenario where userspace sets KVM clock and then
> the runstates, and KVM needs to lend a hand because userspace can't do those two
> things atomically?

Indeed. Will update the comment to make that more obvious.

Thanks for the rest of the review on this series. I'll go through in
detail and update it, hopefully this week.
Steven Rostedt Aug. 20, 2024, 3:08 p.m. UTC | #4
On Tue, 20 Aug 2024 11:22:31 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > > +       vx->last_steal = run_delay;
> > > +
> > > +       /*
> > > +        * If KVM clock time went backwards, stop updating until it
> > > +        * catches up (or the runstates are reset by userspace).
> > > +        */  
> > 
> > I take it this is a legitimate scenario where userpace sets KVM clock and then
> > the runstates, and KVM needs to lend a hand because userspace can't do those two
> > things atomically?  
> 
> Indeed. Will update the comment to make that more obvious.
> 
> Thanks for the rest of the review on this series. I'll go through in
> detail and update it, hopefully this week.

Hmm, is this related at all to this:

  https://lore.kernel.org/all/20240806111157.1336532-1-suleiman@google.com/

-- Steve
David Woodhouse Aug. 20, 2024, 3:42 p.m. UTC | #5
On Tue, 2024-08-20 at 11:08 -0400, Steven Rostedt wrote:
> On Tue, 20 Aug 2024 11:22:31 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > > > +       vx->last_steal = run_delay;
> > > > +
> > > > +       /*
> > > > +        * If KVM clock time went backwards, stop updating until it
> > > > +        * catches up (or the runstates are reset by userspace).
> > > > +        */
> > > 
> > > I take it this is a legitimate scenario where userspace sets KVM clock and then
> > > the runstates, and KVM needs to lend a hand because userspace can't do those two
> > > things atomically?
> > 
> > Indeed. Will update the comment to make that more obvious.
> > 
> > Thanks for the rest of the review on this series. I'll go through in
> > detail and update it, hopefully this week.
> 
> Hmm, is this related at all to this:
> 
> https://lore.kernel.org/all/20240806111157.1336532-1-suleiman@google.com/

No, but patch 21 in this same series is.

Patch

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 014048c22652..3d4111de4472 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
 	u64 now = get_kvmclock_ns(v->kvm);
-	u64 delta_ns = now - vx->runstate_entry_time;
 	u64 run_delay = current->sched_info.run_delay;
+	s64 delta_ns = now - vx->runstate_entry_time;
+	s64 steal_ns = run_delay - vx->last_steal;
 
 	if (unlikely(!vx->runstate_entry_time))
 		vx->current_runstate = RUNSTATE_offline;
 
+	vx->last_steal = run_delay;
+
+	/*
+	 * If KVM clock time went backwards, stop updating until it
+	 * catches up (or the runstates are reset by userspace).
+	 */
+	if (delta_ns < 0)
+		return;
+
 	/*
 	 * Time waiting for the scheduler isn't "stolen" if the
 	 * vCPU wasn't running anyway.
 	 */
-	if (vx->current_runstate == RUNSTATE_running) {
-		u64 steal_ns = run_delay - vx->last_steal;
+	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
+		if (steal_ns > delta_ns)
+			steal_ns = delta_ns;
 
 		delta_ns -= steal_ns;
 
 		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
 	}
-	vx->last_steal = run_delay;
 
 	vx->runstate_times[vx->current_runstate] += delta_ns;
 	vx->current_runstate = state;