diff mbox

kvm: x86: add trace event for pvclock updates

Message ID 1415216802-19201-1-git-send-email-dmatlack@google.com (mailing list archive)
State New, archived
Headers show

Commit Message

David Matlack Nov. 5, 2014, 7:46 p.m. UTC
The new trace event records:
  * the id of vcpu being updated
  * the pvclock_vcpu_time_info struct being written to guest memory

This is useful for debugging pvclock bugs, such as the bug fixed by
"[PATCH] kvm: x86: Fix kvm clock versioning.".

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/trace.h | 37 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c   |  2 ++
 2 files changed, 39 insertions(+)

Comments

Paolo Bonzini Nov. 6, 2014, 10:53 a.m. UTC | #1
On 05/11/2014 20:46, David Matlack wrote:
> The new trace event records:
>   * the id of vcpu being updated
>   * the pvclock_vcpu_time_info struct being written to guest memory
> 
> This is useful for debugging pvclock bugs, such as the bug fixed by
> "[PATCH] kvm: x86: Fix kvm clock versioning.".
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/trace.h | 37 +++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c   |  2 ++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 6b06ab8..c2a34bb 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -5,6 +5,7 @@
>  #include <asm/vmx.h>
>  #include <asm/svm.h>
>  #include <asm/clocksource.h>
> +#include <asm/pvclock-abi.h>
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM kvm
> @@ -877,6 +878,42 @@ TRACE_EVENT(kvm_ple_window,
>  #define trace_kvm_ple_window_shrink(vcpu_id, new, old) \
>  	trace_kvm_ple_window(false, vcpu_id, new, old)
>  
> +TRACE_EVENT(kvm_pvclock_update,
> +	TP_PROTO(unsigned int vcpu_id, struct pvclock_vcpu_time_info *pvclock),
> +	TP_ARGS(vcpu_id, pvclock),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned int,	vcpu_id			)
> +		__field(	__u32,		version			)
> +		__field(	__u64,		tsc_timestamp		)
> +		__field(	__u64,		system_time		)
> +		__field(	__u32,		tsc_to_system_mul	)
> +		__field(	__s8,		tsc_shift		)
> +		__field(	__u8,		flags			)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_id	   = vcpu_id;
> +		__entry->version	   = pvclock->version;
> +		__entry->tsc_timestamp	   = pvclock->tsc_timestamp;
> +		__entry->system_time	   = pvclock->system_time;
> +		__entry->tsc_to_system_mul = pvclock->tsc_to_system_mul;
> +		__entry->tsc_shift	   = pvclock->tsc_shift;
> +		__entry->flags		   = pvclock->flags;
> +	),
> +
> +	TP_printk("vcpu_id %u, pvclock { version %u, tsc_timestamp 0x%llx, "
> +		  "system_time 0x%llx, tsc_to_system_mul 0x%x, tsc_shift %d, "
> +		  "flags 0x%x }",
> +		  __entry->vcpu_id,
> +		  __entry->version,
> +		  __entry->tsc_timestamp,
> +		  __entry->system_time,
> +		  __entry->tsc_to_system_mul,
> +		  __entry->tsc_shift,
> +		  __entry->flags)
> +);
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0033df3..0f533df 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1662,6 +1662,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  
>  	vcpu->hv_clock.flags = pvclock_flags;
>  
> +	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> +
>  	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
>  				&vcpu->hv_clock,
>  				sizeof(vcpu->hv_clock));
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti Nov. 11, 2014, 1:18 a.m. UTC | #2
On Wed, Nov 05, 2014 at 11:46:42AM -0800, David Matlack wrote:
> The new trace event records:
>   * the id of vcpu being updated
>   * the pvclock_vcpu_time_info struct being written to guest memory
> 
> This is useful for debugging pvclock bugs, such as the bug fixed by
> "[PATCH] kvm: x86: Fix kvm clock versioning.".
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

So you actually hit that bug in practice? Can you describe the
scenario?


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Matlack Nov. 12, 2014, 6 p.m. UTC | #3
On 11/10 11:18 PM, Marcelo Tosatti wrote:
> On Wed, Nov 05, 2014 at 11:46:42AM -0800, David Matlack wrote:
> > The new trace event records:
> >   * the id of vcpu being updated
> >   * the pvclock_vcpu_time_info struct being written to guest memory
> > 
> > This is useful for debugging pvclock bugs, such as the bug fixed by
> > "[PATCH] kvm: x86: Fix kvm clock versioning.".
> > 
> > Signed-off-by: David Matlack <dmatlack@google.com>
> 
> So you actually hit that bug in practice? Can you describe the
> scenario?

We noticed guests running stress workloads would occasionally get stuck 
on the far side of a save/restore. Inspecting the guest state revealed
arch/x86/kernel/pvclock.c:last_value was stuck at a value like
8020566108469899263, despite TSC and pvclock looking sane.

Since these guests ran without PVCLOCK_TSC_STABLE_BIT set in their
CPUID, they were stuck with this large time value until real time
caught up (in about 250 years :).

We've been unable to reproduce the bug with "kvm: x86: Fix kvm clock
versioning." so we didn't invest in catching the overflow in the act,
but a likely explanation is the guest gets save/restore-ed while
computing the pvclock delta:

        u64 delta = __native_read_tsc() - src->tsc_timestamp;

causing the subtraction to underflow and delta to be huge.

> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 6b06ab8..c2a34bb 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -5,6 +5,7 @@ 
 #include <asm/vmx.h>
 #include <asm/svm.h>
 #include <asm/clocksource.h>
+#include <asm/pvclock-abi.h>
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
@@ -877,6 +878,42 @@  TRACE_EVENT(kvm_ple_window,
 #define trace_kvm_ple_window_shrink(vcpu_id, new, old) \
 	trace_kvm_ple_window(false, vcpu_id, new, old)
 
+TRACE_EVENT(kvm_pvclock_update,
+	TP_PROTO(unsigned int vcpu_id, struct pvclock_vcpu_time_info *pvclock),
+	TP_ARGS(vcpu_id, pvclock),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id			)
+		__field(	__u32,		version			)
+		__field(	__u64,		tsc_timestamp		)
+		__field(	__u64,		system_time		)
+		__field(	__u32,		tsc_to_system_mul	)
+		__field(	__s8,		tsc_shift		)
+		__field(	__u8,		flags			)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	   = vcpu_id;
+		__entry->version	   = pvclock->version;
+		__entry->tsc_timestamp	   = pvclock->tsc_timestamp;
+		__entry->system_time	   = pvclock->system_time;
+		__entry->tsc_to_system_mul = pvclock->tsc_to_system_mul;
+		__entry->tsc_shift	   = pvclock->tsc_shift;
+		__entry->flags		   = pvclock->flags;
+	),
+
+	TP_printk("vcpu_id %u, pvclock { version %u, tsc_timestamp 0x%llx, "
+		  "system_time 0x%llx, tsc_to_system_mul 0x%x, tsc_shift %d, "
+		  "flags 0x%x }",
+		  __entry->vcpu_id,
+		  __entry->version,
+		  __entry->tsc_timestamp,
+		  __entry->system_time,
+		  __entry->tsc_to_system_mul,
+		  __entry->tsc_shift,
+		  __entry->flags)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..0f533df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1662,6 +1662,8 @@  static int kvm_guest_time_update(struct kvm_vcpu *v)
 
 	vcpu->hv_clock.flags = pvclock_flags;
 
+	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
+
 	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
 				&vcpu->hv_clock,
 				sizeof(vcpu->hv_clock));