[V2,1/5] para virt interface of perf to support kvm guest os statistics collection in guest os

Message ID 1277112680.2096.509.camel@ymzhang.sh.intel.com (mailing list archive)
State New, archived
Commit Message

Yanmin Zhang June 21, 2010, 9:31 a.m. UTC

Patch

--- linux-2.6_tip0620/Documentation/kvm/paravirt-perf.txt	1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0620perfkvm/Documentation/kvm/paravirt-perf.txt	2010-06-21 15:21:39.312999849 +0800
@@ -0,0 +1,133 @@ 
+The x86 kvm paravirt perf event interface
+=========================================
+
+This paravirt interface lets a guest OS collect perf events. If the guest OS
+supports this interface, users can run the perf command directly in the
+guest OS.
+
+Design
+========
+
+The guest OS issues a series of hypercalls to ask the host kernel to
+create/enable/disable/close perf events. The host kernel notifies the guest
+OS by injecting an NMI into it when an event overflows. The guest OS then has
+to go through all its active events, check whether each has overflowed, and
+output performance statistics for those that have.
+
+ABI
+=====
+
+1) Detect if host kernel supports paravirt perf interface:
+#define KVM_FEATURE_PV_PERF       4
+The host kernel defines the cpuid bit above. The guest OS calls cpuid to check
+whether the host reports this bit. If it does, the host kernel supports the
+paravirt perf interface.
+
+2) Open a new event at host side:
+kvm_hypercall3(KVM_PERF_OP, KVM_PERF_OP_OPEN, param_addr_low32bit,
+param_addr_high32bit);
+
+#define KVM_PERF_OP                    3
+/* Operations for KVM_PERF_OP */
+#define KVM_PERF_OP_OPEN                1
+#define KVM_PERF_OP_CLOSE               2
+#define KVM_PERF_OP_ENABLE              3
+#define KVM_PERF_OP_DISABLE             4
+#define KVM_PERF_OP_READ                5
+/*
+ * guest_perf_attr is used when the guest issues the hypercall to
+ * open a new perf_event at the host side. It is mostly a copy of
+ * perf_event_attr, minus the fields the host kernel does not use.
+ */
+struct guest_perf_attr {
+        __u32                   type;
+        __u64                   config;
+        __u64                   sample_period;
+        __u64                   sample_type;
+        __u64                   read_format;
+        __u64                   flags;
+        __u32                   bp_type;
+        __u64                   bp_addr;
+        __u64                   bp_len;
+};
+/*
+ * Data area for a perf_event, shared between the
+ * host kernel and the guest kernel
+ */
+struct guest_perf_event {
+        u64 count;
+        atomic_t overflows;
+};
+struct guest_perf_event_param {
+        __u64 attr_addr;
+        __u64 guest_event_addr;
+        /* In case there is an alignment issue, we put id as the last one */
+        int id;
+};
+
+param_addr_low32bit and param_addr_high32bit together form a u64 holding
+the physical address of a struct guest_perf_event_param.
+struct guest_perf_event_param has 3 members. attr_addr holds the
+physical address of a struct guest_perf_attr. guest_event_addr holds the
+physical address of a struct guest_perf_event, which must be 4-byte
+aligned.
+The guest OS must allocate an id for each event that is unique within this
+guest OS instance and save it to guest_perf_event_param->id. Afterwards, the
+id is the only way to tell the host kernel which event to operate on.
+guest_perf_event->count holds the latest count of the event.
+guest_perf_event->overflows is the number of times the event has overflowed
+since the guest OS last processed it. The host kernel only increments
+guest_perf_event->overflows when the event overflows. The guest kernel should
+use atomic_cmpxchg to reset guest_perf_event->overflows to 0, because its
+reset can race with the host kernel's update.
+The host kernel writes count and overflow updates into the guest_perf_event
+pointed to by guest_perf_event_param->guest_event_addr.
+
+After the host kernel creates the event, the event is in the disabled state.
+
+This hypercall3 returns 0 if the host kernel creates the event successfully,
+or another value if it fails.
+
+3) Enable event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_ENABLE, id);
+
+Parameter id is the event id allocated by the guest OS. The guest OS must call
+this hypercall to enable the event at the host side. Only then does the host
+start collecting statistics for the event.
+
+This hypercall2 returns 0 on success, or another value if it fails.
+
+
+4) Disable event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_DISABLE, id);
+
+Parameter id is the event id allocated by the guest OS. The guest OS must call
+this hypercall to disable the event at the host side. The host then stops
+the statistics collection started for the event.
+
+This hypercall2 returns 0 on success, or another value if it fails.
+
+
+5) Close event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_CLOSE, id);
+This closes and deletes the event at the host side.
+
+6) NMI notification from host kernel:
+When an event overflows at the host side, the host kernel injects an NMI
+into the guest OS, whose NMI handler has to check all its active events.
+
+
+Usage flow at guest side
+========================
+1) Guest OS registers an NMI handler to process overflows of all active
+events.
+2) Guest OS calls kvm_hypercall3(..., KVM_PERF_OP_OPEN, ...) to create an
+event at the host side.
+3) Guest OS calls kvm_hypercall2(..., KVM_PERF_OP_ENABLE, ...) to enable the
+event.
+4) Guest OS calls kvm_hypercall2(..., KVM_PERF_OP_DISABLE, ...) to disable the
+event.
+5) Guest OS can repeat steps 3) and 4).
+6) Guest OS calls kvm_hypercall2(..., KVM_PERF_OP_CLOSE, ...) to close the event.
+
+
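
A minimal user-space sketch of two guest-side details from the ABI text above:
splitting the 64-bit physical address of struct guest_perf_event_param into
the two 32-bit hypercall arguments, and resetting the overflow counter with a
compare-and-exchange loop. It is not part of the patch. The struct and helper
names below (guest_perf_event_sketch, param_addr_low32, param_addr_high32,
param_addr_join, drain_overflows) are hypothetical, and C11 atomics stand in
for the kernel's atomic_t and atomic_cmpxchg.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for struct guest_perf_event.
 * The patch itself uses u64 and atomic_t; C11 types are an assumption
 * made so that this sketch builds outside the kernel.
 */
struct guest_perf_event_sketch {
        uint64_t count;
        atomic_int overflows;
};

/* Low half of the parameter address (param_addr_low32bit). */
static inline uint32_t param_addr_low32(uint64_t paddr)
{
        return (uint32_t)(paddr & 0xffffffffu);
}

/* High half of the parameter address (param_addr_high32bit). */
static inline uint32_t param_addr_high32(uint64_t paddr)
{
        return (uint32_t)(paddr >> 32);
}

/* How a receiver could recombine the two halves into one u64. */
static inline uint64_t param_addr_join(uint32_t low, uint32_t high)
{
        return ((uint64_t)high << 32) | low;
}

/*
 * Return the number of pending overflows and reset the counter to 0.
 * If the counter changes between the load and the exchange (the host
 * incremented it), the exchange fails, 'seen' is refreshed with the
 * current value, and the loop retries, so no increment is lost.
 */
static inline int drain_overflows(struct guest_perf_event_sketch *ev)
{
        int seen = atomic_load(&ev->overflows);

        while (seen != 0 &&
               !atomic_compare_exchange_weak(&ev->overflows, &seen, 0))
                ; /* 'seen' now holds the latest value; try again */

        return seen;
}
```

In an NMI handler along the lines of the usage flow, the guest would
presumably call something like drain_overflows() once per active event and
emit a sample only when the returned value is non-zero.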