From patchwork Mon Jun 21 09:31:20 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yanmin Zhang
X-Patchwork-Id: 107157
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by demeter.kernel.org (8.14.3/8.14.3) with ESMTP id o5L9VwN6026660
	for ; Mon, 21 Jun 2010 09:31:58 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932085Ab0FUJbM (ORCPT ); Mon, 21 Jun 2010 05:31:12 -0400
Received: from mga02.intel.com ([134.134.136.20]:59369 "EHLO mga02.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932075Ab0FUJbJ (ORCPT ); Mon, 21 Jun 2010 05:31:09 -0400
Received: from orsmga002.jf.intel.com ([10.7.209.21])
	by orsmga101.jf.intel.com with ESMTP; 21 Jun 2010 02:30:43 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.53,452,1272870000"; d="scan'208";a="528717307"
Received: from ymzhang.sh.intel.com (HELO [10.239.13.128]) ([10.239.13.128])
	by orsmga002.jf.intel.com with ESMTP; 21 Jun 2010 02:30:55 -0700
Subject: [PATCH V2 1/5] para virt interface of perf to support kvm guest os statistics collection in guest os
From: "Zhang, Yanmin"
To: LKML, kvm@vger.kernel.org, Avi Kivity
Cc: Ingo Molnar, Frédéric Weisbecker, Arnaldo Carvalho de Melo,
	Cyrill Gorcunov, Lin Ming, Sheng Yang, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang@intel.com,
	tim.c.chen@intel.com
Date: Mon, 21 Jun 2010 17:31:20 +0800
Message-Id: <1277112680.2096.509.camel@ymzhang.sh.intel.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.0 (2.28.0-2.fc12)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.3 (demeter.kernel.org [140.211.167.41]);
	Mon, 21 Jun 2010 09:32:00 +0000 (UTC)

--- linux-2.6_tip0620/Documentation/kvm/paravirt-perf.txt	1970-01-01 08:00:00.000000000 +0800
+++ 
linux-2.6_tip0620perfkvm/Documentation/kvm/paravirt-perf.txt	2010-06-21 15:21:39.312999849 +0800
@@ -0,0 +1,133 @@
+The x86 kvm paravirt perf event interface
+=========================================
+
+This paravirt interface is responsible for supporting guest os perf event
+collection. If guest os supports this interface, users can run the perf
+command in guest os directly.
+
+Design
+======
+
+Guest os calls a series of hypercalls to communicate with host kernel to
+create/enable/disable/close perf events. Host kernel notifies guest os
+by injecting an NMI to guest os when an event overflows. Guest os needs to
+go through all its active events to check if they overflowed, and output
+performance statistics if they did.
+
+ABI
+===
+
+1) Detect if host kernel supports paravirt perf interface:
+#define KVM_FEATURE_PV_PERF   4
+Host kernel defines the above cpuid bit. Guest os calls cpuid to check if host
+os returns this bit. If it does, host kernel supports the paravirt perf
+interface.
+
+2) Open a new event at host side:
+kvm_hypercall3(KVM_PERF_OP, KVM_PERF_OP_OPEN, param_addr_low32bit,
+param_addr_high32bit);
+
+#define KVM_PERF_OP           3
+/* Operations for KVM_PERF_OP */
+#define KVM_PERF_OP_OPEN      1
+#define KVM_PERF_OP_CLOSE     2
+#define KVM_PERF_OP_ENABLE    3
+#define KVM_PERF_OP_DISABLE   4
+#define KVM_PERF_OP_READ      5
+/*
+ * guest_perf_attr is used when guest calls hypercall to
+ * open a new perf_event at host side. Mostly, it's a copy of
+ * perf_event_attr with the members not used by host kernel removed.
+ */
+struct guest_perf_attr {
+	__u32 type;
+	__u64 config;
+	__u64 sample_period;
+	__u64 sample_type;
+	__u64 read_format;
+	__u64 flags;
+	__u32 bp_type;
+	__u64 bp_addr;
+	__u64 bp_len;
+};
+/*
+ * data communication area about perf_event between
+ * Host kernel and guest kernel
+ */
+struct guest_perf_event {
+	u64 count;
+	atomic_t overflows;
+};
+struct guest_perf_event_param {
+	__u64 attr_addr;
+	__u64 guest_event_addr;
+	/* In case there is an alignment issue, we put id as the last one */
+	int id;
+};
+
+param_addr_low32bit and param_addr_high32bit compose a u64 integer which is
+the physical address of a parameter of type struct guest_perf_event_param.
+struct guest_perf_event_param consists of 3 members. attr_addr has the
+physical address of a struct guest_perf_attr parameter. guest_event_addr has
+the physical address of a struct guest_perf_event parameter, which has to be
+aligned to 4 bytes.
+Guest os needs to allocate an id per event that is unique within this guest os
+instance, and save it to guest_perf_event_param->id. Later on, the id is the
+only way to tell host kernel which event guest os wants it to operate on.
+guest_perf_event->count saves the latest count of the event.
+guest_perf_event->overflows means how many times this event has overflowed
+since guest os last processed it. Host kernel just increments
+guest_perf_event->overflows when the event overflows. Guest kernel should use
+atomic_cmpxchg to reset guest_perf_event->overflows to 0 in case its reset by
+guest os races with the data update by host kernel.
+Host kernel saves count and overflow update information into the
+guest_perf_event pointed to by guest_perf_event_param->guest_event_addr.
+
+After host kernel creates the event, the event is in the disabled state.
+
+This hypercall3 returns 0 when host kernel creates the event successfully, or
+another value if it fails.
+
+3) Enable event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_ENABLE, id);
+
+Parameter id means the event id allocated by guest os. Guest os needs to call
+this hypercall to enable the event at host side. Then, host side will really
+start to collect statistics for this event.
+
+This hypercall2 returns 0 if host kernel succeeds, or another value if it fails.
+
+
+4) Disable event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_DISABLE, id);
+
+Parameter id means the event id allocated by guest os. Guest os needs to call
+this hypercall to disable the event at host side. Then, host side will stop
+the statistics collection initiated by the event.
+
+This hypercall2 returns 0 if host kernel succeeds, or another value if it fails.
+
+
+5) Close event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_CLOSE, id);
+It closes and deletes the event at host side.
+
+6) NMI notification from host kernel:
+When an event overflows at host side, host kernel injects an NMI to guest os.
+Guest os has to check all its active events in the guest os NMI handler.
+
+
+Usage flow at guest side
+========================
+1) Guest os registers an NMI handler to prepare to process all active event
+overflows.
+2) Guest os calls hypercall3(..., KVM_PERF_OP_OPEN, ...) to create an event at
+host side.
+3) Guest os calls hypercall2(..., KVM_PERF_OP_ENABLE, ...) to enable the
+event.
+4) Guest os calls hypercall2(..., KVM_PERF_OP_DISABLE, ...) to disable the
+event.
+5) Guest os could repeat 3) and 4).
+6) Guest os calls hypercall2(..., KVM_PERF_OP_CLOSE, ...) to close the event.
+
+
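
Illustration only, not part of the patch: the guest-side usage flow documented
in paravirt-perf.txt can be sketched in user space roughly as follows. This is
a minimal sketch under loud assumptions. The mock_hypercall2(),
mock_hypercall3() and mock_host_overflow() functions are hypothetical
stand-ins for the host kernel and are not the real kvm_hypercall2() and
kvm_hypercall3(). Process pointers stand in for guest physical addresses, C11
atomic_int stands in for the kernel's atomic_t, guest_perf_attr is trimmed to
three members, and the mock host tracks a single event. The helper names
guest_open() and guest_take_overflows() are also made up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define KVM_PERF_OP          3
#define KVM_PERF_OP_OPEN     1
#define KVM_PERF_OP_CLOSE    2
#define KVM_PERF_OP_ENABLE   3
#define KVM_PERF_OP_DISABLE  4

/* Trimmed copy of the documented struct; only what the sketch needs. */
struct guest_perf_attr {
	uint32_t type;
	uint64_t config;
	uint64_t sample_period;
};

struct guest_perf_event {
	uint64_t count;
	atomic_int overflows;	/* assumption: stands in for atomic_t */
};

struct guest_perf_event_param {
	uint64_t attr_addr;
	uint64_t guest_event_addr;
	int id;
};

/* Hypothetical mock host state: one event slot. */
static struct {
	int id, open, enabled;
	uint64_t period;
	struct guest_perf_event *ev;
} mock_host;

/* Mock of the OPEN hypercall: rebuild the u64 address from its two halves. */
static long mock_hypercall3(unsigned int nr, unsigned long op,
			    unsigned long addr_lo, unsigned long addr_hi)
{
	uint64_t addr = ((uint64_t)addr_hi << 32) | (addr_lo & 0xffffffffUL);
	struct guest_perf_event_param *p = (void *)(uintptr_t)addr;

	if (nr != KVM_PERF_OP || op != KVM_PERF_OP_OPEN || mock_host.open)
		return -1;
	mock_host.id = p->id;
	mock_host.ev = (void *)(uintptr_t)p->guest_event_addr;
	mock_host.period =
		((struct guest_perf_attr *)(uintptr_t)p->attr_addr)->sample_period;
	mock_host.open = 1;
	mock_host.enabled = 0;	/* a new event starts disabled */
	return 0;
}

/* Mock of the ENABLE/DISABLE/CLOSE hypercalls, keyed by the guest's id. */
static long mock_hypercall2(unsigned int nr, unsigned long op, unsigned long id)
{
	if (nr != KVM_PERF_OP || !mock_host.open || (int)id != mock_host.id)
		return -1;
	switch (op) {
	case KVM_PERF_OP_ENABLE:
		mock_host.enabled = 1;
		return 0;
	case KVM_PERF_OP_DISABLE:
		mock_host.enabled = 0;
		return 0;
	case KVM_PERF_OP_CLOSE:
		mock_host.open = 0;
		mock_host.enabled = 0;
		return 0;
	default:
		return -1;
	}
}

/* Mock overflow on the host; returns 1 where a real host would inject an NMI. */
static int mock_host_overflow(void)
{
	if (!mock_host.open || !mock_host.enabled)
		return 0;
	mock_host.ev->count += mock_host.period;
	atomic_fetch_add(&mock_host.ev->overflows, 1);
	return 1;
}

/* Guest side: split the parameter address into low and high 32 bits. */
static long guest_open(struct guest_perf_event_param *param)
{
	uint64_t addr = (uint64_t)(uintptr_t)param;

	return mock_hypercall3(KVM_PERF_OP, KVM_PERF_OP_OPEN,
			       (unsigned long)(addr & 0xffffffffu),
			       (unsigned long)(addr >> 32));
}

/* Guest side: reset overflows to 0 with compare-and-exchange, return old value. */
static int guest_take_overflows(struct guest_perf_event *ev)
{
	int seen = atomic_load(&ev->overflows);

	/* On failure the call reloads 'seen', so a concurrent increment is kept. */
	while (!atomic_compare_exchange_weak(&ev->overflows, &seen, 0))
		;
	return seen;
}
```

In a real guest, guest_take_overflows() would run from the NMI handler for
every active event, and the id passed to the enable, disable and close calls
would be the one stored in guest_perf_event_param->id at open time.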