From patchwork Thu May 14 13:43:33 2009
X-Patchwork-Submitter: "Dong, Eddie"
X-Patchwork-Id: 23772
From: "Dong, Eddie"
To: Avi Kivity
CC: "kvm@vger.kernel.org", "Dong, Eddie"
Date: Thu, 14 May 2009 21:43:33 +0800
Subject: RE: event injection MACROs
Message-ID: <9832F13BD22FB94A829F798DA4A8280501B2762226@pdsmsx503.ccr.corp.intel.com>
In-Reply-To: <4A0BE41D.3060809@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

Avi Kivity wrote:
> Dong, Eddie wrote:
>> OK.
>> Also back to Gleb's question, the reason I want to do that is to
>> simplify the event generation mechanism in current KVM.
>>
>> Today KVM uses an additional layer of exception/NMI/interrupt state:
>> vcpu->arch.exception.pending, vcpu->arch.interrupt.pending &
>> vcpu->arch.nmi_injected. All these additional layers exist because
>> they compete for the single VM_ENTRY_INTR_INFO_FIELD write that
>> injects the event. Both SVM & VMX have only one resource to inject
>> a virtual event, but KVM generates 3 categories of events in
>> parallel, which further requires additional logic to arbitrate
>> among them.
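(For reference, the 3 parallel pending-event structures mentioned above
look roughly like this -- a simplified sketch of the kvm_vcpu_arch
fields of this time, recalled from memory and abbreviated, not the
exact upstream layout:)

	struct kvm_vcpu_arch {
		...
		/* pending exception to deliver at next entry */
		struct kvm_queued_exception {
			bool pending;
			bool has_error_code;
			u8 nr;			/* exception vector */
			u32 error_code;
		} exception;

		/* pending external interrupt to deliver */
		struct kvm_queued_interrupt {
			bool pending;
			u8 nr;			/* interrupt vector */
		} interrupt;

		/* NMI raised but not yet taken, or injected but possibly
		 * needing re-injection after a failed entry */
		bool nmi_pending;
		bool nmi_injected;
		...
	};

Each of the 3 competes for the single VM_ENTRY_INTR_INFO_FIELD write
at entry time.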
> I thought of using a queue to hold all pending events (in a common
> format), sort it by priority, and inject the head.

SDM Table 5-4 requires merging 2 events together, i.e. converting them
to #DF/triple fault, or injecting them serially, whenever 2 events
happen together, no matter whether they are NMI, IRQ or exception.
Once that merging activity is taken into account, the queue is
effectively a single-element queue. We could have either:
1) A pure SW "queue" that is flushed to the HW register
   (VM_ENTRY_INTR_INFO_FIELD) later, or
2) Direct use of the HW register.

A potential benefit is that it can avoid duplicated code and potential
bugs in the current code, as the patch at the end of this mail shows,
if I understand correctly.

Either way is OK and up to you. BTW, Xen uses the HW register directly
to represent a pending event.

>
>> One example is that exceptions have higher priority than NMI/IRQ
>> injection in the current code, which is not true in reality.
>>
>
> I don't think it matters in practice, since the guest will see it as a
> timing issue. NMIs and IRQs are asynchronous (even those generated by
> the guest through the local APIC).

Yes. But it also causes IRQ injection to be delayed, which may have
side effects, for example if the guest exception handler runs for a
very long time, or if a guest VCPU falls into a recursive #GP. With
the current logic, a guest IRQ event from KDB (an IPI) sent from VCPU0,
as an example, can't force the dead-looping VCPU1 into KDB, since VCPU1
is taking recursive #GPs.

>
>> Another issue is that a failed event from a previous injection, say
>> an IRQ or NMI, may be discarded if a virtual exception happens
>> during the exit handling now. With the patch for generic double-fault
>> handling, this case would be handled normally.
>>
>
> Discarding an exception is usually okay as it will be regenerated. I
> don't think we discard interrupts or NMIs.

In reality (running an OS in the guest) it hasn't happened so far. But
architecturally, it could. For example, KVM injects an IRQ, but the VM
resume takes a #PF and comes back to KVM with IDT_VECTORING valid. KVM
then puts the failed IRQ back into the interrupt queue. But if the #PF
handling generates another exception, the interrupt queue can't be
injected, since KVM injects the exception first, and the interrupt
queue is discarded at the next VM exit.

Overall, I think this is mostly a simplification, but it may benefit us
a lot in the future. Especially with Gleb's recent cleanup, it sounds
much easier to do than before. I may be making a mistake here; I would
like to see more comments.
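(To make the single-element queue concrete -- a hypothetical sketch
only, not actual KVM code: queued_event, merge_events, EVENT_EXCEPTION
and is_contributory are made-up names, and the SDM's separate
page-fault class plus some #DF corner cases are omitted for brevity:)

	/* one slot is enough once Table 5-4 merging is applied */
	struct queued_event {
		bool valid;
		u8 type;	/* EVENT_EXCEPTION, EVENT_NMI or EVENT_IRQ */
		u8 vector;
		bool has_error_code;
		u32 error_code;
	};

	enum merge_action { INJECT_NOW, MERGE_TO_DF, TRIPLE_FAULT, SERIALIZE };

	/* #DE, #TS, #NP, #SS, #GP form the contributory class (Table 5-4) */
	static bool is_contributory(u8 vector)
	{
		switch (vector) {
		case 0: case 10: case 11: case 12: case 13:
			return true;
		default:
			return false;
		}
	}

	/* decide what to do when a new exception @new_vector arrives
	 * while @q is still pending */
	static enum merge_action merge_events(const struct queued_event *q,
					      u8 new_vector)
	{
		if (!q->valid)
			return INJECT_NOW;
		if (q->type != EVENT_EXCEPTION)
			return SERIALIZE;	/* fault during IRQ/NMI delivery */
		if (q->vector == 8)
			return TRIPLE_FAULT;	/* fault while delivering #DF */
		if (is_contributory(q->vector) && is_contributory(new_vector))
			return MERGE_TO_DF;	/* contributory + contributory */
		return SERIALIZE;		/* benign combination */
	}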
thx, eddie

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
 			    (u32)((u64)cr2 >> 32), handler);
-		if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending )
+		if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending || vcpu->arch.nmi_injected)
 			kvm_mmu_unprotect_page_virt(vcpu, cr2);
 		return kvm_mmu_page_fault(vcpu, cr2, error_code);
 	}

If using the above merged SW "queue", or the HW register directly, we
can do the following instead:

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
 			    (u32)((u64)cr2 >> 32), handler);
-		if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending )
+		if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK)
 			kvm_mmu_unprotect_page_virt(vcpu, cr2);
 		return kvm_mmu_page_fault(vcpu, cr2, error_code);
 	}
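(And as a rough illustration of option 1, the SW queue would then be
flushed to the HW register in exactly one place on the way into the
guest -- again a hypothetical sketch built on the queued_event above;
flush_queued_event is a made-up helper, while vmcs_write32 and the
INTR_INFO_*/VM_ENTRY_* names are the existing VMX definitions:)

	/* flush the single-element SW queue into the VMCS; after this,
	 * VM_ENTRY_INTR_INFO_FIELD is the only place the event lives.
	 * Assumes q->type already holds the architectural event-type
	 * encoding (bits 10:8 of the intr info field). */
	static void flush_queued_event(struct queued_event *q)
	{
		u32 intr_info;

		if (!q->valid)
			return;

		intr_info = q->vector | INTR_INFO_VALID_MASK |
			    (q->type << 8);
		if (q->has_error_code) {
			intr_info |= INTR_INFO_DELIVER_CODE_MASK;
			vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
				     q->error_code);
		}
		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
		q->valid = false;
	}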