Message ID | 5124C93B.50902@siemens.com (mailing list archive)
---|---
State | New, archived
Hi,

By the way, if you haven't seen my description of why the current code
did what it did, take a look at
http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html
Another description might also come in handy:
http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html

On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery":
> This aligns VMX more with SVM regarding event injection and recovery for
> nested guests. The changes allow to inject interrupts directly from L0
> to L2.
>
> One difference to SVM is that we always transfer the pending event
> injection into the architectural state of the VCPU and then drop it from
> there if it turns out that we left L2 to enter L1.

Last time I checked, if I'm remembering correctly, the nested SVM code did
something a bit different: After the exit from L2 to L1 and unnecessarily
queuing the pending interrupt for injection, it skipped one entry into L1,
and as usual after the entry the interrupt queue is cleared, so next time
around, when L1 is really entered, the wrong injection is not attempted.

> VMX and SVM are now identical in how they recover event injections from
> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
> still contains a valid event and, if yes, transfer the content into L1's
> idt_vectoring_info_field.
>
> To avoid that we incorrectly leak an event into the architectural VCPU
> state that L1 wants to inject, we skip cancellation on nested run.

I didn't understand this last point.

> @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>  	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>
> -	/* clear vm-entry fields which are to be cleared on exit */
> -	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
> +	/* drop what we picked up for L0 via vmx_complete_interrupts */
> +	vcpu->arch.nmi_injected = false;
> +	kvm_clear_exception_queue(vcpu);
> +	kvm_clear_interrupt_queue(vcpu);

It would be nice to move these lines out of prepare_vmcs12(), since they
don't really do anything with vmcs12, and move them into
nested_vmx_vmexit() (which is the one which called prepare_vmcs12()).

Did you test this both with PIN_BASED_EXT_INTR_MASK (the usual case) and
!PIN_BASED_EXT_INTR_MASK (the case which interests you)? We need to make
sure that in the former case, this doesn't clear the interrupt queue after
we put an interrupt to be injected in it (at first glance it seems fine,
but these code paths are so convoluted, it's hard to be sure).

> +	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
> +	    vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) {
> +		/*
> +		 * Preserve the event that was supposed to be injected
> +		 * by emulating it would have been returned in
> +		 * IDT_VECTORING_INFO_FIELD.
> +		 */
> +		if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
> +		    INTR_INFO_VALID_MASK) {
> +			vmcs12->idt_vectoring_info_field =
> +				vmcs12->vm_entry_intr_info_field;
> +			vmcs12->idt_vectoring_error_code =
> +				vmcs12->vm_entry_exception_error_code;
> +			vmcs12->vm_exit_instruction_len =
> +				vmcs12->vm_entry_instruction_len;
> +			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);

I'm afraid I'm missing what you are trying to do here. Why would
vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK ever be
true? After all, the processor clears it after each successful exit, so
the last if() will only succeed on failed entries - but this is NOT the
case if we're in the enclosing if (note that vmcs12->vm_exit_reason =
vmcs_read32(VM_EXIT_REASON)). Maybe I'm missing something?

Nadav.
On 2013-02-20 15:14, Nadav Har'El wrote:
> Hi,
>
> By the way, if you haven't seen my description of why the current code
> did what it did, take a look at
> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html
> Another description might also come in handy:
> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html
>
> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery":
>> This aligns VMX more with SVM regarding event injection and recovery for
>> nested guests. The changes allow to inject interrupts directly from L0
>> to L2.
>>
>> One difference to SVM is that we always transfer the pending event
>> injection into the architectural state of the VCPU and then drop it from
>> there if it turns out that we left L2 to enter L1.
>
> Last time I checked, if I'm remembering correctly, the nested SVM code did
> something a bit different: After the exit from L2 to L1 and unnecessarily
> queuing the pending interrupt for injection, it skipped one entry into L1,
> and as usual after the entry the interrupt queue is cleared, so next time
> around, when L1 is really entered, the wrong injection is not attempted.
>
>> VMX and SVM are now identical in how they recover event injections from
>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
>> still contains a valid event and, if yes, transfer the content into L1's
>> idt_vectoring_info_field.
>
>> To avoid that we incorrectly leak an event into the architectural VCPU
>> state that L1 wants to inject, we skip cancellation on nested run.
>
> I didn't understand this last point.

- prepare_vmcs02 sets event to be injected into L2
- while trying to enter L2, a cancel condition is met
- we call vmx_cancel_injection but should now avoid filling L1's event
  into the arch event queues - it's kept in vmcs12

>
>> @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>>  	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>>
>> -	/* clear vm-entry fields which are to be cleared on exit */
>> -	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
>> +	/* drop what we picked up for L0 via vmx_complete_interrupts */
>> +	vcpu->arch.nmi_injected = false;
>> +	kvm_clear_exception_queue(vcpu);
>> +	kvm_clear_interrupt_queue(vcpu);
>
> It would be nice to move these lines out of prepare_vmcs12(), since they
> don't really do anything with vmcs12, and move them into
> nested_vmx_vmexit() (which is the one which called prepare_vmcs12()).

OK.

> Did you test this both with PIN_BASED_EXT_INTR_MASK (the usual case) and
> !PIN_BASED_EXT_INTR_MASK (the case which interests you)? We need to make
> sure that in the former case, this doesn't clear the interrupt queue after
> we put an interrupt to be injected in it (at first glance it seems fine,
> but these code paths are so convoluted, it's hard to be sure).

I tested both, but none of my tests came close to covering all potential
corner cases. But that unconditional queue clearing surely deserves
attention and critical review.

>
>> +	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
>> +	    vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) {
>> +		/*
>> +		 * Preserve the event that was supposed to be injected
>> +		 * by emulating it would have been returned in
>> +		 * IDT_VECTORING_INFO_FIELD.
>> +		 */
>> +		if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
>> +		    INTR_INFO_VALID_MASK) {
>> +			vmcs12->idt_vectoring_info_field =
>> +				vmcs12->vm_entry_intr_info_field;
>> +			vmcs12->idt_vectoring_error_code =
>> +				vmcs12->vm_entry_exception_error_code;
>> +			vmcs12->vm_exit_instruction_len =
>> +				vmcs12->vm_entry_instruction_len;
>> +			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>
> I'm afraid I'm missing what you are trying to do here. Why would
> vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK ever be
> true? After all, the processor clears it after each successful exit, so
> the last if() will only succeed on failed entries - but this is NOT the
> case if we're in the enclosing if (note that vmcs12->vm_exit_reason =
> vmcs_read32(VM_EXIT_REASON)). Maybe I'm missing something?

Canceled vmentry as indicated above. Look at vcpu_enter_guest:
kvm_mmu_reload may fail, or we need to handle some async event / perform
some reschedule. But those points are past prepare_vmcs02.

Jan
On 2013-02-20 14:01, Jan Kiszka wrote:
> This aligns VMX more with SVM regarding event injection and recovery for
> nested guests. The changes allow to inject interrupts directly from L0
> to L2.
>
> One difference to SVM is that we always transfer the pending event
> injection into the architectural state of the VCPU and then drop it from
> there if it turns out that we left L2 to enter L1.
>
> VMX and SVM are now identical in how they recover event injections from
> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
> still contains a valid event and, if yes, transfer the content into L1's
> idt_vectoring_info_field.
>
> To avoid that we incorrectly leak an event into the architectural VCPU
> state that L1 wants to inject, we skip cancellation on nested run.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>
> Survived moderate testing here and (currently) makes sense to me, but
> please review very carefully. I wouldn't be surprised if I'm still
> missing some subtle corner case.

Forgot to point this out again: It still takes "KVM: nVMX: Fix injection
of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2
injection work. So this patch logically depends on it.

Jan

>
>  arch/x86/kvm/vmx.c | 57 +++++++++++++++++++++++----------------------------
>  1 files changed, 26 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index dd3a8a0..7d2fbd2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
>
>  static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>  {
> -	if (is_guest_mode(&vmx->vcpu))
> -		return;
>  	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
>  				  VM_EXIT_INSTRUCTION_LEN,
>  				  IDT_VECTORING_ERROR_CODE);
> @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>
>  static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>  {
> -	if (is_guest_mode(vcpu))
> +	if (to_vmx(vcpu)->nested.nested_run_pending)
>  		return;
>  	__vmx_complete_interrupts(to_vmx(vcpu),
>  				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
> @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long debugctlmsr;
>
> -	if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
> -		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> -		if (vmcs12->idt_vectoring_info_field &
> -		    VECTORING_INFO_VALID_MASK) {
> -			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> -				     vmcs12->idt_vectoring_info_field);
> -			vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> -				     vmcs12->vm_exit_instruction_len);
> -			if (vmcs12->idt_vectoring_info_field &
> -			    VECTORING_INFO_DELIVER_CODE_MASK)
> -				vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> -					     vmcs12->idt_vectoring_error_code);
> -		}
> -	}
> -
>  	/* Record the guest's net vcpu time for enforced NMI injections. */
>  	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
>  		vmx->entry_time = ktime_get();
> @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
>  	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>
> -	if (is_guest_mode(vcpu)) {
> -		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> -		vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
> -		if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
> -			vmcs12->idt_vectoring_error_code =
> -				vmcs_read32(IDT_VECTORING_ERROR_CODE);
> -			vmcs12->vm_exit_instruction_len =
> -				vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> -		}
> -	}
> -
>  	vmx->loaded_vmcs->launched = 1;
>
>  	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
> @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>  	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>
> -	/* clear vm-entry fields which are to be cleared on exit */
> -	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
> +	/* drop what we picked up for L0 via vmx_complete_interrupts */
> +	vcpu->arch.nmi_injected = false;
> +	kvm_clear_exception_queue(vcpu);
> +	kvm_clear_interrupt_queue(vcpu);
> +
> +	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
> +	    vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) {
> +		/*
> +		 * Preserve the event that was supposed to be injected
> +		 * by emulating it would have been returned in
> +		 * IDT_VECTORING_INFO_FIELD.
> +		 */
> +		if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
> +		    INTR_INFO_VALID_MASK) {
> +			vmcs12->idt_vectoring_info_field =
> +				vmcs12->vm_entry_intr_info_field;
> +			vmcs12->idt_vectoring_error_code =
> +				vmcs12->vm_entry_exception_error_code;
> +			vmcs12->vm_exit_instruction_len =
> +				vmcs12->vm_entry_instruction_len;
> +			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> +		}
> +
> +		/* clear vm-entry fields which are to be cleared on exit */
>  		vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
> +	}
>  }
>
>  /*
On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: > On 2013-02-20 14:01, Jan Kiszka wrote: > > This aligns VMX more with SVM regarding event injection and recovery for > > nested guests. The changes allow to inject interrupts directly from L0 > > to L2. > > > > One difference to SVM is that we always transfer the pending event > > injection into the architectural state of the VCPU and then drop it from > > there if it turns out that we left L2 to enter L1. > > > > VMX and SVM are now identical in how they recover event injections from > > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > > still contains a valid event and, if yes, transfer the content into L1's > > idt_vectoring_info_field. > > > > To avoid that we incorrectly leak an event into the architectural VCPU > > state that L1 wants to inject, we skip cancellation on nested run. > > > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > > --- > > > > Survived moderate testing here and (currently) makes sense to me, but > > please review very carefully. I wouldn't be surprised if I'm still > > missing some subtle corner case. > > Forgot to point this out again: It still takes "KVM: nVMX: Fix injection > of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 > injection work. So this patch logically depends on it. > But this patch has hunks from that patch. > Jan > > > > > arch/x86/kvm/vmx.c | 57 +++++++++++++++++++++++---------------------------- > > 1 files changed, 26 insertions(+), 31 deletions(-) > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > index dd3a8a0..7d2fbd2 100644 > > --- a/arch/x86/kvm/vmx.c > > +++ b/arch/x86/kvm/vmx.c > > @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, > > > > static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > > { > > - if (is_guest_mode(&vmx->vcpu)) > > - return; > > __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > > VM_EXIT_INSTRUCTION_LEN, > > IDT_VECTORING_ERROR_CODE); > > @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > > > > static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > > { > > - if (is_guest_mode(vcpu)) > > + if (to_vmx(vcpu)->nested.nested_run_pending) > > return; > > __vmx_complete_interrupts(to_vmx(vcpu), > > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), > > @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > unsigned long debugctlmsr; > > > > - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) { > > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > > - if (vmcs12->idt_vectoring_info_field & > > - VECTORING_INFO_VALID_MASK) { > > - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, > > - vmcs12->idt_vectoring_info_field); > > - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, > > - vmcs12->vm_exit_instruction_len); > > - if (vmcs12->idt_vectoring_info_field & > > - VECTORING_INFO_DELIVER_CODE_MASK) > > - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, > > - vmcs12->idt_vectoring_error_code); > > - } > > - } > > - > > /* Record the guest's net vcpu time for enforced NMI injections. 
*/ > > if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) > > vmx->entry_time = ktime_get(); > > @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) > > > > vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); > > > > - if (is_guest_mode(vcpu)) { > > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > > - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info; > > - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) { > > - vmcs12->idt_vectoring_error_code = > > - vmcs_read32(IDT_VECTORING_ERROR_CODE); > > - vmcs12->vm_exit_instruction_len = > > - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > > - } > > - } > > - > > vmx->loaded_vmcs->launched = 1; > > > > vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); > > @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) > > vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > > vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); > > > > - /* clear vm-entry fields which are to be cleared on exit */ > > - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) > > + /* drop what we picked up for L0 via vmx_complete_interrupts */ > > + vcpu->arch.nmi_injected = false; > > + kvm_clear_exception_queue(vcpu); > > + kvm_clear_interrupt_queue(vcpu); > > + > > + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && > > + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { > > + /* > > + * Preserve the event that was supposed to be injected > > + * by emulating it would have been returned in > > + * IDT_VECTORING_INFO_FIELD. > > + */ > > + if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & > > + INTR_INFO_VALID_MASK) { > > + vmcs12->idt_vectoring_info_field = > > + vmcs12->vm_entry_intr_info_field; > > + vmcs12->idt_vectoring_error_code = > > + vmcs12->vm_entry_exception_error_code; > > + vmcs12->vm_exit_instruction_len = > > + vmcs12->vm_entry_instruction_len; > > + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); > > + } > > + > > + /* clear vm-entry fields which are to be cleared on exit */ > > vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK; > > + } > > } > > > > /* > > > -- > Siemens AG, Corporate Technology, CT RTC ITP SDP-DE > Corporate Competence Center Embedded Linux -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013-02-20 16:30, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: >> On 2013-02-20 14:01, Jan Kiszka wrote: >>> This aligns VMX more with SVM regarding event injection and recovery for >>> nested guests. The changes allow to inject interrupts directly from L0 >>> to L2. >>> >>> One difference to SVM is that we always transfer the pending event >>> injection into the architectural state of the VCPU and then drop it from >>> there if it turns out that we left L2 to enter L1. >>> >>> VMX and SVM are now identical in how they recover event injections from >>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >>> still contains a valid event and, if yes, transfer the content into L1's >>> idt_vectoring_info_field. >>> >>> To avoid that we incorrectly leak an event into the architectural VCPU >>> state that L1 wants to inject, we skip cancellation on nested run. >>> >>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>> --- >>> >>> Survived moderate testing here and (currently) makes sense to me, but >>> please review very carefully. I wouldn't be surprised if I'm still >>> missing some subtle corner case. >> >> Forgot to point this out again: It still takes "KVM: nVMX: Fix injection >> of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 >> injection work. So this patch logically depends on it. >> > But this patch has hunks from that patch. Not mechanically. If you prefer me merging them together, let me know. Jan
On Wed, Feb 20, 2013 at 04:51:39PM +0100, Jan Kiszka wrote:
> On 2013-02-20 16:30, Gleb Natapov wrote:
> > On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote:
> >> On 2013-02-20 14:01, Jan Kiszka wrote:
> >>> This aligns VMX more with SVM regarding event injection and recovery for
> >>> nested guests. The changes allow to inject interrupts directly from L0
> >>> to L2.
> >>>
> >>> One difference to SVM is that we always transfer the pending event
> >>> injection into the architectural state of the VCPU and then drop it from
> >>> there if it turns out that we left L2 to enter L1.
> >>>
> >>> VMX and SVM are now identical in how they recover event injections from
> >>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
> >>> still contains a valid event and, if yes, transfer the content into L1's
> >>> idt_vectoring_info_field.
> >>>
> >>> To avoid that we incorrectly leak an event into the architectural VCPU
> >>> state that L1 wants to inject, we skip cancellation on nested run.
> >>>
> >>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>> ---
> >>>
> >>> Survived moderate testing here and (currently) makes sense to me, but
> >>> please review very carefully. I wouldn't be surprised if I'm still
> >>> missing some subtle corner case.
> >>
> >> Forgot to point this out again: It still takes "KVM: nVMX: Fix injection
> >> of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2
> >> injection work. So this patch logically depends on it.
> >>
> > But this patch has hunks from that patch.
>
> Not mechanically.
>
What do you mean?

> If you prefer me merging them together, let me know.
>
For review not necessary, for applying preferably.

--
Gleb.
On 2013-02-20 16:57, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 04:51:39PM +0100, Jan Kiszka wrote: >> On 2013-02-20 16:30, Gleb Natapov wrote: >>> On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: >>>> On 2013-02-20 14:01, Jan Kiszka wrote: >>>>> This aligns VMX more with SVM regarding event injection and recovery for >>>>> nested guests. The changes allow to inject interrupts directly from L0 >>>>> to L2. >>>>> >>>>> One difference to SVM is that we always transfer the pending event >>>>> injection into the architectural state of the VCPU and then drop it from >>>>> there if it turns out that we left L2 to enter L1. >>>>> >>>>> VMX and SVM are now identical in how they recover event injections from >>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >>>>> still contains a valid event and, if yes, transfer the content into L1's >>>>> idt_vectoring_info_field. >>>>> >>>>> To avoid that we incorrectly leak an event into the architectural VCPU >>>>> state that L1 wants to inject, we skip cancellation on nested run. >>>>> >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>> --- >>>>> >>>>> Survived moderate testing here and (currently) makes sense to me, but >>>>> please review very carefully. I wouldn't be surprised if I'm still >>>>> missing some subtle corner case. >>>> >>>> Forgot to point this out again: It still takes "KVM: nVMX: Fix injection >>>> of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 >>>> injection work. So this patch logically depends on it. >>>> >>> But this patch has hunks from that patch. >> >> Not mechanically. >> > What do you mean? You can apply them in arbitrary order, just minor offset shifts will be the result. > >> If you prefer me merging them together, let me know. >> > For review not necessary, for applying preferably. OK, will wait for review on this, then send out a combo patch. Jan
On Wed, Feb 20, 2013 at 02:01:47PM +0100, Jan Kiszka wrote: > This aligns VMX more with SVM regarding event injection and recovery for > nested guests. The changes allow to inject interrupts directly from L0 > to L2. > > One difference to SVM is that we always transfer the pending event > injection into the architectural state of the VCPU and then drop it from > there if it turns out that we left L2 to enter L1. > > VMX and SVM are now identical in how they recover event injections from > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > still contains a valid event and, if yes, transfer the content into L1's > idt_vectoring_info_field. > > To avoid that we incorrectly leak an event into the architectural VCPU > state that L1 wants to inject, we skip cancellation on nested run. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > --- > > Survived moderate testing here and (currently) makes sense to me, but > please review very carefully. I wouldn't be surprised if I'm still > missing some subtle corner case. > > arch/x86/kvm/vmx.c | 57 +++++++++++++++++++++++---------------------------- > 1 files changed, 26 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index dd3a8a0..7d2fbd2 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, > > static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > { > - if (is_guest_mode(&vmx->vcpu)) > - return; > __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > VM_EXIT_INSTRUCTION_LEN, > IDT_VECTORING_ERROR_CODE); > @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > > static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > { > - if (is_guest_mode(vcpu)) > + if (to_vmx(vcpu)->nested.nested_run_pending) > return; Why is this needed here? > __vmx_complete_interrupts(to_vmx(vcpu), > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), > @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) > struct vcpu_vmx *vmx = to_vmx(vcpu); > unsigned long debugctlmsr; > > - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) { > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > - if (vmcs12->idt_vectoring_info_field & > - VECTORING_INFO_VALID_MASK) { > - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, > - vmcs12->idt_vectoring_info_field); > - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, > - vmcs12->vm_exit_instruction_len); > - if (vmcs12->idt_vectoring_info_field & > - VECTORING_INFO_DELIVER_CODE_MASK) > - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, > - vmcs12->idt_vectoring_error_code); > - } > - } > - > /* Record the guest's net vcpu time for enforced NMI injections. 
*/ > if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) > vmx->entry_time = ktime_get(); > @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) > > vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); > > - if (is_guest_mode(vcpu)) { > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info; > - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) { > - vmcs12->idt_vectoring_error_code = > - vmcs_read32(IDT_VECTORING_ERROR_CODE); > - vmcs12->vm_exit_instruction_len = > - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > - } > - } > - > vmx->loaded_vmcs->launched = 1; > > vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); > @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) > vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); > > - /* clear vm-entry fields which are to be cleared on exit */ > - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) > + /* drop what we picked up for L0 via vmx_complete_interrupts */ > + vcpu->arch.nmi_injected = false; > + kvm_clear_exception_queue(vcpu); > + kvm_clear_interrupt_queue(vcpu); > + > + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && > + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { > + /* > + * Preserve the event that was supposed to be injected > + * by emulating it would have been returned in > + * IDT_VECTORING_INFO_FIELD. > + */ > + if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & > + INTR_INFO_VALID_MASK) { > + vmcs12->idt_vectoring_info_field = > + vmcs12->vm_entry_intr_info_field; > + vmcs12->idt_vectoring_error_code = > + vmcs12->vm_entry_exception_error_code; > + vmcs12->vm_exit_instruction_len = > + vmcs12->vm_entry_instruction_len; > + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); > + } > + > + /* clear vm-entry fields which are to be cleared on exit */ > vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK; > + } > } > > /* > -- > 1.7.3.4 -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013-02-20 17:46, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 02:01:47PM +0100, Jan Kiszka wrote: >> This aligns VMX more with SVM regarding event injection and recovery for >> nested guests. The changes allow to inject interrupts directly from L0 >> to L2. >> >> One difference to SVM is that we always transfer the pending event >> injection into the architectural state of the VCPU and then drop it from >> there if it turns out that we left L2 to enter L1. >> >> VMX and SVM are now identical in how they recover event injections from >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >> still contains a valid event and, if yes, transfer the content into L1's >> idt_vectoring_info_field. >> >> To avoid that we incorrectly leak an event into the architectural VCPU >> state that L1 wants to inject, we skip cancellation on nested run. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >> --- >> >> Survived moderate testing here and (currently) makes sense to me, but >> please review very carefully. I wouldn't be surprised if I'm still >> missing some subtle corner case. >> >> arch/x86/kvm/vmx.c | 57 +++++++++++++++++++++++---------------------------- >> 1 files changed, 26 insertions(+), 31 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index dd3a8a0..7d2fbd2 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, >> >> static void vmx_complete_interrupts(struct vcpu_vmx *vmx) >> { >> - if (is_guest_mode(&vmx->vcpu)) >> - return; >> __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, >> VM_EXIT_INSTRUCTION_LEN, >> IDT_VECTORING_ERROR_CODE); >> @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) >> >> static void vmx_cancel_injection(struct kvm_vcpu *vcpu) >> { >> - if (is_guest_mode(vcpu)) >> + if (to_vmx(vcpu)->nested.nested_run_pending) >> return; > Why is this needed here? Please check if my reply to Nadav explains this sufficiently. Jan
On Wed, Feb 20, 2013 at 05:48:40PM +0100, Jan Kiszka wrote: > On 2013-02-20 17:46, Gleb Natapov wrote: > > On Wed, Feb 20, 2013 at 02:01:47PM +0100, Jan Kiszka wrote: > >> This aligns VMX more with SVM regarding event injection and recovery for > >> nested guests. The changes allow to inject interrupts directly from L0 > >> to L2. > >> > >> One difference to SVM is that we always transfer the pending event > >> injection into the architectural state of the VCPU and then drop it from > >> there if it turns out that we left L2 to enter L1. > >> > >> VMX and SVM are now identical in how they recover event injections from > >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >> still contains a valid event and, if yes, transfer the content into L1's > >> idt_vectoring_info_field. > >> > >> To avoid that we incorrectly leak an event into the architectural VCPU > >> state that L1 wants to inject, we skip cancellation on nested run. > >> > >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >> --- > >> > >> Survived moderate testing here and (currently) makes sense to me, but > >> please review very carefully. I wouldn't be surprised if I'm still > >> missing some subtle corner case. > >> > >> arch/x86/kvm/vmx.c | 57 +++++++++++++++++++++++---------------------------- > >> 1 files changed, 26 insertions(+), 31 deletions(-) > >> > >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >> index dd3a8a0..7d2fbd2 100644 > >> --- a/arch/x86/kvm/vmx.c > >> +++ b/arch/x86/kvm/vmx.c > >> @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, > >> > >> static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > >> { > >> - if (is_guest_mode(&vmx->vcpu)) > >> - return; > >> __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > >> VM_EXIT_INSTRUCTION_LEN, > >> IDT_VECTORING_ERROR_CODE); > >> @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > >> > >> static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > >> { > >> - if (is_guest_mode(vcpu)) > >> + if (to_vmx(vcpu)->nested.nested_run_pending) > >> return; > > Why is this needed here? > > Please check if my reply to Nadav explains this sufficiently. > Ah, sorry. Will follow up there if it is not. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: > On 2013-02-20 15:14, Nadav Har'El wrote: > > Hi, > > > > By the way, if you haven't seen my description of why the current code > > did what it did, take a look at > > http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html > > Another description might also come in handy: > > http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html > > > > On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": > >> This aligns VMX more with SVM regarding event injection and recovery for > >> nested guests. The changes allow to inject interrupts directly from L0 > >> to L2. > >> > >> One difference to SVM is that we always transfer the pending event > >> injection into the architectural state of the VCPU and then drop it from > >> there if it turns out that we left L2 to enter L1. > > > > Last time I checked, if I'm remembering correctly, the nested SVM code did > > something a bit different: After the exit from L2 to L1 and unnecessarily > > queuing the pending interrupt for injection, it skipped one entry into L1, > > and as usual after the entry the interrupt queue is cleared so next time > > around, when L1 one is really entered, the wrong injection is not attempted. > > > >> VMX and SVM are now identical in how they recover event injections from > >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >> still contains a valid event and, if yes, transfer the content into L1's > >> idt_vectoring_info_field. > > > >> To avoid that we incorrectly leak an event into the architectural VCPU > >> state that L1 wants to inject, we skip cancellation on nested run. > > > > I didn't understand this last point. > > - prepare_vmcs02 sets event to be injected into L2 > - while trying to enter L2, a cancel condition is met > - we call vmx_cancel_interrupts but should now avoid filling L1's event > into the arch event queues - it's kept in vmcs12 > But what if we put it in arch event queue? It will be reinjected during next entry attempt, so nothing bad happens and we have one less if() to explain, or do I miss something terrible that will happen? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013-02-20 18:01, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: >> On 2013-02-20 15:14, Nadav Har'El wrote: >>> Hi, >>> >>> By the way, if you haven't seen my description of why the current code >>> did what it did, take a look at >>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html >>> Another description might also come in handy: >>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html >>> >>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": >>>> This aligns VMX more with SVM regarding event injection and recovery for >>>> nested guests. The changes allow to inject interrupts directly from L0 >>>> to L2. >>>> >>>> One difference to SVM is that we always transfer the pending event >>>> injection into the architectural state of the VCPU and then drop it from >>>> there if it turns out that we left L2 to enter L1. >>> >>> Last time I checked, if I'm remembering correctly, the nested SVM code did >>> something a bit different: After the exit from L2 to L1 and unnecessarily >>> queuing the pending interrupt for injection, it skipped one entry into L1, >>> and as usual after the entry the interrupt queue is cleared so next time >>> around, when L1 one is really entered, the wrong injection is not attempted. >>> >>>> VMX and SVM are now identical in how they recover event injections from >>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >>>> still contains a valid event and, if yes, transfer the content into L1's >>>> idt_vectoring_info_field. >>> >>>> To avoid that we incorrectly leak an event into the architectural VCPU >>>> state that L1 wants to inject, we skip cancellation on nested run. >>> >>> I didn't understand this last point. >> >> - prepare_vmcs02 sets event to be injected into L2 >> - while trying to enter L2, a cancel condition is met >> - we call vmx_cancel_interrupts but should now avoid filling L1's event >> into the arch event queues - it's kept in vmcs12 >> > But what if we put it in arch event queue? It will be reinjected during > next entry attempt, so nothing bad happens and we have one less if() to explain, > or do I miss something terrible that will happen? I started without that if but ran into troubles with KVM-on-KVM (L1 locks up). Let me dig out the instrumentation and check the event flow again. Jan
On 2013-02-20 18:24, Jan Kiszka wrote:
> On 2013-02-20 18:01, Gleb Natapov wrote:
>> On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote:
>>> On 2013-02-20 15:14, Nadav Har'El wrote:
>>>> Hi,
>>>>
>>>> By the way, if you haven't seen my description of why the current code
>>>> did what it did, take a look at
>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html
>>>> Another description might also come in handy:
>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html
>>>>
>>>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery":
>>>>> This aligns VMX more with SVM regarding event injection and recovery for
>>>>> nested guests. The changes allow to inject interrupts directly from L0
>>>>> to L2.
>>>>>
>>>>> One difference to SVM is that we always transfer the pending event
>>>>> injection into the architectural state of the VCPU and then drop it from
>>>>> there if it turns out that we left L2 to enter L1.
>>>>
>>>> Last time I checked, if I'm remembering correctly, the nested SVM code did
>>>> something a bit different: After the exit from L2 to L1 and unnecessarily
>>>> queuing the pending interrupt for injection, it skipped one entry into L1,
>>>> and as usual after the entry the interrupt queue is cleared, so next time
>>>> around, when L1 is really entered, the wrong injection is not attempted.
>>>>
>>>>> VMX and SVM are now identical in how they recover event injections from
>>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
>>>>> still contains a valid event and, if yes, transfer the content into L1's
>>>>> idt_vectoring_info_field.
>>>>
>>>>> To avoid that we incorrectly leak an event into the architectural VCPU
>>>>> state that L1 wants to inject, we skip cancellation on nested run.
>>>>
>>>> I didn't understand this last point.
>>>
>>> - prepare_vmcs02 sets event to be injected into L2
>>> - while trying to enter L2, a cancel condition is met
>>> - we call vmx_cancel_injection but should now avoid filling L1's event
>>>   into the arch event queues - it's kept in vmcs12
>>>
>> But what if we put it in arch event queue? It will be reinjected during
>> next entry attempt, so nothing bad happens and we have one less if() to explain,
>> or do I miss something terrible that will happen?
>
> I started without that if but ran into troubles with KVM-on-KVM (L1
> locks up). Let me dig out the instrumentation and check the event flow
> again.

OK, got it again: If we transfer an IRQ that L1 wants to send to L2 into
the architectural VCPU state, we will also trigger enable_irq_window.
And that raises KVM_REQ_IMMEDIATE_EXIT again as it thinks L0 wants to
inject. That will send us into an endless loop.

Not sure if we can and should handle this scenario in enable_irq_window
in a nicer way. Open for suggestions.

Jan
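A very loose, self-contained model of the loop Jan observed (all names are illustrative; the real interplay between inject_pending_event(), enable_irq_window() and KVM_REQ_IMMEDIATE_EXIT is considerably more subtle):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	bool arch_irq_pending = true;	/* L1's L2-bound IRQ, leaked by cancel */
	bool req_immediate_exit;
	int iterations = 0;

	while (iterations < 5) {	/* capped stand-in for "endless" */
		/* the pending arch interrupt is queued for injection ... */
		bool injected = arch_irq_pending;

		/* ... so enable_irq_window() assumes an L0->L1 injection is
		 * wanted and requests an immediate exit */
		req_immediate_exit = injected;

		if (!req_immediate_exit)
			break;		/* would actually enter L2 here */

		/* immediate exit: the entry is canceled again and the
		 * event stays pending, so the cycle repeats */
		iterations++;
	}
	printf("looped %d times without ever entering L2\n", iterations);
	return 0;
}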
On Wed, Feb 20, 2013 at 06:50:50PM +0100, Jan Kiszka wrote: > On 2013-02-20 18:24, Jan Kiszka wrote: > > On 2013-02-20 18:01, Gleb Natapov wrote: > >> On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: > >>> On 2013-02-20 15:14, Nadav Har'El wrote: > >>>> Hi, > >>>> > >>>> By the way, if you haven't seen my description of why the current code > >>>> did what it did, take a look at > >>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html > >>>> Another description might also come in handy: > >>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html > >>>> > >>>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": > >>>>> This aligns VMX more with SVM regarding event injection and recovery for > >>>>> nested guests. The changes allow to inject interrupts directly from L0 > >>>>> to L2. > >>>>> > >>>>> One difference to SVM is that we always transfer the pending event > >>>>> injection into the architectural state of the VCPU and then drop it from > >>>>> there if it turns out that we left L2 to enter L1. > >>>> > >>>> Last time I checked, if I'm remembering correctly, the nested SVM code did > >>>> something a bit different: After the exit from L2 to L1 and unnecessarily > >>>> queuing the pending interrupt for injection, it skipped one entry into L1, > >>>> and as usual after the entry the interrupt queue is cleared so next time > >>>> around, when L1 one is really entered, the wrong injection is not attempted. > >>>> > >>>>> VMX and SVM are now identical in how they recover event injections from > >>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >>>>> still contains a valid event and, if yes, transfer the content into L1's > >>>>> idt_vectoring_info_field. > >>>> > >>>>> To avoid that we incorrectly leak an event into the architectural VCPU > >>>>> state that L1 wants to inject, we skip cancellation on nested run. > >>>> > >>>> I didn't understand this last point. > >>> > >>> - prepare_vmcs02 sets event to be injected into L2 > >>> - while trying to enter L2, a cancel condition is met > >>> - we call vmx_cancel_interrupts but should now avoid filling L1's event > >>> into the arch event queues - it's kept in vmcs12 > >>> > >> But what if we put it in arch event queue? It will be reinjected during > >> next entry attempt, so nothing bad happens and we have one less if() to explain, > >> or do I miss something terrible that will happen? > > > > I started without that if but ran into troubles with KVM-on-KVM (L1 > > locks up). Let me dig out the instrumentation and check the event flow > > again. > > OK, got it again: If we transfer an IRQ that L1 wants to send to L2 into > the architectural VCPU state, we will also trigger enable_irq_window. > And that raises KVM_REQ_IMMEDIATE_EXIT again as it thinks L0 wants > inject. That will send us into an endless loop. > Why would we trigger enable_irq_window()? enable_irq_window() triggers only if interrupt is pending in one of irq chips, not in architectural VCPU state. > Not sure if we can and should handle this scenario in enable_irq_window > in a nicer way. Open for suggestions. > > Jan > > -- > Siemens AG, Corporate Technology, CT RTC ITP SDP-DE > Corporate Competence Center Embedded Linux -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013-02-21 10:22, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 06:50:50PM +0100, Jan Kiszka wrote: >> On 2013-02-20 18:24, Jan Kiszka wrote: >>> On 2013-02-20 18:01, Gleb Natapov wrote: >>>> On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: >>>>> On 2013-02-20 15:14, Nadav Har'El wrote: >>>>>> Hi, >>>>>> >>>>>> By the way, if you haven't seen my description of why the current code >>>>>> did what it did, take a look at >>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html >>>>>> Another description might also come in handy: >>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html >>>>>> >>>>>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": >>>>>>> This aligns VMX more with SVM regarding event injection and recovery for >>>>>>> nested guests. The changes allow to inject interrupts directly from L0 >>>>>>> to L2. >>>>>>> >>>>>>> One difference to SVM is that we always transfer the pending event >>>>>>> injection into the architectural state of the VCPU and then drop it from >>>>>>> there if it turns out that we left L2 to enter L1. >>>>>> >>>>>> Last time I checked, if I'm remembering correctly, the nested SVM code did >>>>>> something a bit different: After the exit from L2 to L1 and unnecessarily >>>>>> queuing the pending interrupt for injection, it skipped one entry into L1, >>>>>> and as usual after the entry the interrupt queue is cleared so next time >>>>>> around, when L1 one is really entered, the wrong injection is not attempted. >>>>>> >>>>>>> VMX and SVM are now identical in how they recover event injections from >>>>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >>>>>>> still contains a valid event and, if yes, transfer the content into L1's >>>>>>> idt_vectoring_info_field. >>>>>> >>>>>>> To avoid that we incorrectly leak an event into the architectural VCPU >>>>>>> state that L1 wants to inject, we skip cancellation on nested run. >>>>>> >>>>>> I didn't understand this last point. >>>>> >>>>> - prepare_vmcs02 sets event to be injected into L2 >>>>> - while trying to enter L2, a cancel condition is met >>>>> - we call vmx_cancel_interrupts but should now avoid filling L1's event >>>>> into the arch event queues - it's kept in vmcs12 >>>>> >>>> But what if we put it in arch event queue? It will be reinjected during >>>> next entry attempt, so nothing bad happens and we have one less if() to explain, >>>> or do I miss something terrible that will happen? >>> >>> I started without that if but ran into troubles with KVM-on-KVM (L1 >>> locks up). Let me dig out the instrumentation and check the event flow >>> again. >> >> OK, got it again: If we transfer an IRQ that L1 wants to send to L2 into >> the architectural VCPU state, we will also trigger enable_irq_window. >> And that raises KVM_REQ_IMMEDIATE_EXIT again as it thinks L0 wants >> inject. That will send us into an endless loop. >> > Why would we trigger enable_irq_window()? enable_irq_window() triggers > only if interrupt is pending in one of irq chips, not in architectural > VCPU state. Precisely this is the case if an IRQ for L1 arrived while we tried to enter L2 and caused the cancellation above. Jan
On Thu, Feb 21, 2013 at 10:43:57AM +0100, Jan Kiszka wrote: > On 2013-02-21 10:22, Gleb Natapov wrote: > > On Wed, Feb 20, 2013 at 06:50:50PM +0100, Jan Kiszka wrote: > >> On 2013-02-20 18:24, Jan Kiszka wrote: > >>> On 2013-02-20 18:01, Gleb Natapov wrote: > >>>> On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: > >>>>> On 2013-02-20 15:14, Nadav Har'El wrote: > >>>>>> Hi, > >>>>>> > >>>>>> By the way, if you haven't seen my description of why the current code > >>>>>> did what it did, take a look at > >>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html > >>>>>> Another description might also come in handy: > >>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html > >>>>>> > >>>>>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": > >>>>>>> This aligns VMX more with SVM regarding event injection and recovery for > >>>>>>> nested guests. The changes allow to inject interrupts directly from L0 > >>>>>>> to L2. > >>>>>>> > >>>>>>> One difference to SVM is that we always transfer the pending event > >>>>>>> injection into the architectural state of the VCPU and then drop it from > >>>>>>> there if it turns out that we left L2 to enter L1. > >>>>>> > >>>>>> Last time I checked, if I'm remembering correctly, the nested SVM code did > >>>>>> something a bit different: After the exit from L2 to L1 and unnecessarily > >>>>>> queuing the pending interrupt for injection, it skipped one entry into L1, > >>>>>> and as usual after the entry the interrupt queue is cleared so next time > >>>>>> around, when L1 one is really entered, the wrong injection is not attempted. > >>>>>> > >>>>>>> VMX and SVM are now identical in how they recover event injections from > >>>>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >>>>>>> still contains a valid event and, if yes, transfer the content into L1's > >>>>>>> idt_vectoring_info_field. > >>>>>> > >>>>>>> To avoid that we incorrectly leak an event into the architectural VCPU > >>>>>>> state that L1 wants to inject, we skip cancellation on nested run. > >>>>>> > >>>>>> I didn't understand this last point. > >>>>> > >>>>> - prepare_vmcs02 sets event to be injected into L2 > >>>>> - while trying to enter L2, a cancel condition is met > >>>>> - we call vmx_cancel_interrupts but should now avoid filling L1's event > >>>>> into the arch event queues - it's kept in vmcs12 > >>>>> > >>>> But what if we put it in arch event queue? It will be reinjected during > >>>> next entry attempt, so nothing bad happens and we have one less if() to explain, > >>>> or do I miss something terrible that will happen? > >>> > >>> I started without that if but ran into troubles with KVM-on-KVM (L1 > >>> locks up). Let me dig out the instrumentation and check the event flow > >>> again. > >> > >> OK, got it again: If we transfer an IRQ that L1 wants to send to L2 into > >> the architectural VCPU state, we will also trigger enable_irq_window. > >> And that raises KVM_REQ_IMMEDIATE_EXIT again as it thinks L0 wants > >> inject. That will send us into an endless loop. > >> > > Why would we trigger enable_irq_window()? enable_irq_window() triggers > > only if interrupt is pending in one of irq chips, not in architectural > > VCPU state. > > Precisely this is the case if an IRQ for L1 arrived while we tried to > enter L2 and caused the cancellation above. 
>
But during next entry the cancelled interrupt is transferred from
architectural VCPU state to VM_ENTRY_INTR_INFO_FIELD by
inject_pending_event()->vmx_inject_irq(), so at the point where
enable_irq_window() is called the state is exactly the same no matter
whether we canceled interrupt or not during previous entry attempt. What
am I missing?

Oh, maybe I am missing that if we do not cancel interrupt then
inject_pending_event() will skip

	if (vcpu->arch.interrupt.pending)
		....

and will inject interrupt from APIC that caused cancellation of previous
entry, but then this is a bug since this new interrupt will overwrite
the one that is still in VM_ENTRY_INTR_INFO_FIELD from previous entry
attempt and there may be another pending interrupt in APIC anyway that
will cause enable_irq_window() too.

--
Gleb.
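The priority Gleb refers to can be sketched in a self-contained userspace model (hypothetical names, not the actual x86.c implementation): an event already sitting in the architectural queue is re-injected before anything new is fetched from the irq chip, so a canceled event cannot be overwritten by a newly arrived one.

#include <stdbool.h>
#include <stdio.h>

struct model {
	bool arch_irq_pending;	/* models vcpu->arch.interrupt.pending */
	int  arch_irq_vector;
	bool apic_irq_pending;	/* interrupt still waiting in the irq chip */
	int  apic_irq_vector;
};

/* Returns the vector that would be written to VM_ENTRY_INTR_INFO_FIELD. */
static int inject_pending_event(struct model *m, bool irq_window_open)
{
	if (m->arch_irq_pending)		/* re-injection path runs first */
		return m->arch_irq_vector;
	if (m->apic_irq_pending && irq_window_open) {
		m->apic_irq_pending = false;	/* new injection from the chip */
		m->arch_irq_pending = true;
		m->arch_irq_vector = m->apic_irq_vector;
		return m->apic_irq_vector;
	}
	return -1;				/* nothing injected */
}

int main(void)
{
	struct model m = {
		.arch_irq_pending = true,  .arch_irq_vector = 0x20, /* canceled event */
		.apic_irq_pending = true,  .apic_irq_vector = 0x30, /* new APIC IRQ */
	};

	/* First entry: the canceled event is re-injected while the APIC
	 * interrupt waits - VM_ENTRY_INTR_INFO_FIELD is not overwritten. */
	printf("first entry injects 0x%x\n", inject_pending_event(&m, true));

	m.arch_irq_pending = false;	/* assume that injection completed */
	printf("second entry injects 0x%x\n", inject_pending_event(&m, true));
	return 0;
}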
On 2013-02-21 11:06, Gleb Natapov wrote: > On Thu, Feb 21, 2013 at 10:43:57AM +0100, Jan Kiszka wrote: >> On 2013-02-21 10:22, Gleb Natapov wrote: >>> On Wed, Feb 20, 2013 at 06:50:50PM +0100, Jan Kiszka wrote: >>>> On 2013-02-20 18:24, Jan Kiszka wrote: >>>>> On 2013-02-20 18:01, Gleb Natapov wrote: >>>>>> On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: >>>>>>> On 2013-02-20 15:14, Nadav Har'El wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> By the way, if you haven't seen my description of why the current code >>>>>>>> did what it did, take a look at >>>>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html >>>>>>>> Another description might also come in handy: >>>>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html >>>>>>>> >>>>>>>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": >>>>>>>>> This aligns VMX more with SVM regarding event injection and recovery for >>>>>>>>> nested guests. The changes allow to inject interrupts directly from L0 >>>>>>>>> to L2. >>>>>>>>> >>>>>>>>> One difference to SVM is that we always transfer the pending event >>>>>>>>> injection into the architectural state of the VCPU and then drop it from >>>>>>>>> there if it turns out that we left L2 to enter L1. >>>>>>>> >>>>>>>> Last time I checked, if I'm remembering correctly, the nested SVM code did >>>>>>>> something a bit different: After the exit from L2 to L1 and unnecessarily >>>>>>>> queuing the pending interrupt for injection, it skipped one entry into L1, >>>>>>>> and as usual after the entry the interrupt queue is cleared so next time >>>>>>>> around, when L1 one is really entered, the wrong injection is not attempted. >>>>>>>> >>>>>>>>> VMX and SVM are now identical in how they recover event injections from >>>>>>>>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >>>>>>>>> still contains a valid event and, if yes, transfer the content into L1's >>>>>>>>> idt_vectoring_info_field. >>>>>>>> >>>>>>>>> To avoid that we incorrectly leak an event into the architectural VCPU >>>>>>>>> state that L1 wants to inject, we skip cancellation on nested run. >>>>>>>> >>>>>>>> I didn't understand this last point. >>>>>>> >>>>>>> - prepare_vmcs02 sets event to be injected into L2 >>>>>>> - while trying to enter L2, a cancel condition is met >>>>>>> - we call vmx_cancel_interrupts but should now avoid filling L1's event >>>>>>> into the arch event queues - it's kept in vmcs12 >>>>>>> >>>>>> But what if we put it in arch event queue? It will be reinjected during >>>>>> next entry attempt, so nothing bad happens and we have one less if() to explain, >>>>>> or do I miss something terrible that will happen? >>>>> >>>>> I started without that if but ran into troubles with KVM-on-KVM (L1 >>>>> locks up). Let me dig out the instrumentation and check the event flow >>>>> again. >>>> >>>> OK, got it again: If we transfer an IRQ that L1 wants to send to L2 into >>>> the architectural VCPU state, we will also trigger enable_irq_window. >>>> And that raises KVM_REQ_IMMEDIATE_EXIT again as it thinks L0 wants >>>> inject. That will send us into an endless loop. >>>> >>> Why would we trigger enable_irq_window()? enable_irq_window() triggers >>> only if interrupt is pending in one of irq chips, not in architectural >>> VCPU state. >> >> Precisely this is the case if an IRQ for L1 arrived while we tried to >> enter L2 and caused the cancellation above. 
>> > But during next entry the cancelled interrupt is transfered > from architectural VCPU state to VM_ENTRY_INTR_INFO_FIELD by > inject_pending_event()->vmx_inject_irq(), so at the point where > enable_irq_window() is called the state is exactly the same no matter > whether we canceled interrupt or not during previous entry attempt. What > am I missing? Maybe that we normally either have an external IRQ pending in some IRQ chip or in the VCPU architectural state, not both at the same time? By transferring something that doesn't come from a virtual IRQ chip of L0 (but from the one in L1) into the architectural state, we break this assumption. > Oh may be I am missing that if we do not cancel interrupt > then inject_pending_event() will skip > if (vcpu->arch.interrupt.pending) > .... If we do not cancel, we will not inject at all (due to missing KVM_REQ_EVENT). > and will inject interrupt from APIC that caused cancellation of previous > entry, but then this is a bug since this new interrupt will overwrite > the one that is still in VM_ENTRY_INTR_INFO_FIELD from previous entry > attempt and there may be another pending interrupt in APIC anyway that > will cause enable_irq_window() too. Maybe the issue is that we do not properly simulate a VMEXIT on an external interrupt during vmrun (like SVM does). Need to check for this case again... Jan
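Jan's KVM_REQ_EVENT point can be illustrated with one more toy model. It is a sketch under the stated assumption that re-injection from the architectural queue only runs when KVM_REQ_EVENT is set, while an event left untouched in the VMCS needs no further software action on the retried entry; the names are not the real kernel symbols.

#include <stdbool.h>
#include <stdio.h>

struct model {
	bool req_event;		/* models KVM_REQ_EVENT being set */
	bool event_in_vmcs;	/* event left in VM_ENTRY_INTR_INFO_FIELD */
};

static void vcpu_enter_guest(struct model *m)
{
	if (m->req_event) {
		m->req_event = false;
		printf("inject_pending_event() runs\n");
	} else {
		printf("no KVM_REQ_EVENT: entry re-uses %s\n",
		       m->event_in_vmcs ? "the event still in the VMCS"
					: "nothing");
	}
}

int main(void)
{
	/* skipped cancellation: the event stays in the VMCS and no request
	 * is raised, so no new injection logic runs on the retried entry */
	struct model skipped = { .req_event = false, .event_in_vmcs = true };
	vcpu_enter_guest(&skipped);

	/* cancellation performed: the event moved to the arch queue and
	 * (by assumption) KVM_REQ_EVENT was raised, so injection runs again */
	struct model canceled = { .req_event = true, .event_in_vmcs = false };
	vcpu_enter_guest(&canceled);
	return 0;
}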
On 2013-02-21 11:18, Jan Kiszka wrote:
> On 2013-02-21 11:06, Gleb Natapov wrote:
>> [...]
> Maybe the issue is that we do not properly simulate a VMEXIT on an
> external interrupt during vmrun (like SVM does). Need to check for this
> case again...

static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

		if (to_vmx(vcpu)->nested.nested_run_pending ||
		    (vmcs12->idt_vectoring_info_field &
		     VECTORING_INFO_VALID_MASK))
			return 0;
		nested_vmx_vmexit(vcpu);
		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
		vmcs12->vm_exit_intr_info = 0;
		...

I do not understand at the moment why we refuse to simulate a vmexit due
to an external interrupt when we are about to run L2 or still have
something in idt_vectoring_info_field. The external interrupt would not
overwrite idt_vectoring_info_field but should end up in vm_exit_intr_info.

Jan
On 2013-02-21 11:28, Jan Kiszka wrote:
> On 2013-02-21 11:18, Jan Kiszka wrote:
>> [...]
> [...]
>
> I do not understand at the moment why we refuse to simulate a vmexit due
> to an external interrupt when we are about to run L2 or still have
> something in idt_vectoring_info_field. The external interrupt would not
> overwrite idt_vectoring_info_field but should end up in vm_exit_intr_info.

Explained in 51cfe38ea5: idt_vectoring_info_field and vm_exit_intr_info
must not be valid at the same time.

Jan
On Thu, Feb 21, 2013 at 11:33:30AM +0100, Jan Kiszka wrote:
> On 2013-02-21 11:28, Jan Kiszka wrote:
> > On 2013-02-21 11:18, Jan Kiszka wrote:
> > > [...]
> > I do not understand at the moment why we refuse to simulate a vmexit due
> > to an external interrupt when we are about to run L2 or still have
> > something in idt_vectoring_info_field. The external interrupt would not
> > overwrite idt_vectoring_info_field but should end up in vm_exit_intr_info.
>
> Explained in 51cfe38ea5: idt_vectoring_info_field and vm_exit_intr_info
> must not be valid at the same time.
>
Interestingly, if we transfer the interrupt from idt_vectoring_info into
the arch VCPU state, we can drop this check, because vmx_interrupt_allowed()
will not be called while there is an event to reinject. 51cfe38ea5 still
does not explain why nested_run_pending is needed. We cannot #vmexit
without entering L2, but we can undo the VMLAUNCH/VMRESUME emulation,
leaving rip pointing to the instruction. We can start by moving
skip_emulated_instruction() from nested_vmx_run() to nested_vmx_vmexit().

--
			Gleb.
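As an illustration of this proposal, a minimal user-space C sketch; the
function names echo the discussion, but the structures, the instruction
length, and the addresses are hypothetical stand-ins, not kernel code.
rip only advances past the emulated VMLAUNCH/VMRESUME on the exit path,
so an undone entry simply re-executes the instruction:

#include <stdbool.h>
#include <stdio.h>

struct vcpu_model {
	unsigned long rip;	/* L1's instruction pointer */
	bool in_l2;
};

/* skip_emulated_instruction() moved here: rip moves past the
 * VMLAUNCH/VMRESUME only when we actually return to L1 */
static void nested_vmexit(struct vcpu_model *v)
{
	v->in_l2 = false;
	v->rip += 3;		/* hypothetical instruction length */
}

static void nested_run(struct vcpu_model *v, bool cancelled)
{
	if (cancelled)
		return;		/* rip untouched: VMLAUNCH re-executes */
	v->in_l2 = true;
}

int main(void)
{
	struct vcpu_model v = { .rip = 0x100 };

	nested_run(&v, true);	/* entry undone, e.g. by a host event */
	printf("after cancelled entry: rip=%#lx\n", v.rip);	/* 0x100 */

	nested_run(&v, false);	/* entry succeeds */
	nested_vmexit(&v);	/* only now skip the instruction */
	printf("after vmexit:          rip=%#lx\n", v.rip);	/* 0x103 */
	return 0;
}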
On 2013-02-21 14:13, Gleb Natapov wrote:
> On Thu, Feb 21, 2013 at 11:33:30AM +0100, Jan Kiszka wrote:
>> [...]
> Interestingly, if we transfer the interrupt from idt_vectoring_info into
> the arch VCPU state, we can drop this check, because vmx_interrupt_allowed()
> will not be called while there is an event to reinject. 51cfe38ea5 still
> does not explain why nested_run_pending is needed. We cannot #vmexit
> without entering L2, but we can undo the VMLAUNCH/VMRESUME emulation,
> leaving rip pointing to the instruction. We can start by moving
> skip_emulated_instruction() from nested_vmx_run() to nested_vmx_vmexit().

That generally does not help to inject/report an external IRQ to L1, as
L1 runs with IRQs disabled around VMLAUNCH/VMRESUME. Thus, the only way to
report this IRQ is a VMEXIT.
I think the ordering is strict: first inject what L1 wants to send to L2,
then take the VMEXIT with that external IRQ in VM_EXIT_INTR_INFO.

Jan
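To illustrate that ordering constraint, a small self-contained C sketch;
the two vmcs12 field names are taken from the discussion above, while the
structure and everything else are hypothetical simplifications. The
external-interrupt exit may only be synthesized once the queued vectoring
event has been delivered:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define INFO_VALID	(1u << 31)

struct vmcs12_model {			/* trimmed-down, hypothetical */
	uint32_t idt_vectoring_info_field;
	uint32_t vm_exit_intr_info;
};

/* Synthesize the exit to L1 for an external IRQ; asserts the ordering:
 * no valid event may still be sitting in idt_vectoring_info_field */
static void exit_on_external_irq(struct vmcs12_model *v, uint32_t vector)
{
	assert(!(v->idt_vectoring_info_field & INFO_VALID));
	v->vm_exit_intr_info = INFO_VALID | vector;
}

int main(void)
{
	struct vmcs12_model v = {
		.idt_vectoring_info_field = INFO_VALID | 0x20,
	};

	v.idt_vectoring_info_field = 0;	/* step 1: deliver L1's event to L2 */
	exit_on_external_irq(&v, 0x30);	/* step 2: VMEXIT with the IRQ */
	printf("vm_exit_intr_info = %#x\n", v.vm_exit_intr_info);
	return 0;
}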
On Thu, Feb 21, 2013, Gleb Natapov wrote about "Re: [PATCH] KVM: nVMX: Rework event injection and recovery":
> will not be called while there is an event to reinject. 51cfe38ea5 still
> does not explain why nested_run_pending is needed. We cannot #vmexit
> without entering L2, but we can undo the VMLAUNCH/VMRESUME emulation,
> leaving rip pointing to the instruction. We can start by moving
> skip_emulated_instruction() from nested_vmx_run() to nested_vmx_vmexit().

This is a very interesting idea!

Don't forget to also call skip_emulated_instruction() in
nested_vmx_entry_failure(). And please expand the comment at the end of
nested_vmx_run(), saying that skipping the instruction is also done on
exit, unless the instruction needs to be retried because we had to inject
an interrupt into L1 before running it.

Whether this is actually clearer than the "nested_run_pending" approach,
I don't know.
On Thu, Feb 21, 2013, Jan Kiszka wrote about "Re: [PATCH] KVM: nVMX: Rework event injection and recovery":
> That generally does not help to inject/report an external IRQ to L1, as
> L1 runs with IRQs disabled around VMLAUNCH/VMRESUME.

Good point, I forgot that :(

So it looks like nested_run_pending was necessary after all.
On Thu, Feb 21, 2013 at 03:37:16PM +0200, Nadav Har'El wrote:
> On Thu, Feb 21, 2013, Jan Kiszka wrote about "Re: [PATCH] KVM: nVMX: Rework event injection and recovery":
> > That generally does not help to inject/report an external IRQ to L1, as
> > L1 runs with IRQs disabled around VMLAUNCH/VMRESUME.
>
> Good point, I forgot that :(
>
> So it looks like nested_run_pending was necessary after all.
>
Not sure (as in "this is an implementation detail that is possible to
avoid", not as in "this check here is incorrect!" :)). If interrupts are
disabled, then vmx_interrupt_allowed() should return false because
interrupts are disabled, not because we are emulating a guest entry. This
is easier said than done though, but I'll think about it. Looks like SVM
does it this way.

--
			Gleb.
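A minimal sketch of the shape Gleb hints at, assuming a heavily
simplified, hypothetical vcpu model (this is not the SVM or VMX code):
while L1 sits at VMLAUNCH with IF clear, the check returns false on its
own, with no nested_run_pending bookkeeping needed:

#include <stdbool.h>
#include <stdio.h>

#define X86_EFLAGS_IF	(1ul << 9)

struct vcpu_model {		/* hypothetical simplification */
	unsigned long rflags;
};

/* Derive "interrupt allowed" from the guest's architectural IF state
 * instead of from entry-emulation bookkeeping */
static bool interrupt_allowed(const struct vcpu_model *v)
{
	return v->rflags & X86_EFLAGS_IF;
}

int main(void)
{
	/* L1 executes VMLAUNCH with interrupts disabled, so an undone
	 * entry lands us in a state where injection is refused anyway */
	struct vcpu_model l1_at_vmlaunch = { .rflags = 0 };

	printf("allowed: %d\n", interrupt_allowed(&l1_at_vmlaunch));
	return 0;
}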
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dd3a8a0..7d2fbd2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
 
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
-	if (is_guest_mode(&vmx->vcpu))
-		return;
 	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
 				  VM_EXIT_INSTRUCTION_LEN,
 				  IDT_VECTORING_ERROR_CODE);
@@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 
 static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
+	if (to_vmx(vcpu)->nested.nested_run_pending)
 		return;
 	__vmx_complete_interrupts(to_vmx(vcpu),
 				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
@@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long debugctlmsr;
 
-	if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-		if (vmcs12->idt_vectoring_info_field &
-		    VECTORING_INFO_VALID_MASK) {
-			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
-				     vmcs12->idt_vectoring_info_field);
-			vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
-				     vmcs12->vm_exit_instruction_len);
-			if (vmcs12->idt_vectoring_info_field &
-			    VECTORING_INFO_DELIVER_CODE_MASK)
-				vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
-					     vmcs12->idt_vectoring_error_code);
-		}
-	}
-
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
 		vmx->entry_time = ktime_get();
@@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
-	if (is_guest_mode(vcpu)) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-		vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
-		if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
-			vmcs12->idt_vectoring_error_code =
-				vmcs_read32(IDT_VECTORING_ERROR_CODE);
-			vmcs12->vm_exit_instruction_len =
-				vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-		}
-	}
-
 	vmx->loaded_vmcs->launched = 1;
 
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
@@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 
-	/* clear vm-entry fields which are to be cleared on exit */
-	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
+	/* drop what we picked up for L0 via vmx_complete_interrupts */
+	vcpu->arch.nmi_injected = false;
+	kvm_clear_exception_queue(vcpu);
+	kvm_clear_interrupt_queue(vcpu);
+
+	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
+	    vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) {
+		/*
+		 * Preserve the event that was supposed to be injected
+		 * by emulating it would have been returned in
+		 * IDT_VECTORING_INFO_FIELD.
+		 */
+		if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) &
+		    INTR_INFO_VALID_MASK) {
+			vmcs12->idt_vectoring_info_field =
+				vmcs12->vm_entry_intr_info_field;
+			vmcs12->idt_vectoring_error_code =
+				vmcs12->vm_entry_exception_error_code;
+			vmcs12->vm_exit_instruction_len =
+				vmcs12->vm_entry_instruction_len;
+			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
+		}
+
+		/* clear vm-entry fields which are to be cleared on exit */
 		vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
+	}
 }
 
 /*
This aligns VMX more with SVM regarding event injection and recovery for
nested guests. The changes allow to inject interrupts directly from L0
to L2.

One difference to SVM is that we always transfer the pending event
injection into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1.

VMX and SVM are now identical in how they recover event injections from
unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD
still contains a valid event and, if yes, transfer the content into L1's
idt_vectoring_info_field.

To avoid that we incorrectly leak an event into the architectural VCPU
state that L1 wants to inject, we skip cancellation on nested run.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Survived moderate testing here and (currently) makes sense to me, but
please review very carefully. I wouldn't be surprised if I'm still
missing some subtle corner case.

 arch/x86/kvm/vmx.c |   57 +++++++++++++++++++++++----------------------------
 1 files changed, 26 insertions(+), 31 deletions(-)