| Message ID | be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK |
On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
> I noticed that with a vCPU count large enough (> 16) they sometimes froze
> at boot.
> With a vCPU count of 64 they never booted successfully - suggesting some
> kind of a race condition.
>
> Since adding the "vnmi=0" module parameter made these guests boot
> successfully it was clear that the problem is most likely (v)NMI-related.
>
> Running kvm-unit-tests quickly showed failing NMI-related test cases, like
> "multiple nmi" and "pending nmi" from the apic-split, x2apic and xapic
> tests and the NMI parts of the eventinj test.
>
> The issue was that once one NMI was being serviced no other NMI was allowed
> to be set pending (NMI limit = 0), which was traced to
> svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
> than for the "NMI pending" flag.
>
> Fix this by testing for the right flag in svm_is_vnmi_pending().
> Once this is done, the NMI-related kvm-unit-tests pass successfully and
> the Windows guest no longer freezes at boot.
>
> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>

Reviewed-by: Sean Christopherson <seanjc@google.com>

> ---
>
> It's a bit sad that no-one apparently tested the vNMI patchset with
> kvm-unit-tests on actual vNMI-enabled hardware...

That's one way to put it.

Santosh, what happened? This goof was present in both v3 and v4, i.e. it wasn't
something that we botched when applying/massaging at the last minute. And the
cover letters for both v3 and v4 state "Series ... tested on AMD EPYC-Genoa".
Hi Sean and Maciej,

On 5/19/2023 9:21 PM, Sean Christopherson wrote:
> On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> [...]
>>
>> It's a bit sad that no-one apparently tested the vNMI patchset with
>> kvm-unit-tests on actual vNMI-enabled hardware...
>
> That's one way to put it.
>
> Santosh, what happened? This goof was present in both v3 and v4, i.e. it wasn't
> something that we botched when applying/massaging at the last minute. And the
> cover letters for both v3 and v4 state "Series ... tested on AMD EPYC-Genoa".

My bad - in the past I only ran svm_test with vNMI, using Sean's KUT branch
remotes/sean-kut/svm/vnmi_test, and saw that the vNMI test was passing.
Here are the logs:

---
PASS: vNMI enabled but NMI_INTERCEPT unset!
PASS: vNMI with vector 2 not injected
PASS: VNMI serviced
PASS: vnmi
---

However, when I ran the tests mentioned by Maciej, I do see the failure.
Thanks for pointing this out.

Reviewed-by: Santosh Shukla <Santosh.Shukla@amd.com>

Best Regards,
Santosh
On 19.05.2023 17:51, Sean Christopherson wrote:
> On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> [...]
>>
>> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>

I can't see this in the kvm/kvm.git trees or the kvm-x86 ones on GitHub -
is this patch planned to be picked up for -rc5 soon?

Technically, just knowing the final commit id would be sufficient for my
purposes.

Thanks,
Maciej
On Thu, Jun 01, 2023, Maciej S. Szmigiero wrote:
> On 19.05.2023 17:51, Sean Christopherson wrote:
> > [...]
> >
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
>
> I can't see this in the kvm/kvm.git trees or the kvm-x86 ones on GitHub -
> is this patch planned to be picked up for -rc5 soon?
>
> Technically, just knowing the final commit id would be sufficient for my
> purposes.

If Paolo doesn't pick it up by tomorrow, I'll apply it and send a fixes pull
request for -rc5.
On 1.06.2023 20:04, Sean Christopherson wrote:
> On Thu, Jun 01, 2023, Maciej S. Szmigiero wrote:
>> [...]
>>
>> I can't see this in the kvm/kvm.git trees or the kvm-x86 ones on GitHub -
>> is this patch planned to be picked up for -rc5 soon?
>
> If Paolo doesn't pick it up by tomorrow, I'll apply it and send a fixes pull
> request for -rc5.

Thanks Sean.
On Fri, 19 May 2023 13:26:18 +0200, Maciej S. Szmigiero wrote:
> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
> I noticed that with vCPU count large enough (> 16) they sometimes froze at
> boot.
> With vCPU count of 64 they never booted successfully - suggesting some kind
> of a race condition.
>
> Since adding "vnmi=0" module parameter made these guests boot successfully
> it was clear that the problem is most likely (v)NMI-related.
>
> [...]

Applied to kvm-x86 fixes, thanks!

[1/1] KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
      https://github.com/kvm-x86/linux/commit/b2ce89978889

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca32389f3c36..54089f990c8f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3510,7 +3510,7 @@ static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
 	if (!is_vnmi_enabled(svm))
 		return false;
 
-	return !!(svm->vmcb->control.int_ctl & V_NMI_BLOCKING_MASK);
+	return !!(svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK);
 }
 
 static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)