Message ID | 20201109095021.9897-3-jgross@suse.com |
---|---|
State | Superseded |
Series | xen/x86: implement NMI continuation |
On 09.11.2020 10:50, Juergen Gross wrote:
> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>          model->free_msr(v);
>  }
>
> +bool nmi_oprofile_send_virq(void)
> +{
> +    struct vcpu *v = this_cpu(nmi_cont_vcpu);
> +
> +    if ( v )
> +        send_guest_vcpu_virq(v, VIRQ_XENOPROF);
> +
> +    this_cpu(nmi_cont_vcpu) = NULL;

What if, by the time we make it here, a 2nd NMI has arrived? I agree
the next overflow interrupt shouldn't arrive this quickly, but I also
think you want to zap the per-CPU variable first here, and ...

> +
> +    return v;
> +}
> +
>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>  {
>      int xen_mode, ovf;
>
>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>      xen_mode = ring_0(regs);
> -    if ( ovf && is_active(current->domain) && !xen_mode )
> -        send_guest_vcpu_virq(current, VIRQ_XENOPROF);
> +    if ( ovf && is_active(current->domain) && !xen_mode ) {
> +        this_cpu(nmi_cont_vcpu) = current;

... avoid overwriting any non-NULL value here. That's then of course
still not closing the window, but has (imo) overall better behavior.

Also, style-wise, going through the file it looks to be mainly Linux
style, so may I suggest your additions / changes to be done that way,
rather than extending use of this funny mixed style?

Jan
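A minimal sketch of the reordering Jan is asking for, assuming only the
per-CPU variable added by this patch (illustrative, not the committed
code):

    bool nmi_oprofile_send_virq(void)
    {
        struct vcpu *v = this_cpu(nmi_cont_vcpu);

        /* Clear before sending, so a freshly latched NMI can't be lost. */
        this_cpu(nmi_cont_vcpu) = NULL;

        if ( v )
            send_guest_vcpu_virq(v, VIRQ_XENOPROF);

        return v;
    }

and, correspondingly in nmi_callback(), a guard against clobbering a
not-yet-delivered vCPU:

    /* Don't overwrite a vCPU whose vIRQ hasn't been sent yet. */
    if ( ovf && is_active(current->domain) && !xen_mode &&
         !this_cpu(nmi_cont_vcpu) )
    {
        this_cpu(nmi_cont_vcpu) = current;
        trigger_nmi_continuation();
    }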
On 11.11.20 16:45, Jan Beulich wrote:
> On 09.11.2020 10:50, Juergen Gross wrote:
>> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>>          model->free_msr(v);
>>  }
>>
>> +bool nmi_oprofile_send_virq(void)
>> +{
>> +    struct vcpu *v = this_cpu(nmi_cont_vcpu);
>> +
>> +    if ( v )
>> +        send_guest_vcpu_virq(v, VIRQ_XENOPROF);
>> +
>> +    this_cpu(nmi_cont_vcpu) = NULL;
>
> What if, by the time we make it here, a 2nd NMI has arrived? I agree
> the next overflow interrupt shouldn't arrive this quickly, but I also
> think you want to zap the per-CPU variable first here, and ...

How could that happen? This function is activated only from NMI context
in case the NMI happened in guest mode. And it will be executed with
higher priority than any guest, so there is a zero chance another NMI
in guest mode can happen in between.

I can add a comment in this regard if you want.

>
>> +
>> +    return v;
>> +}
>> +
>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>  {
>>      int xen_mode, ovf;
>>
>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>      xen_mode = ring_0(regs);
>> -    if ( ovf && is_active(current->domain) && !xen_mode )
>> -        send_guest_vcpu_virq(current, VIRQ_XENOPROF);
>> +    if ( ovf && is_active(current->domain) && !xen_mode ) {
>> +        this_cpu(nmi_cont_vcpu) = current;
>
> ... avoid overwriting any non-NULL value here. That's then of course
> still not closing the window, but has (imo) overall better behavior.
>
> Also, style-wise, going through the file it looks to be mainly Linux
> style, so may I suggest your additions / changes to be done that way,
> rather than extending use of this funny mixed style?

Works for me.

Juergen
On 11.11.2020 16:48, Jürgen Groß wrote:
> On 11.11.20 16:45, Jan Beulich wrote:
>> On 09.11.2020 10:50, Juergen Gross wrote:
>>> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>>>          model->free_msr(v);
>>>  }
>>>
>>> +bool nmi_oprofile_send_virq(void)
>>> +{
>>> +    struct vcpu *v = this_cpu(nmi_cont_vcpu);
>>> +
>>> +    if ( v )
>>> +        send_guest_vcpu_virq(v, VIRQ_XENOPROF);
>>> +
>>> +    this_cpu(nmi_cont_vcpu) = NULL;
>>
>> What if, by the time we make it here, a 2nd NMI has arrived? I agree
>> the next overflow interrupt shouldn't arrive this quickly, but I also
>> think you want to zap the per-CPU variable first here, and ...
>
> How could that happen? This function is activated only from NMI
> context in case the NMI happened in guest mode. And it will be
> executed with higher priority than any guest, so there is a zero
> chance another NMI in guest mode can happen in between.

While I'll admit I didn't pay attention to the bogus (as far as HVM is
concerned) xen_mode check, my understanding is that the self-IPI will
be delivered once we're back in guest mode, as that's the first time
IRQs would be on again (even event checking gets deferred by sending a
self-IPI). If another NMI was latched by that time, it would take
precedence over the IRQ and would also be delivered on the guest mode
insn that the IRET returned to.

I agree though that this is benign, as the vCPU wouldn't have been
context switched out yet, i.e. current is still the same and there'll
then merely be two NMI instances folded into one.

However, I still think the ordering would better be changed, to set a
good precedent.

>>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>  {
>>>      int xen_mode, ovf;
>>>
>>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>      xen_mode = ring_0(regs);

Unrelated to the patch here (i.e. just as an observation), this use of
ring_0() looks bogus when the NMI occurred in HVM guest mode.

Jan
On 12.11.20 11:23, Jan Beulich wrote:
> On 11.11.2020 16:48, Jürgen Groß wrote:
>> On 11.11.20 16:45, Jan Beulich wrote:
>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>>>>          model->free_msr(v);
>>>>  }
>>>>
>>>> +bool nmi_oprofile_send_virq(void)
>>>> +{
>>>> +    struct vcpu *v = this_cpu(nmi_cont_vcpu);
>>>> +
>>>> +    if ( v )
>>>> +        send_guest_vcpu_virq(v, VIRQ_XENOPROF);
>>>> +
>>>> +    this_cpu(nmi_cont_vcpu) = NULL;
>>>
>>> What if, by the time we make it here, a 2nd NMI has arrived? I agree
>>> the next overflow interrupt shouldn't arrive this quickly, but I
>>> also think you want to zap the per-CPU variable first here, and ...
>>
>> How could that happen? This function is activated only from NMI
>> context in case the NMI happened in guest mode. And it will be
>> executed with higher priority than any guest, so there is a zero
>> chance another NMI in guest mode can happen in between.
>
> While I'll admit I didn't pay attention to the bogus (as far as HVM is
> concerned) xen_mode check, my understanding is that the self-IPI will
> be delivered once we're back in guest mode, as that's the first time
> IRQs would be on again (even event checking gets deferred by sending a
> self-IPI). If another NMI was latched by that time, it would take
> precedence over the IRQ and would also be delivered on the guest mode
> insn that the IRET returned to.
>
> I agree though that this is benign, as the vCPU wouldn't have been
> context switched out yet, i.e. current is still the same and there'll
> then merely be two NMI instances folded into one.

Correct.

>
> However, I still think the ordering would better be changed, to set a
> good precedent.

Okay, if you want that.

>
>>>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>  {
>>>>      int xen_mode, ovf;
>>>>
>>>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>      xen_mode = ring_0(regs);
>
> Unrelated to the patch here (i.e. just as an observation), this use of
> ring_0() looks bogus when the NMI occurred in HVM guest mode.

An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
reason, or just be handled completely inside the guest, right? I don't
see how this test should ever result in xen_mode being false for an
HVM guest.

Juergen
On 12.11.2020 11:48, Jürgen Groß wrote:
> On 12.11.20 11:23, Jan Beulich wrote:
>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>  {
>>>>>      int xen_mode, ovf;
>>>>>
>>>>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>      xen_mode = ring_0(regs);
>>
>> Unrelated to the patch here (i.e. just as an observation), this use
>> of ring_0() looks bogus when the NMI occurred in HVM guest mode.
>
> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
> reason, or just be handled completely inside the guest, right?

Yes, and in the former case for VMX it would be handed on to do_nmi(),
with the guest register state. For SVM it would get handled on the next
STGI, i.e. would indeed never surface from HVM guest mode.

> I don't see how this test should ever result in xen_mode being false
> for an HVM guest.

I think, because of hvm_invalidate_regs_fields(), on VMX it would be
consistently true in release builds and consistently false in debug
ones.

Jan
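For context: ring_0() boils down to a CPL test on the saved CS selector
(a sketch from memory, not verbatim), which explains the release/debug
split Jan describes; the exact poison value used by
hvm_invalidate_regs_fields() is an assumption here:

    /* Sketch of ring_0() -- a pure CS.RPL test: */
    #define ring_0(r)   (((r)->cs & 3) == 0)

    /*
     * VMX exits don't sync the selector fields into the on-stack
     * cpu_user_regs.  Debug builds poison them via
     * hvm_invalidate_regs_fields() (e.g. cs = 0xbeef, low bits
     * non-zero, so ring_0() is consistently false); release builds
     * keep the stale, typically zero, value, so it is consistently
     * true.
     */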
On 12.11.20 12:05, Jan Beulich wrote:
> On 12.11.2020 11:48, Jürgen Groß wrote:
>> On 12.11.20 11:23, Jan Beulich wrote:
>>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>>  {
>>>>>>      int xen_mode, ovf;
>>>>>>
>>>>>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>>      xen_mode = ring_0(regs);
>>>
>>> Unrelated to the patch here (i.e. just as an observation), this use
>>> of ring_0() looks bogus when the NMI occurred in HVM guest mode.
>>
>> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
>> reason, or just be handled completely inside the guest, right?
>
> Yes, and in the former case for VMX it would be handed on to do_nmi(),
> with the guest register state. For SVM it would get handled on the
> next STGI, i.e. would indeed never surface from HVM guest mode.
>
>> I don't see how this test should ever result in xen_mode being false
>> for an HVM guest.
>
> I think, because of hvm_invalidate_regs_fields(), on VMX it would be
> consistently true in release builds and consistently false in debug
> ones.

Ah, okay. I searched for do_nmi(), but the vmx code uses the exception
table instead.

So I guess this should be:

    xen_mode = !guest_mode(regs);

Juergen
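For reference, guest_mode() identifies a guest frame by its position on
the stack rather than by the (possibly stale) saved CS, which is why it
would also give the right answer for HVM exits. A simplified paraphrase
(adapted from xen/include/asm-x86/x86_64/regs.h of that era; details
elided):

    #define guest_mode(r)                                                 \
    ({                                                                    \
        unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
        /* The frame must live on the current CPU's stack. */             \
        ASSERT(diff < STACK_SIZE);                                        \
        /* Guest frames sit exactly at the stack top. */                  \
        (diff == 0);                                                      \
    })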
On 12.11.2020 12:27, Jürgen Groß wrote:
> On 12.11.20 12:05, Jan Beulich wrote:
>> On 12.11.2020 11:48, Jürgen Groß wrote:
>>> On 12.11.20 11:23, Jan Beulich wrote:
>>>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>>>  {
>>>>>>>      int xen_mode, ovf;
>>>>>>>
>>>>>>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>>>      xen_mode = ring_0(regs);
>>>>
>>>> Unrelated to the patch here (i.e. just as an observation), this use
>>>> of ring_0() looks bogus when the NMI occurred in HVM guest mode.
>>>
>>> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
>>> reason, or just be handled completely inside the guest, right?
>>
>> Yes, and in the former case for VMX it would be handed on to
>> do_nmi(), with the guest register state. For SVM it would get handled
>> on the next STGI, i.e. would indeed never surface from HVM guest
>> mode.
>>
>>> I don't see how this test should ever result in xen_mode being
>>> false for an HVM guest.
>>
>> I think, because of hvm_invalidate_regs_fields(), on VMX it would be
>> consistently true in release builds and consistently false in debug
>> ones.
>
> Ah, okay. I searched for do_nmi(), but the vmx code uses the exception
> table instead.
>
> So I guess this should be:
>
>     xen_mode = !guest_mode(regs);

Yes, I think so. Just that guest_mode() also has its issues (my patch
"x86: refine guest_mode()" improving it at least some is still pending
Andrew's go / no-go / improvement suggestions), so whether it's
suitable to use here may need some careful evaluation.

Jan
On 12.11.20 12:36, Jan Beulich wrote:
> On 12.11.2020 12:27, Jürgen Groß wrote:
>> On 12.11.20 12:05, Jan Beulich wrote:
>>> On 12.11.2020 11:48, Jürgen Groß wrote:
>>>> On 12.11.20 11:23, Jan Beulich wrote:
>>>>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>>>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>>>>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>>>>  {
>>>>>>>>      int xen_mode, ovf;
>>>>>>>>
>>>>>>>>      ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>>>>      xen_mode = ring_0(regs);
>>>>>
>>>>> Unrelated to the patch here (i.e. just as an observation), this
>>>>> use of ring_0() looks bogus when the NMI occurred in HVM guest
>>>>> mode.
>>>>
>>>> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
>>>> reason, or just be handled completely inside the guest, right?
>>>
>>> Yes, and in the former case for VMX it would be handed on to
>>> do_nmi(), with the guest register state. For SVM it would get
>>> handled on the next STGI, i.e. would indeed never surface from HVM
>>> guest mode.
>>>
>>>> I don't see how this test should ever result in xen_mode being
>>>> false for an HVM guest.
>>>
>>> I think, because of hvm_invalidate_regs_fields(), on VMX it would be
>>> consistently true in release builds and consistently false in debug
>>> ones.
>>
>> Ah, okay. I searched for do_nmi(), but the vmx code uses the
>> exception table instead.
>>
>> So I guess this should be:
>>
>>     xen_mode = !guest_mode(regs);
>
> Yes, I think so. Just that guest_mode() also has its issues (my patch
> "x86: refine guest_mode()" improving it at least some is still pending
> Andrew's go / no-go / improvement suggestions), so whether it's
> suitable to use here may need some careful evaluation.

I'll leave the test as is for now. We can revisit it when your patch
has been committed.

Juergen
diff --git a/xen/arch/x86/oprofile/nmi_int.c b/xen/arch/x86/oprofile/nmi_int.c
index 0f103d80a6..5f17cba0b2 100644
--- a/xen/arch/x86/oprofile/nmi_int.c
+++ b/xen/arch/x86/oprofile/nmi_int.c
@@ -38,6 +38,8 @@ static unsigned long saved_lvtpc[NR_CPUS];
 
 static char *cpu_type;
 
+static DEFINE_PER_CPU(struct vcpu *, nmi_cont_vcpu);
+
 static int passive_domain_msr_op_checks(unsigned int msr, int *typep, int *indexp)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
@@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
         model->free_msr(v);
 }
 
+bool nmi_oprofile_send_virq(void)
+{
+    struct vcpu *v = this_cpu(nmi_cont_vcpu);
+
+    if ( v )
+        send_guest_vcpu_virq(v, VIRQ_XENOPROF);
+
+    this_cpu(nmi_cont_vcpu) = NULL;
+
+    return v;
+}
+
 static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
     int xen_mode, ovf;
 
     ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
     xen_mode = ring_0(regs);
-    if ( ovf && is_active(current->domain) && !xen_mode )
-        send_guest_vcpu_virq(current, VIRQ_XENOPROF);
+    if ( ovf && is_active(current->domain) && !xen_mode ) {
+        this_cpu(nmi_cont_vcpu) = current;
+        trigger_nmi_continuation();
+    }
 
     if ( ovf == 2 )
         current->arch.nmi_pending = true;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 5005ac6e6e..7cb7d7e09c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -65,6 +65,7 @@
 #include <asm/debugger.h>
 #include <asm/msr.h>
 #include <asm/nmi.h>
+#include <asm/xenoprof.h>
 #include <asm/shared.h>
 #include <asm/x86_emulate.h>
 #include <asm/traps.h>
@@ -1804,6 +1805,9 @@ bool nmi_check_continuation(void)
 {
     bool ret = false;
 
+    if ( nmi_oprofile_send_virq() )
+        ret = true;
+
     return ret;
 }
 
diff --git a/xen/include/asm-x86/xenoprof.h b/xen/include/asm-x86/xenoprof.h
index 1026ba2e1f..cf6af8c5df 100644
--- a/xen/include/asm-x86/xenoprof.h
+++ b/xen/include/asm-x86/xenoprof.h
@@ -69,6 +69,8 @@ int passive_domain_do_rdmsr(unsigned int msr, uint64_t *msr_content);
 int passive_domain_do_wrmsr(unsigned int msr, uint64_t msr_content);
 void passive_domain_destroy(struct vcpu *v);
 
+bool nmi_oprofile_send_virq(void);
+
 #else
 
 static inline int passive_domain_do_rdmsr(unsigned int msr,
@@ -85,6 +87,11 @@ static inline int passive_domain_do_wrmsr(unsigned int msr,
 
 static inline void passive_domain_destroy(struct vcpu *v) {}
 
+static inline bool nmi_oprofile_send_virq(void)
+{
+    return false;
+}
+
 #endif /* CONFIG_XENOPROF */
 
 #endif /* __ASM_X86_XENOPROF_H__ */
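As a reading aid, the control flow the patch establishes, summarized
from the hunks above (trigger_nmi_continuation() and
nmi_check_continuation() are added by the preceding patch in this
series):

    /*
     * NMI context -- no locks may be taken:
     *   nmi_callback()
     *     this_cpu(nmi_cont_vcpu) = current;
     *     trigger_nmi_continuation();       /* arms a self-IPI */
     *
     * Ordinary interrupt context -- locking is safe again:
     *   nmi_check_continuation()
     *     nmi_oprofile_send_virq()
     *       send_guest_vcpu_virq(v, VIRQ_XENOPROF);
     */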
Instead of calling send_guest_vcpu_virq() from NMI context, use the
NMI continuation framework for that purpose. This avoids taking locks
in NMI mode.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V4:
- rework to less generic approach
---
 xen/arch/x86/oprofile/nmi_int.c | 20 ++++++++++++++++++--
 xen/arch/x86/traps.c            |  4 ++++
 xen/include/asm-x86/xenoprof.h  |  7 +++++++
 3 files changed, 29 insertions(+), 2 deletions(-)