Message ID: 20210917084559.22673-5-andrew.cooper3@citrix.com (mailing list archive)
State:      Superseded
Series:     xen/trace: Fix leakage of uninitialised stack into the tracebuffer
On 17.09.2021 10:45, Andrew Cooper wrote:
> It is pointless to write all 6 entries and only consume the useful subset.
> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
> weighing in at 11% of the function arranging unread zeroes on the stack, and
> 8% for svm_vmexit_handler().
>
> add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
> Function                       old     new   delta
> hvm_msr_write_intercept       1049    1033     -16
> vmx_enable_intr_window         238     214     -24
> svm_enable_intr_window         337     313     -24
> hvmemul_write_xcr              115      91     -24
> hvmemul_write_cr               350     326     -24
> hvmemul_read_xcr               115      91     -24
> hvmemul_read_cr                146     122     -24
> hvm_mov_to_cr                  438     414     -24
> hvm_mov_from_cr                253     229     -24
> vmx_intr_assist               1150    1118     -32
> svm_intr_assist                459     427     -32
> hvm_rdtsc_intercept            138     106     -32
> hvm_msr_read_intercept         898     866     -32
> vmx_vmenter_helper            1142    1094     -48
> vmx_inject_event               813     765     -48
> svm_vmenter_helper             238     190     -48
> hvm_hlt                        197     146     -51
> svm_inject_event              1678    1614     -64
> svm_vmexit_handler            5880    5416    -464
> vmx_vmexit_handler            7281    6473    -808
> Total: Before=3644184, After=3642317, chg -0.05%
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> Normally I wouldn't recommend patches like this for backport, but
> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
> dropped...

The change in size is indeed unexpectedly large for these two functions.
However, what I find puzzling is that TRACEBUFFER is enabled by default
(i.e. also in release builds) in the first place, and that it can only
be disabled when EXPERT. More gains could be had towards dropped code if
the option wasn't on by default in at least release builds. "Debugging
or performance analysis" (as its help text says) after all isn't a
primary target of release builds.

IOW what I'd prefer to consider a backport candidate would be a patch
changing the option's default. Thoughts?

> --- a/xen/include/asm-x86/hvm/trace.h
> +++ b/xen/include/asm-x86/hvm/trace.h
> @@ -67,38 +67,30 @@
>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>      TRACE_6D(_e, d1, d2, d3, d4)
>
> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
>      do {                                                                  \
>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
>          {                                                                 \
> -            struct {                                                      \
> -                u32 d[6];                                                 \
> -            } _d;                                                         \
> -            _d.d[0]=(d1);                                                 \
> -            _d.d[1]=(d2);                                                 \
> -            _d.d[2]=(d3);                                                 \
> -            _d.d[3]=(d4);                                                 \
> -            _d.d[4]=(d5);                                                 \
> -            _d.d[5]=(d6);                                                 \
> +            uint32_t _d[] = { __VA_ARGS__ };                              \
>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
> -                        sizeof(*_d.d) * count, &_d);                      \
> +                        sizeof(_d), _d);                                  \
>          }                                                                 \
>      } while(0)
>
>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6) \
> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5) \
> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5, 0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>  #define HVMTRACE_4D(evt, d1, d2, d3, d4) \
> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4, 0, 0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>  #define HVMTRACE_3D(evt, d1, d2, d3) \
> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3, 0, 0, 0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>  #define HVMTRACE_2D(evt, d1, d2) \
> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2, 0, 0, 0, 0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>  #define HVMTRACE_1D(evt, d1) \
> -    HVMTRACE_ND(evt, 0, 0, 1, d1, 0, 0, 0, 0, 0)
> +    HVMTRACE_ND(evt, 0, 0, d1)
>  #define HVMTRACE_0D(evt) \
> -    HVMTRACE_ND(evt, 0, 0, 0, 0, 0, 0, 0, 0, 0)
> +    HVMTRACE_ND(evt, 0, 0)

These HVMTRACE_<n>D() aren't that much of a gain anymore; perhaps down
the road we will want to have just a single wrapper macro adding the
modifier and cycles arguments, otherwise making use of variable
arguments as well?

Jan
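[For context, a minimal standalone sketch of the technique under review.
This is not Xen code: TRACE() and emit() below are made-up stand-ins for
HVMTRACE_ND() and __trace_var(). It shows how initialising an array from
__VA_ARGS__ lets sizeof() derive the payload size at compile time, so
only the words actually passed get written out -- no count parameter and
no zero padding:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for __trace_var(): report how many payload bytes arrive. */
    static void emit(const uint32_t *d, size_t bytes)
    {
        printf("%zu payload bytes\n", bytes);
    }

    /* The array length is inferred from the number of variadic
     * arguments, so sizeof(_d) is exactly the data supplied. */
    #define TRACE(...)                            \
        do {                                      \
            uint32_t _d[] = { __VA_ARGS__ };      \
            emit(_d, sizeof(_d));                 \
        } while ( 0 )

    int main(void)
    {
        TRACE(1, 2, 3);  /* prints "12 payload bytes" */
        TRACE(42);       /* prints "4 payload bytes"  */
        return 0;
    }

With zero variadic arguments -- the HVMTRACE_0D() case -- _d becomes a
zero-length array, which relies on a GNU C extension; Xen is built with
GNU extensions enabled.]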
On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>> 8% for svm_vmexit_handler().
>>
>> [...]
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
which need adjustments too.

There is also a further optimisation for the 0 case which drops even
more.

>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
>> dropped...
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.

It's not surprising in the slightest.  TRACEBUFFER long predates Kconfig.

> More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds. "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.

All performance analysis needs doing on release builds.

> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default. Thoughts?

I very much doubt that XenServer are the only people who use xentrace in
customer environments.  I'm -1 to changing the default in staging, and
firmly against doing so in older releases.

>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>> [...]
> These HVMTRACE_<n>D() aren't that much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?

Same on the plain TRACE() side.  There is an awful lot of cleanup to do
here.

Other findings include HVM records using the non-HVM helpers (to have
cycles included), and examples such as vpic_ack_pending_irq(), which
duplicates calls to vlapic_accept_pic_intr(), causing 3 trace records to
be written out.

~Andrew
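[The "0 case" optimisation Andrew mentions isn't spelled out in this
thread. One plausible shape for it -- an assumption for illustration,
not necessarily what the revised patch actually did -- is to hand
__trace_var() a null pointer when the record carries no data words:

    /* Hypothetical variant: with no variadic arguments sizeof(_d) is 0,
     * so pass NULL rather than the address of a zero-length array. */
    #define HVMTRACE_ND(evt, modifier, cycles, ...)                     \
        do {                                                            \
            if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )         \
            {                                                           \
                uint32_t _d[] = { __VA_ARGS__ };                        \
                                                                        \
                __trace_var(TRC_HVM_ ## evt | (modifier), cycles,       \
                            sizeof(_d), sizeof(_d) ? _d : NULL);        \
            }                                                           \
        } while ( 0 )
]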
On 20.09.2021 13:02, Andrew Cooper wrote:
> On 20/09/2021 10:05, Jan Beulich wrote:
>> On 17.09.2021 10:45, Andrew Cooper wrote:
>>> It is pointless to write all 6 entries and only consume the useful subset.
>>> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
>>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>>> 8% for svm_vmexit_handler().
>>>
>>> [...]
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
> which need adjustments too.

Is this really "buggy" and "need" and not merely "incomplete" and
"want"?  (Just for my own understanding in terms of what I may have
overlooked.)

Jan
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index afb1ccb342c2..f0e10dec046e 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1052,7 +1052,7 @@ void svm_vmenter_helper(const struct cpu_user_regs *regs)
     if ( unlikely(tb_init_done) )
         HVMTRACE_ND(VMENTRY,
                     nestedhvm_vcpu_in_guestmode(curr) ? TRC_HVM_NESTEDFLAG : 0,
-                    1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
+                    1/*cycles*/);
 
     svm_sync_vmcb(curr, vmcb_needs_vmsave);
 
@@ -2565,12 +2565,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 
     if ( hvm_long_mode_active(v) )
         HVMTRACE_ND(VMEXIT64, vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 0,
-                    1/*cycles*/, 3, exit_reason,
-                    regs->eip, regs->rip >> 32, 0, 0, 0);
+                    1/*cycles*/, exit_reason, TRC_PAR_LONG(regs->rip));
     else
         HVMTRACE_ND(VMEXIT, vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 0,
-                    1/*cycles*/, 2, exit_reason,
-                    regs->eip, 0, 0, 0, 0);
+                    1/*cycles*/, exit_reason, regs->eip);
 
     if ( vcpu_guestmode )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b0a42d05f86a..d403e2d8060a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3864,11 +3864,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     __vmread(VM_EXIT_REASON, &exit_reason);
 
     if ( hvm_long_mode_active(v) )
-        HVMTRACE_ND(VMEXIT64, 0, 1/*cycles*/, 3, exit_reason,
-                    regs->eip, regs->rip >> 32, 0, 0, 0);
+        HVMTRACE_ND(VMEXIT64, 0, 1/*cycles*/, exit_reason,
+                    TRC_PAR_LONG(regs->rip));
     else
-        HVMTRACE_ND(VMEXIT, 0, 1/*cycles*/, 2, exit_reason,
-                    regs->eip, 0, 0, 0, 0);
+        HVMTRACE_ND(VMEXIT, 0, 1/*cycles*/, exit_reason, regs->eip);
 
     perfc_incra(vmexits, exit_reason);
 
@@ -4645,7 +4644,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
     if ( unlikely(curr->arch.hvm.vmx.lbr_flags & LBR_FIXUP_MASK) )
         lbr_fixup();
 
-    HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
+    HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/);
 
     __vmwrite(GUEST_RIP, regs->rip);
     __vmwrite(GUEST_RSP, regs->rsp);
diff --git a/xen/include/asm-x86/hvm/trace.h b/xen/include/asm-x86/hvm/trace.h
index 5cd459b855b7..2bbac45044ce 100644
--- a/xen/include/asm-x86/hvm/trace.h
+++ b/xen/include/asm-x86/hvm/trace.h
@@ -67,38 +67,30 @@
 #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
     TRACE_6D(_e, d1, d2, d3, d4)
 
-#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
+#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
     do {                                                                  \
         if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
         {                                                                 \
-            struct {                                                      \
-                u32 d[6];                                                 \
-            } _d;                                                         \
-            _d.d[0]=(d1);                                                 \
-            _d.d[1]=(d2);                                                 \
-            _d.d[2]=(d3);                                                 \
-            _d.d[3]=(d4);                                                 \
-            _d.d[4]=(d5);                                                 \
-            _d.d[5]=(d6);                                                 \
+            uint32_t _d[] = { __VA_ARGS__ };                              \
             __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
-                        sizeof(*_d.d) * count, &_d);                      \
+                        sizeof(_d), _d);                                  \
         }                                                                 \
     } while(0)
 
 #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6) \
-    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
 #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5) \
-    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5, 0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
 #define HVMTRACE_4D(evt, d1, d2, d3, d4) \
-    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4, 0, 0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
 #define HVMTRACE_3D(evt, d1, d2, d3) \
-    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3, 0, 0, 0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
 #define HVMTRACE_2D(evt, d1, d2) \
-    HVMTRACE_ND(evt, 0, 0, 2, d1, d2, 0, 0, 0, 0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2)
 #define HVMTRACE_1D(evt, d1) \
-    HVMTRACE_ND(evt, 0, 0, 1, d1, 0, 0, 0, 0, 0)
+    HVMTRACE_ND(evt, 0, 0, d1)
 #define HVMTRACE_0D(evt) \
-    HVMTRACE_ND(evt, 0, 0, 0, 0, 0, 0, 0, 0, 0)
+    HVMTRACE_ND(evt, 0, 0)
 
 #define HVMTRACE_LONG_1D(evt, d1) \
     HVMTRACE_2D(evt ## 64, (d1) & 0xFFFFFFFF, (d1) >> 32)
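[The VMEXIT64 hunks above replace the open-coded "regs->eip,
regs->rip >> 32" pair with TRC_PAR_LONG(). That helper lives in Xen's
public trace.h; its expansion is reproduced below from memory, so worth
double-checking against xen/include/public/trace.h:

    /* Split a 64-bit value into two 32-bit trace words.  The deliberate
     * top-level comma means one use supplies two macro arguments, i.e.
     * two consecutive initialisers of the variadic _d[] array. */
    #define TRC_PAR_LONG(par) ((par) & 0xFFFFFFFF), ((par) >> 32)

So HVMTRACE_ND(VMEXIT64, 0, 1, exit_reason, TRC_PAR_LONG(regs->rip))
records three uint32_t words in total.]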
It is pointless to write all 6 entries and only consume the useful subset.
bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
weighing in at 11% of the function arranging unread zeroes on the stack, and
8% for svm_vmexit_handler().

add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
Function                       old     new   delta
hvm_msr_write_intercept       1049    1033     -16
vmx_enable_intr_window         238     214     -24
svm_enable_intr_window         337     313     -24
hvmemul_write_xcr              115      91     -24
hvmemul_write_cr               350     326     -24
hvmemul_read_xcr               115      91     -24
hvmemul_read_cr                146     122     -24
hvm_mov_to_cr                  438     414     -24
hvm_mov_from_cr                253     229     -24
vmx_intr_assist               1150    1118     -32
svm_intr_assist                459     427     -32
hvm_rdtsc_intercept            138     106     -32
hvm_msr_read_intercept         898     866     -32
vmx_vmenter_helper            1142    1094     -48
vmx_inject_event               813     765     -48
svm_vmenter_helper             238     190     -48
hvm_hlt                        197     146     -51
svm_inject_event              1678    1614     -64
svm_vmexit_handler            5880    5416    -464
vmx_vmexit_handler            7281    6473    -808
Total: Before=3644184, After=3642317, chg -0.05%

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>

Normally I wouldn't recommend patches like this for backport, but
{vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
dropped...
---
 xen/arch/x86/hvm/svm/svm.c      |  8 +++-----
 xen/arch/x86/hvm/vmx/vmx.c      |  9 ++++-----
 xen/include/asm-x86/hvm/trace.h | 28 ++++++++++------------------
 3 files changed, 17 insertions(+), 28 deletions(-)