Message ID | 0ba4903d-638d-408e-bcd4-7c13cb56e078@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/entry: shrink insn size for some of our EFLAGS manipulation | expand |
On 05/03/2024 2:04 pm, Jan Beulich wrote: > --- a/xen/arch/x86/x86_64/entry.S > +++ b/xen/arch/x86/x86_64/entry.S > @@ -52,7 +52,7 @@ UNLIKELY_END(syscall_no_callback) > movq %rax,TRAPBOUNCE_eip(%rdx) > movb %cl,TRAPBOUNCE_flags(%rdx) > call create_bounce_frame > - andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp) > + andb $~(X86_EFLAGS_DF >> 8), UREGS_eflags + 1(%rsp) The other adjustments are fine, but what on earth are we doing with DF here? This looks straight up buggy. We've got no legitimate reason to be playing with the guest's view of DF. ~Andrew
On 06.03.2024 11:33, Andrew Cooper wrote: > On 05/03/2024 2:04 pm, Jan Beulich wrote: >> --- a/xen/arch/x86/x86_64/entry.S >> +++ b/xen/arch/x86/x86_64/entry.S >> @@ -52,7 +52,7 @@ UNLIKELY_END(syscall_no_callback) >> movq %rax,TRAPBOUNCE_eip(%rdx) >> movb %cl,TRAPBOUNCE_flags(%rdx) >> call create_bounce_frame >> - andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp) >> + andb $~(X86_EFLAGS_DF >> 8), UREGS_eflags + 1(%rsp) > > The other adjustments are fine, but what on earth are we doing with DF here? > > This looks straight up buggy. We've got no legitimate reason to be > playing with the guest's view of DF. This is the PV equivalent of the SYSCALL_MASK MSR, isn't it? With it not really being an (emulated) MSR, but an agreement that guests will only ever care about having DF cleared (besides the requested way of dealing with the event mask). One of the many things not written down anywhere ... Jan
On 06/03/2024 10:49 am, Jan Beulich wrote: > On 06.03.2024 11:33, Andrew Cooper wrote: >> On 05/03/2024 2:04 pm, Jan Beulich wrote: >>> --- a/xen/arch/x86/x86_64/entry.S >>> +++ b/xen/arch/x86/x86_64/entry.S >>> @@ -52,7 +52,7 @@ UNLIKELY_END(syscall_no_callback) >>> movq %rax,TRAPBOUNCE_eip(%rdx) >>> movb %cl,TRAPBOUNCE_flags(%rdx) >>> call create_bounce_frame >>> - andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp) >>> + andb $~(X86_EFLAGS_DF >> 8), UREGS_eflags + 1(%rsp) >> The other adjustments are fine, but what on earth are we doing with DF here? >> >> This looks straight up buggy. We've got no legitimate reason to be >> playing with the guest's view of DF. > This is the PV equivalent of the SYSCALL_MASK MSR, isn't it? Well, is it? It definitely never existed in 32bit builds of Xen, when the int80 direct trap existed. I don't see any evidence of it applying anywhere for compat PV guests either, even those with syscall enabled. > With it not > really being an (emulated) MSR, but an agreement that guests will only ever > care about having DF cleared (besides the requested way of dealing with the > event mask). One of the many things not written down anywhere ... If it's not written down, it doesn't exist... And even if this is supposed to be a PV-FMASK-ish thing, it's buggy because it apples also when #UD is raised for no registered callback. (And yes, I realise there is a chronology issues here (the #UD check is the newer element), but it really will corrupt state as presented in a SIGSEGV frame. The question we need to answer is whether there is any remotely-credible way that a PV guest kernel author could be expecting this behaviour and relying on it. I honestly don't think there is. It fails the principle of least surprise compared to native behaviour, 32bit PV behaviour, and to every non-SYSCALL based 64bit event also. It either needs writing down somewhere (and the #UD case fixing), or it needs deleing, because continuing to leave it in this state isn't ok. ~Andrew
On 06.03.2024 12:14, Andrew Cooper wrote: > On 06/03/2024 10:49 am, Jan Beulich wrote: >> On 06.03.2024 11:33, Andrew Cooper wrote: >>> On 05/03/2024 2:04 pm, Jan Beulich wrote: >>>> --- a/xen/arch/x86/x86_64/entry.S >>>> +++ b/xen/arch/x86/x86_64/entry.S >>>> @@ -52,7 +52,7 @@ UNLIKELY_END(syscall_no_callback) >>>> movq %rax,TRAPBOUNCE_eip(%rdx) >>>> movb %cl,TRAPBOUNCE_flags(%rdx) >>>> call create_bounce_frame >>>> - andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp) >>>> + andb $~(X86_EFLAGS_DF >> 8), UREGS_eflags + 1(%rsp) >>> The other adjustments are fine, but what on earth are we doing with DF here? >>> >>> This looks straight up buggy. We've got no legitimate reason to be >>> playing with the guest's view of DF. >> This is the PV equivalent of the SYSCALL_MASK MSR, isn't it? > > Well, is it? > > It definitely never existed in 32bit builds of Xen, when the int80 > direct trap existed. > > I don't see any evidence of it applying anywhere for compat PV guests > either, even those with syscall enabled. Neither int80 nor the 32-bit mode syscall apply the flags mask. Therefore why would we mimic such behavior in Xen? >> With it not >> really being an (emulated) MSR, but an agreement that guests will only ever >> care about having DF cleared (besides the requested way of dealing with the >> event mask). One of the many things not written down anywhere ... > > If it's not written down, it doesn't exist... IOW the PV interface as a whole largely doesn't exist. > And even if this is supposed to be a PV-FMASK-ish thing, it's buggy > because it apples also when #UD is raised for no registered callback. > (And yes, I realise there is a chronology issues here (the #UD check is > the newer element), but it really will corrupt state as presented in a > SIGSEGV frame. > > The question we need to answer is whether there is any remotely-credible > way that a PV guest kernel author could be expecting this behaviour and > relying on it. > > I honestly don't think there is. I was going to say XenoLinux and its forward ports did, but upstream Linux does, too. The native and PV paths being largely shared, there's no CLD in sight anywhere. > It fails the principle of least surprise compared to native behaviour, > 32bit PV behaviour, and to every non-SYSCALL based 64bit event also. Why "every"? switch_to_kernel is used solely for syscall handling. And why do you consider this behavior surprising, when really it would be surprising for the flag to remain untouched from what user mode had, when taking into account how the MSR is set in the native case? > It either needs writing down somewhere (and the #UD case fixing), or it > needs deleing, because continuing to leave it in this state isn't ok. I can try to determine whether addressing the #UD case can be done in a non-intrusive way. If so, I can split off that part of the change here, and make it a separate one with the MOVL->MOVB just done on the side. But really this is scope creep: The change as is has no (intended) change in behavior. It ought to be fine to go in independently; the further work you ask for could be done up front or later on. Jan
--- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -52,7 +52,7 @@ UNLIKELY_END(syscall_no_callback) movq %rax,TRAPBOUNCE_eip(%rdx) movb %cl,TRAPBOUNCE_flags(%rdx) call create_bounce_frame - andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp) + andb $~(X86_EFLAGS_DF >> 8), UREGS_eflags + 1(%rsp) /* %rbx: struct vcpu */ test_all_events: ASSERT_NOT_IN_ATOMIC @@ -223,7 +223,7 @@ LABEL_LOCAL(.Lrestore_rcx_iret_exit_to_g /* No special register assumptions. */ iret_exit_to_guest: andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), EFRAME_eflags(%rsp) - orl $X86_EFLAGS_IF, EFRAME_eflags(%rsp) + orb $X86_EFLAGS_IF >> 8, EFRAME_eflags + 1(%rsp) addq $8,%rsp .Lft0: iretq _ASM_PRE_EXTABLE(.Lft0, handle_exception) @@ -346,7 +346,7 @@ LABEL(sysenter_eflags_saved, 0) GET_STACK_END(bx) /* PUSHF above has saved EFLAGS.IF clear (the caller had it set). */ - orl $X86_EFLAGS_IF, UREGS_eflags(%rsp) + orb $X86_EFLAGS_IF >> 8, UREGS_eflags + 1(%rsp) mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx test %rcx, %rcx jz .Lsyse_cr3_okay @@ -361,11 +361,11 @@ LABEL(sysenter_eflags_saved, 0) cmpb $0,VCPU_sysenter_disables_events(%rbx) movq VCPU_sysenter_addr(%rbx),%rax setne %cl - testl $X86_EFLAGS_NT,UREGS_eflags(%rsp) + testb $X86_EFLAGS_NT >> 8, UREGS_eflags + 1(%rsp) leaq VCPU_trap_bounce(%rbx),%rdx UNLIKELY_START(nz, sysenter_nt_set) pushfq - andl $~X86_EFLAGS_NT,(%rsp) + andb $~(X86_EFLAGS_NT >> 8), 1(%rsp) popfq UNLIKELY_END(sysenter_nt_set) testq %rax,%rax
Much like was recently done for setting entry vector, and along the lines of what we already had in handle_exception_saved, avoid 32-bit immediates where 8-bit ones do. Reduces .text.entry size by 16 bytes in my non-CET reference build, while in my CET reference build section size doesn't change (there and in .text only padding space increases). Inspired by other long->byte conversion work. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- Numbers above are biased by me also having the straight-line-speculation change in the tree, thus every JMP is followed by an INT3. Without that, .text.entry size would also shrink by 16 bytes in the CET build.