diff mbox series

[kvm-unit-tests] x86/emulator: Test POP-SS blocking

Message ID 20220821220647.1420411-1-mhal@rbox.co (mailing list archive)
State New, archived
Headers show
Series [kvm-unit-tests] x86/emulator: Test POP-SS blocking | expand

Commit Message

Michal Luczaj Aug. 21, 2022, 10:06 p.m. UTC
Verify that CPU interruptibility due to POP SS is set correctly (and #DB
is suppressed).

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Initially I wanted to test #DB suppression with EFLAGS.TF=1, but it
turned out that making the emulator set ctxt->interruptibility like this

	asm volatile("pushf\n\t"
		     "push %[flags]\n\t"
		     "popf\n\t"
		     KVM_FEP "mov %[ss], %%ss\n\t"
		     "popf"
		     :
		     : [ss] "r" (read_ss()),
		       [flags] "r" (read_rflags() | X86_EFLAGS_TF)
		     : "memory");

results in "KVM: entry failed, hardware error 0x80000021".
kvm_intel.dump_invalid_vmcs=1 tells me at that moment

Interruptibility = 00000002
DebugExceptions = 0x0000000000000000

so perhaps it's related to the problem described in
https://lkml.kernel.org/kvm/20220120000624.655815-1-seanjc@google.com/ ?
That said, I don't know if combining FEP+blocking+TF+DB is a misuse on
my side, a bug, or a detail that happens to be unimplemented.

Anyway, the POP-SS blocking test avoids touching EFLAGS.TF by using
DR0/7.

Note that doing this the ASM_TRY() way would require extending
setup_idt() (to handle #DB) and introducing another ASM_TRY() variant
(one without the initial `movl $0, %%gs:4`).

 x86/Makefile.i386 |  3 ++-
 x86/popss.c       | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 x86/unittests.cfg |  4 ++++
 3 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 x86/popss.c

Comments

Michal Luczaj Aug. 22, 2022, 2:40 a.m. UTC | #1
On 8/22/22 00:06, Michal Luczaj wrote:
> Note that doing this the ASM_TRY() way would require extending
> setup_idt() (to handle #DB) and introducing another ASM_TRY() variant
> (one without the initial `movl $0, %%gs:4`).

Replying to self as I was wrong regarding the need for another ASM_TRY() variant.
Once setup_idt() is told to handle #DB, test can be simplified to something like

	asm volatile("push %[ss]\n\t"
		     __ASM_TRY(KVM_FEP "pop %%ss", "1f")
		     "ex_blocked: mov $1, %[success]\n\t"
		     "1:"
		     : [success] "+g" (success)
		     : [ss] "r" (read_ss())
		     : "memory");

Should I resend the patch?

Sorry for the noise,
Michal
Sean Christopherson Aug. 22, 2022, 3:42 p.m. UTC | #2
On Mon, Aug 22, 2022, Michal Luczaj wrote:
> On 8/22/22 00:06, Michal Luczaj wrote:
> > Note that doing this the ASM_TRY() way would require extending
> > setup_idt() (to handle #DB) and introducing another ASM_TRY() variant
> > (one without the initial `movl $0, %%gs:4`).
> 
> Replying to self as I was wrong regarding the need for another ASM_TRY() variant.
> Once setup_idt() is told to handle #DB,

Hmm, it might be a moot point for this patch (see below), but my vote is to have
setup_idt() wire up all known handlers to check_exception_table().  I don't see
any reason to skip some vectors.  Code with __ASM_TRY() will explode no matter what,
so it's not like it risks suppressing completely unexpected faults.

	for (i = 0; i < 32; i++) {
		if (!idt_handlers[i])
			continue;

		set_idt_entry(i, idt_handlers[i], 0);
		handle_exception(i, check_exception_table);
	}

> test can be simplified to something like
> 
> 	asm volatile("push %[ss]\n\t"
> 		     __ASM_TRY(KVM_FEP "pop %%ss", "1f")
> 		     "ex_blocked: mov $1, %[success]\n\t"

So I'm 99% certain this only passes because KVM doesn't emulate the `mov $1, %[success]`.
kvm_vcpu_check_code_breakpoint() honors EFLAGS.RF, but not MOV/POP_SS blocking.
I.e. I would expect this to fail if the `mov` were also tagged KVM_FEP.

Single-step and data #DBs appear to be broken as well, I don't see anything that
will suppress them for MOV/POP_SS.  Of course, KVM doesn't emulate data #DBs at
all, so testing that case is probably not worthwhile at this time.

The reason I say the setup_idt() change is a moot point is because I think it
would be better to put this testcase into debug.c (where "this" testcase is really
going to be multiple testcases).  With macro shenanigans (or code patching), it
should be fairly easy to run all testcases with both FEP=0 and FEP=1.

Then it's "just" a matter of adding a code #DB testcase.  __run_single_step_db_test()
and run_ss_db_test() aren't fundamentally tied to single-step, i.e. can simply be
renamed.  For simplicity, I'd say just skip the usermode test for code #DBs, IMO
it's not worth the extra coverage.
Nadav Amit Aug. 22, 2022, 6:30 p.m. UTC | #3
On Aug 22, 2022, at 8:42 AM, Sean Christopherson <seanjc@google.com> wrote:

> On Mon, Aug 22, 2022, Michal Luczaj wrote:
>> On 8/22/22 00:06, Michal Luczaj wrote:
>>> Note that doing this the ASM_TRY() way would require extending
>>> setup_idt() (to handle #DB) and introducing another ASM_TRY() variant
>>> (one without the initial `movl $0, %%gs:4`).
>> 
>> Replying to self as I was wrong regarding the need for another ASM_TRY() variant.
>> Once setup_idt() is told to handle #DB,
> 
> Hmm, it might be a moot point for this patch (see below), but my vote is to have
> setup_idt() wire up all known handlers to check_exception_table().  I don't see
> any reason to skip some vectors.  Code with __ASM_TRY() will explode no matter what,
> so it's not like it risks suppressing completely unexpected faults.

I would just like to point out that this whole FEP approach is not friendly
for other hypervisors or bare-metal testings. While people might have mixed
feelings about other hypervisors, the benefit of running the tests on
bare-metal should be clear.

Specifically in this test, it is rather simple to avoid using FEP.

Oh, well.
Sean Christopherson Aug. 22, 2022, 6:37 p.m. UTC | #4
On Mon, Aug 22, 2022, Nadav Amit wrote:
> On Aug 22, 2022, at 8:42 AM, Sean Christopherson <seanjc@google.com> wrote:
> 
> > On Mon, Aug 22, 2022, Michal Luczaj wrote:
> >> On 8/22/22 00:06, Michal Luczaj wrote:
> >>> Note that doing this the ASM_TRY() way would require extending
> >>> setup_idt() (to handle #DB) and introducing another ASM_TRY() variant
> >>> (one without the initial `movl $0, %%gs:4`).
> >> 
> >> Replying to self as I was wrong regarding the need for another ASM_TRY() variant.
> >> Once setup_idt() is told to handle #DB,
> > 
> > Hmm, it might be a moot point for this patch (see below), but my vote is to have
> > setup_idt() wire up all known handlers to check_exception_table().  I don't see
> > any reason to skip some vectors.  Code with __ASM_TRY() will explode no matter what,
> > so it's not like it risks suppressing completely unexpected faults.
> 
> I would just like to point out that this whole FEP approach is not friendly
> for other hypervisors or bare-metal testings. While people might have mixed
> feelings about other hypervisors, the benefit of running the tests on
> bare-metal should be clear.

If people want to run KUT on other hypervisors that support a different FEP (KVM
stole it from Xen after all), then I have no objection to genericizing KVM_FEP
and making it a config option.

> Specifically in this test, it is rather simple to avoid using FEP.

I'm saying KUT should do both, i.e. run the same sequences with and without
forced emulation on the "critical" instructions.  That provides full coverage for
bare metal, and also provides coverage for both "fast" and "slow" emulation in
KVM (and other future hypervisors if other prefixes are ever supported).
Michal Luczaj Aug. 23, 2022, 12:16 a.m. UTC | #5
On 8/22/22 17:42, Sean Christopherson wrote:
>> On 8/22/22 00:06, Michal Luczaj wrote:
>> test can be simplified to something like
>>
>> 	asm volatile("push %[ss]\n\t"
>> 		     __ASM_TRY(KVM_FEP "pop %%ss", "1f")
>> 		     "ex_blocked: mov $1, %[success]\n\t"
> 
> So I'm 99% certain this only passes because KVM doesn't emulate the `mov $1, %[success]`.
> kvm_vcpu_check_code_breakpoint() honors EFLAGS.RF, but not MOV/POP_SS blocking.
> I.e. I would expect this to fail if the `mov` were also tagged KVM_FEP.

I wasn't sure if I understood you correctly, so, FWIW, I've ran:

static void test_pop_ss_blocking_try(void)
{
	int success;

	write_dr7(DR7_FIXED_1 |
		  DR7_GLOBAL_ENABLE_DRx(0) |
		  DR7_EXECUTE_DRx(0) |
		  DR7_LEN_1_DRx(0));

#define POPSS(desc, fep1, fep2)					\
	asm volatile("mov $0, %[success]\n\t"			\
		     "lea 1f, %%eax\n\r"			\
		     "mov %%eax, %%dr0\n\r"			\
		     "push %%ss\n\t"				\
		     __ASM_TRY(fep1 "pop %%ss", "2f")		\
		     "1:" fep2 "mov $1, %[success]\n\t"		\
		     "2:"					\
		     : [success] "=g" (success)			\
		     :						\
		     : "eax", "memory");			\
	report(success, desc ": #DB suppressed after POP SS")

	POPSS("no fep", "", "");
	POPSS("fep pop-ss", KVM_FEP, "");
	POPSS("fep mov-1", "", KVM_FEP);
	POPSS("fep pop-ss/fep mov-1", KVM_FEP, KVM_FEP);

	write_dr7(DR7_FIXED_1);
}

and got:

em_pop_sreg unpatched:
PASS: no fep: #DB suppressed after POP SS
FAIL: fep pop-ss: #DB suppressed after POP SS
PASS: fep mov-1: #DB suppressed after POP SS
FAIL: fep pop-ss/fep mov-1: #DB suppressed after POP SS

em_pop_sreg patched:
PASS: no fep: #DB suppressed after POP SS
PASS: fep pop-ss: #DB suppressed after POP SS
PASS: fep mov-1: #DB suppressed after POP SS
PASS: fep pop-ss/fep mov-1: #DB suppressed after POP SS

For the sake of completeness: basically the same, but MOV SS, i.e.

	asm volatile("mov $0, %[success]\n\t"			\
		     "lea 1f, %%eax\n\r"			\
		     "mov %%eax, %%dr0\n\r"			\
		     "mov %%ss, %%eax\n\t"			\
		     __ASM_TRY(fep1 "mov %%eax, %%ss", "2f")	\
		     "1:" fep2 "mov $1, %[success]\n\t"		\
		     "2:"					\
		     : [success] "=g" (success)			\
		     :						\
		     : "eax", "memory");			\

PASS: no fep: #DB suppressed after MOV SS
PASS: fep mov-ss: #DB suppressed after MOV SS
PASS: fep mov-1: #DB suppressed after MOV SS
PASS: fep mov-ss/fep mov-1: #DB suppressed after MOV SS

> The reason I say the setup_idt() change is a moot point is because I think it
> would be better to put this testcase into debug.c (where "this" testcase is really
> going to be multiple testcases).  With macro shenanigans (or code patching), it
> should be fairly easy to run all testcases with both FEP=0 and FEP=1.
>
> Then it's "just" a matter of adding a code #DB testcase.  __run_single_step_db_test()
> and run_ss_db_test() aren't fundamentally tied to single-step, i.e. can simply be
> renamed.  For simplicity, I'd say just skip the usermode test for code #DBs, IMO
> it's not worth the extra coverage.

So that would involve a 32-bit segment for POP SS and making use of DR0/DR7 instead
of single stepping? I guess I can try.

Thanks,
Michal
Sean Christopherson Aug. 24, 2022, 6:32 p.m. UTC | #6
On Tue, Aug 23, 2022, Michal Luczaj wrote:
> On 8/22/22 17:42, Sean Christopherson wrote:
> >> On 8/22/22 00:06, Michal Luczaj wrote:
> >> test can be simplified to something like
> >>
> >> 	asm volatile("push %[ss]\n\t"
> >> 		     __ASM_TRY(KVM_FEP "pop %%ss", "1f")
> >> 		     "ex_blocked: mov $1, %[success]\n\t"
> > 
> > So I'm 99% certain this only passes because KVM doesn't emulate the `mov $1, %[success]`.
> > kvm_vcpu_check_code_breakpoint() honors EFLAGS.RF, but not MOV/POP_SS blocking.
> > I.e. I would expect this to fail if the `mov` were also tagged KVM_FEP.
> 
> I wasn't sure if I understood you correctly

Yep, you got the gist.

> so, FWIW, I've ran:
> 
> static void test_pop_ss_blocking_try(void)
> {
> 	int success;
> 
> 	write_dr7(DR7_FIXED_1 |
> 		  DR7_GLOBAL_ENABLE_DRx(0) |
> 		  DR7_EXECUTE_DRx(0) |
> 		  DR7_LEN_1_DRx(0));
> 
> #define POPSS(desc, fep1, fep2)					\
> 	asm volatile("mov $0, %[success]\n\t"			\
> 		     "lea 1f, %%eax\n\r"			\

Note, hardcoding EAX doesn't compile for 64-bit KUT.  It's kinda gross, but we
can fudge around this without #ifdeffery by using a dummy input constraint.

> 		     "mov %%eax, %%dr0\n\r"			\
> 		     "push %%ss\n\t"				\
> 		     __ASM_TRY(fep1 "pop %%ss", "2f")		\

No need for __ASM_TRY if this is thrown into debug.c.

> 		     "1:" fep2 "mov $1, %[success]\n\t"		\
> 		     "2:"					\
> 		     : [success] "=g" (success)			\
> 		     :						\
> 		     : "eax", "memory");			\
> 	report(success, desc ": #DB suppressed after POP SS")

To avoid potential "leakage", wrap this whole thing in ({ ... }), and declare
"success" inside that scope.

Aha!  Even better.  s/success/r, make it unsigned long, and force it to be a
register.  Then the blob can use the register as scratch before clearing it via
XOR to indicate success (though that's somewhat pointless, see below).

> 	POPSS("no fep", "", "");
> 	POPSS("fep pop-ss", KVM_FEP, "");
> 	POPSS("fep mov-1", "", KVM_FEP);
> 	POPSS("fep pop-ss/fep mov-1", KVM_FEP, KVM_FEP);
> 
> 	write_dr7(DR7_FIXED_1);
> }
> 
> and got:
> 
> em_pop_sreg unpatched:
> PASS: no fep: #DB suppressed after POP SS
> FAIL: fep pop-ss: #DB suppressed after POP SS
> PASS: fep mov-1: #DB suppressed after POP SS
> FAIL: fep pop-ss/fep mov-1: #DB suppressed after POP SS
> 
> em_pop_sreg patched:
> PASS: no fep: #DB suppressed after POP SS
> PASS: fep pop-ss: #DB suppressed after POP SS
> PASS: fep mov-1: #DB suppressed after POP SS
> PASS: fep pop-ss/fep mov-1: #DB suppressed after POP SS
> 
> For the sake of completeness: basically the same, but MOV SS, i.e.
> 
> 	asm volatile("mov $0, %[success]\n\t"			\
> 		     "lea 1f, %%eax\n\r"			\
> 		     "mov %%eax, %%dr0\n\r"			\
> 		     "mov %%ss, %%eax\n\t"			\
> 		     __ASM_TRY(fep1 "mov %%eax, %%ss", "2f")	\
> 		     "1:" fep2 "mov $1, %[success]\n\t"		\
> 		     "2:"					\
> 		     : [success] "=g" (success)			\
> 		     :						\
> 		     : "eax", "memory");			\
> 
> PASS: no fep: #DB suppressed after MOV SS
> PASS: fep mov-ss: #DB suppressed after MOV SS
> PASS: fep mov-1: #DB suppressed after MOV SS
> PASS: fep mov-ss/fep mov-1: #DB suppressed after MOV SS

This passes for 3 reasons:

  1. The test more than likely doesn't skip the instruction on #DB, and so
     success will be set regardless of whether or not a #DB occurred.

  2. DR0 needs to be loaded with the address of the MOV, not the address of the
     FEP prefix.  Somewhat amusingly, this means my patch to fix redundant #DB
     checking is wrong[*]

  3. My personal favorite.  On VM-Exits due to exceptions (#UD), Intel CPUs set
     EFLAGS.RF=1 and thus effectively suppress #DBs on FEP instructions.

       If the VM exit is caused directly by an event that would normally be delivered
       through the IDT, the value saved is that which would appear in the saved RFLAGS
       image (either that which would be saved on the stack had the event been delivered
       through a trap or interrupt gate1 or into the old task-state segment had the event
       been delivered through a task gate) had the event been delivered through the IDT.
       See below for VM exits due to task switches caused by task gates in the IDT.

     I'm pretty sure this inarguably a KVM bug.  The SDM states the EFLAGS.RF=0
     on instruction-based VM-Exits, and since the intent in the FEP case (but not
     the general #UD case) is to simulate an instruction exit...

       If the VM exit is caused by an attempt to execute an instruction that
       unconditionally causes VM exits or one that was configured to do with a
       VM-execution control, the value saved is 0.

I'll fold a fix for #3 and for the MOV/POP-SS blocking code #DB interaction into
the aforementioned series[*].

As expected, I get two failures (FEP on the XOR testcases) with this tweak to KVM,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb5cecb19cf5..e5fd0b936a48 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7261,6 +7261,7 @@ int handle_ud(struct kvm_vcpu *vcpu)
            kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu),
                                sig, sizeof(sig), &e) == 0 &&
            memcmp(sig, kvm_emulate_prefix, sizeof(sig)) == 0) {
+               kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) & ~X86_EFLAGS_RF);
                kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
                emul_type = EMULTYPE_TRAP_UD_FORCED;
        }

and this as a testcase.

diff --git a/x86/debug.c b/x86/debug.c
index b66bf047..ab29e94e 100644
--- a/x86/debug.c
+++ b/x86/debug.c
@@ -352,8 +352,44 @@ static unsigned long singlestep_with_movss_blocking_and_dr7_gd(void)
        return start_rip;
 }
 
+static void test_mov_ss_code_db(bool fep_available)
+{
+       write_dr7(DR7_FIXED_1 |
+                 DR7_GLOBAL_ENABLE_DRx(0) |
+                 DR7_EXECUTE_DRx(0) |
+                 DR7_LEN_1_DRx(0));
+
+#define MOVSS_DB(desc, fep1, fep2)                             \
+({                                                             \
+       unsigned long r;                                        \
+                                                               \
+       n = 0;                                                  \
+       asm volatile("lea 1f(%%rip), %0\n\t"                    \
+                    "mov %0, %%dr0\n\t"                        \
+                    "mov %%ss, %0\n\t"                         \
+                    fep1 "mov  %0, %%ss\n\t"                   \
+                    fep2 "1: xor %0, %0\n\t"                   \
+                    "2:"                                       \
+                    : "=r" (r)                                 \
+                    :                                          \
+                    : "memory");                               \
+       report(!n && !r, desc ": #DB suppressed after MOV SS"); \
+})
+
+       MOVSS_DB("no fep", "", "");
+       if (fep_available) {
+               MOVSS_DB("fep MOV-SS", KVM_FEP, "");
+               MOVSS_DB("fep XOR", "", KVM_FEP);
+               MOVSS_DB("fep MOV-SS/fep XOR", KVM_FEP, KVM_FEP);
+       }
+
+       write_dr7(DR7_FIXED_1);
+}
+
 int main(int ac, char **av)
 {
+       /* Check for KVM's FEP support prior to usurpsing the #UD handler. */
+       bool fep_available = is_fep_available();
        unsigned long cr4;
 
        handle_exception(DB_VECTOR, handle_db);
@@ -417,6 +453,8 @@ int main(int ac, char **av)
        run_ss_db_test(singlestep_with_movss_blocking_and_icebp);
        run_ss_db_test(singlestep_with_movss_blocking_and_dr7_gd);
 
+       test_mov_ss_code_db(fep_available);
+
        n = 0;
        write_dr1((void *)&value);
        write_dr6(DR6_BS);


[*] https://lore.kernel.org/all/20220723005137.1649592-4-seanjc@google.com

> > The reason I say the setup_idt() change is a moot point is because I think it
> > would be better to put this testcase into debug.c (where "this" testcase is really
> > going to be multiple testcases).  With macro shenanigans (or code patching), it
> > should be fairly easy to run all testcases with both FEP=0 and FEP=1.
> >
> > Then it's "just" a matter of adding a code #DB testcase.  __run_single_step_db_test()
> > and run_ss_db_test() aren't fundamentally tied to single-step, i.e. can simply be
> > renamed.  For simplicity, I'd say just skip the usermode test for code #DBs, IMO
> > it's not worth the extra coverage.
> 
> So that would involve a 32-bit segment for POP SS and making use of DR0/DR7 instead
> of single stepping? I guess I can try.

Eh, let's completely skip usermode for code #DBs and not tweak __run_single_step_db_test().
It's easier to just have a standalone function.
Michal Luczaj Aug. 24, 2022, 9:49 p.m. UTC | #7
On 8/24/22 20:32, Sean Christopherson wrote:
> Eh, let's completely skip usermode for code #DBs and not tweak __run_single_step_db_test().
> It's easier to just have a standalone function.

Something like this?

static void test_pop_ss_code_db(bool fep_available)
{
	write_ss(KERNEL_DS);

	write_dr7(DR7_FIXED_1 |
		  DR7_ENABLE_DRx(0) |
		  DR7_EXECUTE_DRx(0) |
		  DR7_LEN_1_DRx(0));

#define POPSS_DB(desc, fep1, fep2)				\
({								\
	unsigned int r;						\
								\
	n = 0;							\
	asm volatile(/* jump to 32-bit code segment */		\
		     "ljmp *1f\n\t"				\
		     "1:\n\t"					\
		     "	.long 2f\n\t"				\
		     "	.word " xstr(KERNEL_CS32) "\n\t"	\
		     /* exercise POP SS blocking */		\
		     ".code32\n\t"				\
		     "2: lea 3f, %0\n\t"			\
		     "mov %0, %%dr0\n\t"			\
		     "push %%ss\n\t"				\
		     fep1 "pop %%ss\n\t"			\
		     fep2 "3: xor %0, %0\n\t"			\
		     /* back to long mode */			\
		     "ljmp %[cs64], $4f\n\t"			\
		     ".code64\n\t"				\
		     "4:"					\
		     : "=R" (r)					\
		     : [cs64] "i" (KERNEL_CS64)			\
		     : "memory");				\
								\
	report(!n && !r, desc ": #DB suppressed after POP SS"); \
})

	POPSS_DB("no fep", "", "");
	if (fep_available) {
		POPSS_DB("fep POP-SS", KVM_FEP, "");
		POPSS_DB("fep XOR", "", KVM_FEP);
		POPSS_DB("fep POP-SS/fep XOR", KVM_FEP, KVM_FEP);
	}

	write_dr7(DR7_FIXED_1);
}

Results:

em_pop_sreg unpatched
PASS: no fep: #DB suppressed after POP SS
FAIL: fep POP-SS: #DB suppressed after POP SS
PASS: fep XOR: #DB suppressed after POP SS
PASS: fep POP-SS/fep XOR: #DB suppressed after POP SS

em_pop_sreg patched
PASS: no fep: #DB suppressed after POP SS
PASS: fep POP-SS: #DB suppressed after POP SS
PASS: fep XOR: #DB suppressed after POP SS
PASS: fep POP-SS/fep XOR: #DB suppressed after POP SS

thanks,
Michal
Sean Christopherson Aug. 25, 2022, 5:03 p.m. UTC | #8
On Wed, Aug 24, 2022, Michal Luczaj wrote:
> On 8/24/22 20:32, Sean Christopherson wrote:
> > Eh, let's completely skip usermode for code #DBs and not tweak __run_single_step_db_test().
> > It's easier to just have a standalone function.
> 
> Something like this?
> 
> static void test_pop_ss_code_db(bool fep_available)
> {
> 	write_ss(KERNEL_DS);
> 
> 	write_dr7(DR7_FIXED_1 |
> 		  DR7_ENABLE_DRx(0) |
> 		  DR7_EXECUTE_DRx(0) |
> 		  DR7_LEN_1_DRx(0));
> 
> #define POPSS_DB(desc, fep1, fep2)				\
> ({								\
> 	unsigned int r;						\
> 								\
> 	n = 0;							\
> 	asm volatile(/* jump to 32-bit code segment */		\
> 		     "ljmp *1f\n\t"				\
> 		     "1:\n\t"					\
> 		     "	.long 2f\n\t"				\
> 		     "	.word " xstr(KERNEL_CS32) "\n\t"	\
> 		     /* exercise POP SS blocking */		\
> 		     ".code32\n\t"				\
> 		     "2: lea 3f, %0\n\t"			\
> 		     "mov %0, %%dr0\n\t"			\
> 		     "push %%ss\n\t"				\
> 		     fep1 "pop %%ss\n\t"			\
> 		     fep2 "3: xor %0, %0\n\t"			\
> 		     /* back to long mode */			\
> 		     "ljmp %[cs64], $4f\n\t"			\
> 		     ".code64\n\t"				\

Ooh, I see what you meant by temporarily switching to 32-bit mode.  I was thinking
we could just make the POP SS testcase 32-bit only, but I didn't realize this test
is 64-bit only.  Argh, and so is emulate.c.  And now I get why you added a brand
new test.

Let's just add a new test.  The above can work, but it relies on the code and
stack being mapped with a 32-bit address, e.g. will break if KUT is ever changed
to not map everything low in the virtual address space.

I think it makes sense to rename emulator.c => emulator64.c, and then start a new
emulator.c for tests that apply to both 32-bit and 64-bit KUT.

I'll send a small series, the behavior is also different for AMD CPUs (I coded up
99% of this yesterday before realizing this morning that debug.c is 64-bit only).
Michal Luczaj Aug. 25, 2022, 5:32 p.m. UTC | #9
On 8/25/22 19:03, Sean Christopherson wrote:
> On Wed, Aug 24, 2022, Michal Luczaj wrote:
>> 								\
>> 	n = 0;							\
>> 	asm volatile(/* jump to 32-bit code segment */		\
>> 		     "ljmp *1f\n\t"				\
>> 		     "1:\n\t"					\
>> 		     "	.long 2f\n\t"				\
>> 		     "	.word " xstr(KERNEL_CS32) "\n\t"	\
>> 		     /* exercise POP SS blocking */		\
>> 		     ".code32\n\t"				\
>> 		     "2: lea 3f, %0\n\t"			\
>> 		     "mov %0, %%dr0\n\t"			\
>> 		     "push %%ss\n\t"				\
>> 		     fep1 "pop %%ss\n\t"			\
>> 		     fep2 "3: xor %0, %0\n\t"			\
>> 		     /* back to long mode */			\
>> 		     "ljmp %[cs64], $4f\n\t"			\
>> 		     ".code64\n\t"				\
> 
> Ooh, I see what you meant by temporarily switching to 32-bit mode.  I was thinking
> we could just make the POP SS testcase 32-bit only, but I didn't realize this test
> is 64-bit only.  Argh, and so is emulate.c.  And now I get why you added a brand
> new test.
>
> Let's just add a new test.  The above can work, but it relies on the code and
> stack being mapped with a 32-bit address, e.g. will break if KUT is ever changed
> to not map everything low in the virtual address space.

Yeah, it is fragile in that sense. But does it mean code such as
vmx_tests.c:into_guest_main() or svm_tests.c:svm_of_test_guest() should be moved
to 32-bit binaries?

Michal
Sean Christopherson Aug. 25, 2022, 5:56 p.m. UTC | #10
On Thu, Aug 25, 2022, Michal Luczaj wrote:
> On 8/25/22 19:03, Sean Christopherson wrote:
> > On Wed, Aug 24, 2022, Michal Luczaj wrote:
> >> 								\
> >> 	n = 0;							\
> >> 	asm volatile(/* jump to 32-bit code segment */		\
> >> 		     "ljmp *1f\n\t"				\
> >> 		     "1:\n\t"					\
> >> 		     "	.long 2f\n\t"				\
> >> 		     "	.word " xstr(KERNEL_CS32) "\n\t"	\
> >> 		     /* exercise POP SS blocking */		\
> >> 		     ".code32\n\t"				\
> >> 		     "2: lea 3f, %0\n\t"			\
> >> 		     "mov %0, %%dr0\n\t"			\
> >> 		     "push %%ss\n\t"				\
> >> 		     fep1 "pop %%ss\n\t"			\
> >> 		     fep2 "3: xor %0, %0\n\t"			\
> >> 		     /* back to long mode */			\
> >> 		     "ljmp %[cs64], $4f\n\t"			\
> >> 		     ".code64\n\t"				\
> > 
> > Ooh, I see what you meant by temporarily switching to 32-bit mode.  I was thinking
> > we could just make the POP SS testcase 32-bit only, but I didn't realize this test
> > is 64-bit only.  Argh, and so is emulate.c.  And now I get why you added a brand
> > new test.
> >
> > Let's just add a new test.  The above can work, but it relies on the code and
> > stack being mapped with a 32-bit address, e.g. will break if KUT is ever changed
> > to not map everything low in the virtual address space.
> 
> Yeah, it is fragile in that sense. But does it mean code such as
> vmx_tests.c:into_guest_main() or svm_tests.c:svm_of_test_guest() should be moved
> to 32-bit binaries?

Ideally?  Yeah.  In practice, it's just not worth replicating all the nSVM/nVMX
setup for 32-bit binaries.
diff mbox series

Patch

diff --git a/x86/Makefile.i386 b/x86/Makefile.i386
index 0a845e6..b570ae2 100644
--- a/x86/Makefile.i386
+++ b/x86/Makefile.i386
@@ -9,6 +9,7 @@  arch_LDFLAGS = -m elf_i386
 cflatobjs += lib/x86/setjmp32.o lib/ldiv32.o
 
 tests = $(TEST_DIR)/taskswitch.$(exe) $(TEST_DIR)/taskswitch2.$(exe) \
-	$(TEST_DIR)/cmpxchg8b.$(exe) $(TEST_DIR)/la57.$(exe)
+	$(TEST_DIR)/cmpxchg8b.$(exe) $(TEST_DIR)/la57.$(exe) \
+	$(TEST_DIR)/popss.$(exe)
 
 include $(SRCDIR)/$(TEST_DIR)/Makefile.common
diff --git a/x86/popss.c b/x86/popss.c
new file mode 100644
index 0000000..7201f1e
--- /dev/null
+++ b/x86/popss.c
@@ -0,0 +1,59 @@ 
+#include <asm/debugreg.h>
+
+#include "processor.h"
+#include "libcflat.h"
+#include "vmalloc.h"
+#include "desc.h"
+
+static void test_pop_ss_blocking_handler(struct ex_regs *regs)
+{
+	extern char test_pop_ss_blocking_cont;
+
+	regs->rip = (ulong)&test_pop_ss_blocking_cont;
+}
+
+static void test_pop_ss_blocking(void)
+{
+	extern char db_blocked;
+	int success = 0;
+	handler old;
+
+	old = handle_exception(DB_VECTOR, test_pop_ss_blocking_handler);
+
+	write_dr0(&db_blocked);
+	write_dr7(DR7_FIXED_1 |
+		  DR7_GLOBAL_ENABLE_DRx(0) |
+		  DR7_EXECUTE_DRx(0) |
+		  DR7_LEN_1_DRx(0));
+
+	/*
+	 * The idea is that #DB on the instruction following POP SS should be
+	 * suppressed. If the exception is actually suppressed, `success` gets
+	 * set to 1, otherwise exception handler advances RIP away.
+	 */
+	asm volatile("push %[ss]\n\t"
+		     KVM_FEP "pop %%ss\n\t"
+		     "db_blocked: mov $1, %[success]\n\t"
+		     "test_pop_ss_blocking_cont:"
+		     : [success] "+g" (success)
+		     : [ss] "r" (read_ss())
+		     : "memory");
+
+	write_dr7(DR7_FIXED_1);
+	handle_exception(DB_VECTOR, old);
+
+	report(success, "#DB suppressed after POP SS");
+}
+
+int main(void)
+{
+	setup_vm();
+
+	if (is_fep_available())
+		test_pop_ss_blocking();
+	else
+		report_skip("skipping POP-SS blocking test, "
+			    "use kvm.force_emulation_prefix=1 to enable");
+
+	return report_summary();
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index ed65185..8d4e917 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -305,6 +305,10 @@  extra_params = -cpu qemu64,+umip
 file = la57.flat
 arch = i386
 
+[popss]
+file = popss.flat
+arch = i386
+
 [vmx]
 file = vmx.flat
 extra_params = -cpu max,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test -vmx_init_signal_test -vmx_apic_passthrough_tpr_threshold_test -apic_reg_virt_test -virt_x2apic_mode_test -vmx_pf_exception_test -vmx_pf_no_vpid_test -vmx_pf_invvpid_test -vmx_pf_vpid_test"