Message ID | 20230403105618.41118-4-minipli@grsecurity.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Tests for CR0.WP=0/1 r/o write access | expand |
On 03.04.23 12:56, Mathias Krause wrote: > Add support to enforce access tests to be handled by the emulator, if > supported by KVM. Exclude it from the ac_test_exec() test, though, to > not slow it down too much. I tend to read a lot of objdumps and when initially looking at the generated code it was kinda hard to recognize the FEP instruction, as the FEP actually decodes to UD2 followed by some IMUL instruction that lacks a byte, so when objdump does its linear disassembly, it eats a byte from the to-be-emulated instruction. Like, "FEP; int $0xc3" would decode to: 0: 0f 0b ud2 2: 6b 76 6d cd imul $0xffffffcd,0x6d(%rsi),%esi 6: c3 retq This is slightly confusing, especially when the shortened instruction is actually a valid one as above ("retq" vs "int $0xc3"). I have the below diff to "fix" that. It adds 0x3e to the FEP which would restore objdump's ability to generate a proper disassembly that won't destroy the to-be-emulated instruction. As 0x3e decodes to the DS prefix byte, which the emulator assumes by default anyways, this should mostly be a no-op. However, it helped me to get a proper code dump. If there's interest, I can send a proper patch. If not, this might help others to understand garbled objdumps involving the FEP ;) --- a/lib/x86/desc.h +++ b/lib/x86/desc.h @@ -104,6 +104,7 @@ typedef struct __attribute__((packed)) { /* Forced emulation prefix, used to invoke the emulator unconditionally. */ #define KVM_FEP "ud2; .byte 'k', 'v', 'm';" +#define KVM_FEP_PRETTY KVM_FEP ".byte 0x3e;" #define ASM_TRY_FEP(catch) __ASM_TRY(KVM_FEP, catch) static inline bool is_fep_available(void) diff --git a/x86/access.c b/x86/access.c index eab3959bc871..ab1913313fbb 100644 --- a/x86/access.c +++ b/x86/access.c @@ -811,7 +811,7 @@ static int ac_test_do_access(ac_test_t *at) asm volatile ("mov $fixed2, %%rsi \n\t" "cmp $0, %[fep] \n\t" "jz 1f \n\t" - KVM_FEP + KVM_FEP_PRETTY "1: mov (%[addr]), %[reg] \n\t" "fixed2:" : [reg]"=r"(r), [fault]"=a"(fault), "=b"(e) @@ -838,12 +838,12 @@ static int ac_test_do_access(ac_test_t *at) "jnz 1f \n\t" "cmp $0, %[fep] \n\t" "jz 0f \n\t" - KVM_FEP + KVM_FEP_PRETTY "0: mov (%[addr]), %[reg] \n\t" "jmp done \n\t" "1: cmp $0, %[fep] \n\t" "jz 0f \n\t" - KVM_FEP + KVM_FEP_PRETTY "0: mov %[reg], (%[addr]) \n\t" "jmp done \n\t" "2: call *%[addr] \n\t" Thanks, Mathias
On Mon, Apr 03, 2023, Mathias Krause wrote: > On 03.04.23 12:56, Mathias Krause wrote: > > Add support to enforce access tests to be handled by the emulator, if > > supported by KVM. Exclude it from the ac_test_exec() test, though, to > > not slow it down too much. > > I tend to read a lot of objdumps and when initially looking at the > generated code it was kinda hard to recognize the FEP instruction, as > the FEP actually decodes to UD2 followed by some IMUL instruction that > lacks a byte, so when objdump does its linear disassembly, it eats a > byte from the to-be-emulated instruction. Like, "FEP; int $0xc3" would > decode to: > 0: 0f 0b ud2 > 2: 6b 76 6d cd imul $0xffffffcd,0x6d(%rsi),%esi > 6: c3 retq > This is slightly confusing, especially when the shortened instruction is > actually a valid one as above ("retq" vs "int $0xc3"). > > I have the below diff to "fix" that. It adds 0x3e to the FEP which would > restore objdump's ability to generate a proper disassembly that won't > destroy the to-be-emulated instruction. As 0x3e decodes to the DS prefix > byte, which the emulator assumes by default anyways, this should mostly > be a no-op. However, it helped me to get a proper code dump.a I agree that the objdump output is annoying, but I don't love the idea of cramming in a prefix that's _mostly_ a nop. Given that FEP utilizes extremely specialized, off-by-default KVM code, what about reworking FEP in KVM itself to play nice with objdump (and other disasm tools)? E.g. "officially" change the magic prefix to include a trailing 0x3e. Though IMO, even better would be a magic value that decodes to a multi-byte nop, e.g. 0F 1F 44 00 00. The only "requirement" is that the magic value doesn't get false positives, and I can't imagine any of our test environments generate a ud2 followed by a multi-byte nop. Keeping KVM-Unit-Tests and KVM synchronized on the correct FEP value would be a pain, but disconnects between KVM and KUT are nothing new.
On 03.04.23 21:06, Sean Christopherson wrote: > On Mon, Apr 03, 2023, Mathias Krause wrote: >> On 03.04.23 12:56, Mathias Krause wrote: >>> Add support to enforce access tests to be handled by the emulator, if >>> supported by KVM. Exclude it from the ac_test_exec() test, though, to >>> not slow it down too much. >> >> I tend to read a lot of objdumps and when initially looking at the >> generated code it was kinda hard to recognize the FEP instruction, as >> the FEP actually decodes to UD2 followed by some IMUL instruction that >> lacks a byte, so when objdump does its linear disassembly, it eats a >> byte from the to-be-emulated instruction. Like, "FEP; int $0xc3" would >> decode to: >> 0: 0f 0b ud2 >> 2: 6b 76 6d cd imul $0xffffffcd,0x6d(%rsi),%esi >> 6: c3 retq >> This is slightly confusing, especially when the shortened instruction is >> actually a valid one as above ("retq" vs "int $0xc3"). >> >> I have the below diff to "fix" that. It adds 0x3e to the FEP which would >> restore objdump's ability to generate a proper disassembly that won't >> destroy the to-be-emulated instruction. As 0x3e decodes to the DS prefix >> byte, which the emulator assumes by default anyways, this should mostly >> be a no-op. However, it helped me to get a proper code dump.a > > I agree that the objdump output is annoying, but I don't love the idea of cramming > in a prefix that's _mostly_ a nop. > > Given that FEP utilizes extremely specialized, off-by-default KVM code, what about > reworking FEP in KVM itself to play nice with objdump (and other disasm tools)? > E.g. "officially" change the magic prefix to include a trailing 0x3e. Though IMO, > even better would be a magic value that decodes to a multi-byte nop, e.g. > 0F 1F 44 00 00. The only "requirement" is that the magic value doesn't get > false positives, and I can't imagine any of our test environments generate a ud2 > followed by a multi-byte nop. Well, the above is a bad choice, actually, as that mirrors what GNU as might generate when asked to generate a 5-byte NOP, e.g. for filling the inter-function gap. And if there is a UD2 as last instruction and we're unlucky to hit it, e.g. because we're hitting some UBSAN instrumented error handling, we'll instead trigger "forced emulation" in KVM and fall-through to the next function. Have fun debugging that! ;P $ echo 'foo: ret; ud2; .balign 8; baz: int3' | as - && objdump -d [...] 0000000000000000 <foo>: 0: c3 retq 1: 0f 0b ud2 3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 0000000000000008 <baz>: 8: cc int3 > > Keeping KVM-Unit-Tests and KVM synchronized on the correct FEP value would be a > pain, but disconnects between KVM and KUT are nothing new. Nah, that would be an ABI break, IMO a no-go. What I was suggesting was pretty much the patch: change selective users in KUT to make them objdump-friendly. The FEP as-is is ABI, IMO, and cannot be changed any more. We could add a FEP2 that's more objdump-friedly, but there's really no need to support yet another byte combination in KVM itself. It can all be handled by its users, e.g. in KUT as proposed with the 0x3e suffix. But, as I said, it's mostly just me trying to read disassembly and I see others don't have the need to. So this doesn't need to lead anywhere. But I thought I bring it up, in case there's others questioning why some of the KUT code dumps look so weird in objdump ;) Thanks, Mathias
On Mon, Apr 03, 2023, Mathias Krause wrote: > Add support to enforce access tests to be handled by the emulator, if > supported by KVM. Exclude it from the ac_test_exec() test, though, to > not slow it down too much. IMO, the slowdown is nowhere near bad enought to warrant exclusion. On bare metal without KASAN and other debug gunk, the total runtime with EPT enabled is <6s. With EPT disabled, it's <8s. In a VM, they times are <16s and <26s respectively. Those are perfectly reasonable, and forcing emulation actually makes the EPT case, interesting. And the KASAN/debug builds are so horrendously slow that I think we should figure out a way to special case those kernels anyways. If you don't object, I'll include FEP as a regular flag when applying. One other fun thing the usage from vmx_pf_exception_test_guest(), which runs afoul of a KVM bug. The VMX #PF test runs the access test as an L2 guest (from KVM's perspective), i.e. triggers emulation from L2. KVM's emulator is goofy and checks nested intercepts for PAUSE even on vanilla NOPs. SVM filters out non-PAUSE instructions on the backend, but VMX does not (VMX doesn't have any logic for PAUSE interception and just ends up injecting a #UD). I'll post this as a KVM patch. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 9ae4044f076f..1e560457bf9a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7898,6 +7898,21 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu, /* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */ break; + case x86_intercept_pause: + /* + * PAUSE is a single-byte NOP with a REPE prefix, i.e. collides + * with vanilla NOPs in the emulator. Apply the interception + * check only to actual PAUSE instructions. Don't check + * PAUSE-loop-exiting, software can't expect a given PAUSE to + * exit, i.e. KVM is within its rights to allow L2 to execute + * the PAUSE. + */ + if ((info->rep_prefix != REPE_PREFIX) || + !nested_cpu_has2(vmcs12, CPU_BASED_PAUSE_EXITING)) + return X86EMUL_CONTINUE; + + break; + /* TODO: check more intercepts... */ default: break;
On Mon, Apr 03, 2023, Mathias Krause wrote: > On 03.04.23 21:06, Sean Christopherson wrote: > > Keeping KVM-Unit-Tests and KVM synchronized on the correct FEP value would be a > > pain, but disconnects between KVM and KUT are nothing new. > > Nah, that would be an ABI break, IMO a no-go. What I was suggesting was > pretty much the patch: change selective users in KUT to make them > objdump-friendly. The FEP as-is is ABI, IMO I would be amazed if anything other that KVM's own tests utilizes the feature. IMO, it really shouldn't be considered ABI. Paolo?
On 04.04.23 02:04, Sean Christopherson wrote: > On Mon, Apr 03, 2023, Mathias Krause wrote: >> Add support to enforce access tests to be handled by the emulator, if >> supported by KVM. Exclude it from the ac_test_exec() test, though, to >> not slow it down too much. > > IMO, the slowdown is nowhere near bad enought to warrant exclusion. On bare metal > without KASAN and other debug gunk, the total runtime with EPT enabled is <6s. > With EPT disabled, it's <8s. In a VM, they times are <16s and <26s respectively. > Those are perfectly reasonable, and forcing emulation actually makes the EPT case, > interesting. And the KASAN/debug builds are so horrendously slow that I think we > should figure out a way to special case those kernels anyways. You must have a more beefy machine than I do. Testing bare metal on a NUC12 (i7-1260P) with kvm.ko loaded with force_emulation_prefix=1 and not excluding AC_FEP_BIT from ac_test_bump() gives me a runtime of little over 41s with EPT enabled and, funnily, only 9s with EPT disabled, as that implicitly excludes the CR4.PKE tests, reducing the number of tests to run by a factor of 10 (~38 million tests down do 3.8 million). The non-EPT run took 56s in a VM but the EPT one ran into the 90s timeout. After lifting that, it was 2m22s. :/ For comparison, the runtime of the access test on bare with this series applied as-is is 10s with EPT enabled and 5s with EPT disabled. In a VM I get 10s with EPT and 29s without. > > If you don't object, I'll include FEP as a regular flag when applying. My concerns are that the additional FEP access tests will push the runtime over the 90s limit not only for my setup but for some CI setups as well, as they are notoriously resource constraint (and I vaguely remember a discussion about already hitting timeouts for the gitlab CI pipeline?). So yes, I do object. Please keep the AC_FEP_BIT excluded from the ac_test_bump() tests. It'll cause trouble for CI setups, I'm certain. > > One other fun thing the usage from vmx_pf_exception_test_guest(), which runs afoul > of a KVM bug. The VMX #PF test runs the access test as an L2 guest (from KVM's > perspective), i.e. triggers emulation from L2. KVM's emulator is goofy and checks > nested intercepts for PAUSE even on vanilla NOPs. SVM filters out non-PAUSE instructions > on the backend, but VMX does not (VMX doesn't have any logic for PAUSE interception > and just ends up injecting a #UD). Hehe, nice :D > > I'll post this as a KVM patch. > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 9ae4044f076f..1e560457bf9a 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7898,6 +7898,21 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu, > /* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */ > break; > > + case x86_intercept_pause: > + /* > + * PAUSE is a single-byte NOP with a REPE prefix, i.e. collides > + * with vanilla NOPs in the emulator. Apply the interception > + * check only to actual PAUSE instructions. Don't check > + * PAUSE-loop-exiting, software can't expect a given PAUSE to > + * exit, i.e. KVM is within its rights to allow L2 to execute > + * the PAUSE. > + */ > + if ((info->rep_prefix != REPE_PREFIX) || > + !nested_cpu_has2(vmcs12, CPU_BASED_PAUSE_EXITING)) > + return X86EMUL_CONTINUE; > + > + break; > + > /* TODO: check more intercepts... */ > default: > break; > Thanks, Mathias
On Tue, Apr 04, 2023, Mathias Krause wrote: > On 04.04.23 02:04, Sean Christopherson wrote: > > On Mon, Apr 03, 2023, Mathias Krause wrote: > >> Add support to enforce access tests to be handled by the emulator, if > >> supported by KVM. Exclude it from the ac_test_exec() test, though, to > >> not slow it down too much. > > > > IMO, the slowdown is nowhere near bad enought to warrant exclusion. On bare metal > > without KASAN and other debug gunk, the total runtime with EPT enabled is <6s. > > With EPT disabled, it's <8s. In a VM, they times are <16s and <26s respectively. > > Those are perfectly reasonable, and forcing emulation actually makes the EPT case, > > interesting. And the KASAN/debug builds are so horrendously slow that I think we > > should figure out a way to special case those kernels anyways. > > You must have a more beefy machine than I do. I doubt it, my numbers are from a Haswell with a whopping 6 cores (and the core count should be irrelevant). > Testing bare metal on a NUC12 (i7-1260P) with kvm.ko loaded with > force_emulation_prefix=1 and not excluding AC_FEP_BIT from ac_test_bump() > gives me a runtime of little over 41s with EPT enabled and, funnily, only 9s > with EPT disabled, as that implicitly excludes the CR4.PKE tests, reducing > the number of tests to run by a factor of 10 (~38 million tests down do 3.8 > million). Ah, right, fancy new features. Running on an Icelake, i.e. with 5-level paging and PKRU support, is indeed quite painful. After much fiddling, I think the best option is to add a separate config entry to enable FEP, and have that entry be nodefault, i.e. a "manual" testcase. Ditto for the nVMX #PF variant. That will allow CI and other runners to enable the test for compatible configs, e.g. when running on bare metal, without causing problems for existing setups. Well, unless there are setups that do a generic "-g nodefault", but x86 doesn't currently have any nodefault tests so that's quite unlikely. The only downside is that the CR0.WP testcase will also become manual only. We could obviously have it ignore the opt-in flag, but there's value is containing it to the opt-in testcase, e.g. it becomes very obvious that emulation is relevant to the failure when the FEP version fails but the non-FEP version does not. And I also think we should mark the VPID-based variants nodefault, as they have 4+ minute runtimes in VMs, i.e. we should encourage use of "-g nodefault" in CI when appropriate. I'll post a v4, there are other cleanups needed in the access test, e.g. the darn thing doesn't use report_summary() and so actually getting it to report a SKIP is impossible.
On 04.04.23 18:36, Sean Christopherson wrote: > On Tue, Apr 04, 2023, Mathias Krause wrote: >> Testing bare metal on a NUC12 (i7-1260P) with kvm.ko loaded with >> force_emulation_prefix=1 and not excluding AC_FEP_BIT from ac_test_bump() >> gives me a runtime of little over 41s with EPT enabled and, funnily, only 9s >> with EPT disabled, as that implicitly excludes the CR4.PKE tests, reducing >> the number of tests to run by a factor of 10 (~38 million tests down do 3.8 >> million). > > Ah, right, fancy new features. Running on an Icelake, i.e. with 5-level paging > and PKRU support, is indeed quite painful. Jepp. > > After much fiddling, I think the best option is to add a separate config entry > to enable FEP, and have that entry be nodefault, i.e. a "manual" testcase. Ditto > for the nVMX #PF variant. That will allow CI and other runners to enable the > test for compatible configs, e.g. when running on bare metal, without causing > problems for existing setups. Well, unless there are setups that do a generic > "-g nodefault", but x86 doesn't currently have any nodefault tests so that's > quite unlikely. Yeah, that works too. I was also thinking of a commandline parameter to allow ac_test_bump() to take the FEP bit into account to allow testing forced emulation, but not by default. > > The only downside is that the CR0.WP testcase will also become manual only. We > could obviously have it ignore the opt-in flag, but there's value is containing > it to the opt-in testcase, e.g. it becomes very obvious that emulation is relevant > to the failure when the FEP version fails but the non-FEP version does not. Well, I think there's much more value in doing the CR0.WP emulation tests by default than there is to bind them to the "manual" test. They're fast, only two accesses, so runtime is no argument here. They also test an important aspects of KVM's MMU role, so we shouldn't bury these behind a "nodefault" flag. I'd rather see the command line option be a toggle for the access bits permutation test only, which are dog slow with forced emulation. > And > I also think we should mark the VPID-based variants nodefault, as they have 4+ > minute runtimes in VMs, i.e. we should encourage use of "-g nodefault" in CI when > appropriate. Ok, no objection from my side, as I have no dog in that fight ;) > > I'll post a v4, there are other cleanups needed in the access test, e.g. the darn > thing doesn't use report_summary() and so actually getting it to report a SKIP is > impossible. I think that's just because it's such an old test that predates the reporting infrastructure. But yeah, it could get some spring cleanup, like dropping the #defines for true and false ;) Thanks, Mathias
diff --git a/x86/access.c b/x86/access.c index a01278451b96..674077297978 100644 --- a/x86/access.c +++ b/x86/access.c @@ -82,6 +82,13 @@ enum { AC_CPU_CR4_SMEP_BIT, AC_CPU_CR4_PKE_BIT, + NR_AC_TEST_FLAGS, + + /* + * synthetic flags follow, won't be exercised by ac_test_exec(). + */ + AC_FEP_BIT, + NR_AC_FLAGS }; @@ -121,6 +128,8 @@ enum { #define AC_CPU_CR4_SMEP_MASK (1 << AC_CPU_CR4_SMEP_BIT) #define AC_CPU_CR4_PKE_MASK (1 << AC_CPU_CR4_PKE_BIT) +#define AC_FEP_MASK (1 << AC_FEP_BIT) + const char *ac_names[] = { [AC_PTE_PRESENT_BIT] = "pte.p", [AC_PTE_ACCESSED_BIT] = "pte.a", @@ -152,6 +161,7 @@ const char *ac_names[] = { [AC_CPU_CR0_WP_BIT] = "cr0.wp", [AC_CPU_CR4_SMEP_BIT] = "cr4.smep", [AC_CPU_CR4_PKE_BIT] = "cr4.pke", + [AC_FEP_BIT] = "fep", }; static inline void *va(pt_element_t phys) @@ -396,7 +406,7 @@ static void ac_test_init(ac_test_t *at, unsigned long virt, ac_pt_env_t *pt_env) static int ac_test_bump_one(ac_test_t *at) { at->flags = ((at->flags | invalid_mask) + 1) & ~invalid_mask; - return at->flags < (1 << NR_AC_FLAGS); + return at->flags < (1 << NR_AC_TEST_FLAGS); } #define F(x) ((flags & x##_MASK) != 0) @@ -799,10 +809,13 @@ static int ac_test_do_access(ac_test_t *at) if (F(AC_ACCESS_TWICE)) { asm volatile ("mov $fixed2, %%rsi \n\t" - "mov (%[addr]), %[reg] \n\t" + "cmp $0, %[fep] \n\t" + "jz 1f \n\t" + KVM_FEP + "1: mov (%[addr]), %[reg] \n\t" "fixed2:" : [reg]"=r"(r), [fault]"=a"(fault), "=b"(e) - : [addr]"r"(at->virt) + : [addr]"r"(at->virt), [fep]"r"(F(AC_FEP)) : "rsi"); fault = 0; } @@ -823,9 +836,15 @@ static int ac_test_do_access(ac_test_t *at) "jnz 2f \n\t" "cmp $0, %[write] \n\t" "jnz 1f \n\t" - "mov (%[addr]), %[reg] \n\t" + "cmp $0, %[fep] \n\t" + "jz 0f \n\t" + KVM_FEP + "0: mov (%[addr]), %[reg] \n\t" "jmp done \n\t" - "1: mov %[reg], (%[addr]) \n\t" + "1: cmp $0, %[fep] \n\t" + "jz 0f \n\t" + KVM_FEP + "0: mov %[reg], (%[addr]) \n\t" "jmp done \n\t" "2: call *%[addr] \n\t" "done: \n" @@ -843,6 +862,7 @@ static int ac_test_do_access(ac_test_t *at) [write]"r"(F(AC_ACCESS_WRITE)), [user]"r"(F(AC_ACCESS_USER)), [fetch]"r"(F(AC_ACCESS_FETCH)), + [fep]"r"(F(AC_FEP)), [user_ds]"i"(USER_DS), [user_cs]"i"(USER_CS), [user_stack_top]"r"(user_stack + sizeof user_stack), @@ -1233,6 +1253,11 @@ int ac_test_run(int pt_levels) invalid_mask |= AC_PTE_BIT36_MASK; } + if (!is_fep_available()) { + printf("FEP not available, skipping emulation tests\n"); + invalid_mask |= AC_FEP_MASK; + } + ac_env_int(&pt_env, pt_levels); ac_test_init(&at, 0xffff923400000000ul, &pt_env);
Add support to enforce access tests to be handled by the emulator, if supported by KVM. Exclude it from the ac_test_exec() test, though, to not slow it down too much. Signed-off-by: Mathias Krause <minipli@grsecurity.net> --- x86/access.c | 35 ++++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-)