Message ID | 20230728001606.2275586-3-mhal@rbox.co (mailing list archive)
---|---
State | New, archived
Series | sync_regs() TOCTOU issues
On Fri, Jul 28, 2023, Michal Luczaj wrote:
> Attempt to modify vcpu->run->s.regs _after_ the sanity checks performed by
> KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:sync_regs(). This could lead to some
> nonsensical vCPU states accompanied by kernel splats.
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
>  .../selftests/kvm/x86_64/sync_regs_test.c    | 124 ++++++++++++++++++
>  1 file changed, 124 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
> index 2da89fdc2471..feebc7d44c17 100644
> --- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
> @@ -15,12 +15,14 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <sys/ioctl.h>
> +#include <pthread.h>
>
>  #include "test_util.h"
>  #include "kvm_util.h"
>  #include "processor.h"
>
>  #define UCALL_PIO_PORT ((uint16_t)0x1000)
> +#define TIMEOUT 2 /* seconds, roughly */

I think it makes sense to make this a const in race_sync_regs(), that way its
usage is a bit more obvious.

>  struct ucall uc_none = {
>  	.cmd = UCALL_NONE,
> @@ -80,6 +82,124 @@ static void compare_vcpu_events(struct kvm_vcpu_events *left,
>  #define TEST_SYNC_FIELDS   (KVM_SYNC_X86_REGS|KVM_SYNC_X86_SREGS|KVM_SYNC_X86_EVENTS)
>  #define INVALID_SYNC_FIELD 0x80000000
>
> +/*
> + * WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
> + *
> + * arch/x86/kvm/x86.c:kvm_check_and_inject_events():
> + *	WARN_ON_ONCE(vcpu->arch.exception.injected &&
> + *		     vcpu->arch.exception.pending);
> + */

For comments in selftests, describe what's happening without referencing KVM code;
things like this in particular will become stale sooner than later.  It's a-ok
(and encouraged) to put the WARNs and function references in changelogs though,
as those are explicitly tied to a specific time in history.

> +static void race_sync_regs(void *racer, bool poke_mmu)
> +{
> +	struct kvm_translation tr;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_run *run;
> +	struct kvm_vm *vm;
> +	pthread_t thread;
> +	time_t t;
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	run = vcpu->run;
> +
> +	run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
> +	vcpu_run(vcpu);
> +	TEST_REQUIRE(run->s.regs.sregs.cr4 & X86_CR4_PAE);

This can be an assert, and should also check EFER.LME.  Jump-starting in long mode
is a property of selftests, i.e. not something that should ever randomly "fail".

> +	run->kvm_valid_regs = 0;
> +
> +	ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
> +
> +	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
> +		__vcpu_run(vcpu);
> +
> +		if (poke_mmu) {

Rather than pass a boolean, I think it makes sense to do

	if (racer == race_sregs_cr4)

It's arguably just trading ugliness for subtlety, but IMO it's worth avoiding
the boolean.

> +			tr = (struct kvm_translation) { .linear_address = 0 };
> +			__vcpu_ioctl(vcpu, KVM_TRANSLATE, &tr);
> +		}
> +	}
> +
> +	ASSERT_EQ(pthread_cancel(thread), 0);
> +	ASSERT_EQ(pthread_join(thread, NULL), 0);
> +
> +	/*
> +	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
> +	 *
> +	 * kvm_vm_free()
> +	 *   __vm_mem_region_delete()
> +	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
> +	 *       _vm_ioctl(vm, cmd, #cmd, arg)
> +	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
> +	 */

We want the assert, it makes failures explicit.  The signature is a bit unfortunate,
but the WARN in the kernel log should provide a big clue.

> +	if (!poke_mmu)
> +		kvm_vm_free(vm);
> +}
> +
>  int main(int argc, char *argv[])
>  {
>  	struct kvm_vcpu *vcpu;
> @@ -218,5 +338,9 @@ int main(int argc, char *argv[])
>
>  	kvm_vm_free(vm);
>
> +	race_sync_regs(race_sregs_cr4, true);
> +	race_sync_regs(race_events_exc, false);
> +	race_sync_regs(race_events_inj_pen, false);

I'll fix up all of the above when applying, and will also split this into three
patches, mostly so that each splat can be covered in a changelog, i.e. is tied
to its testcase.
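[ For reference, a rough sketch of how race_sync_regs() might look with all of
the feedback above applied: the timeout becomes a local const, the long-mode
check becomes a hard assert that also covers EFER.LME (assuming the EFER bit
definitions are available via the selftests headers), and the boolean is
replaced by comparing the racer function pointer.  This is an illustration
against the posted patch, not necessarily the exact code that was applied. ]

	static void race_sync_regs(void *racer)
	{
		const time_t TIMEOUT = 2; /* seconds, roughly */
		struct kvm_translation tr;
		struct kvm_vcpu *vcpu;
		struct kvm_run *run;
		struct kvm_vm *vm;
		pthread_t thread;
		time_t t;

		vm = vm_create_with_one_vcpu(&vcpu, guest_code);
		run = vcpu->run;

		run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
		vcpu_run(vcpu);

		/* Selftests jump-start vCPUs in long mode; anything else is a bug. */
		TEST_ASSERT(run->s.regs.sregs.cr4 & X86_CR4_PAE &&
			    run->s.regs.sregs.efer & EFER_LME,
			    "vCPU should be in long mode");
		run->kvm_valid_regs = 0;

		ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);

		for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
			__vcpu_run(vcpu);

			/* Only the CR4.PAE race needs KVM_TRANSLATE to poke the MMU. */
			if (racer == race_sregs_cr4) {
				tr = (struct kvm_translation) { .linear_address = 0 };
				__vcpu_ioctl(vcpu, KVM_TRANSLATE, &tr);
			}
		}

		ASSERT_EQ(pthread_cancel(thread), 0);
		ASSERT_EQ(pthread_join(thread, NULL), 0);

		/* Freed unconditionally; a bugged VM now fails with an explicit assert. */
		kvm_vm_free(vm);
	}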
On 8/2/23 23:07, Sean Christopherson wrote:
> On Fri, Jul 28, 2023, Michal Luczaj wrote:
>> +#define TIMEOUT 2 /* seconds, roughly */
>
> I think it makes sense to make this a const in race_sync_regs(), that way its
> usage is a bit more obvious.

Yeah, sure.

>> +/*
>> + * WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
>> + *
>> + * arch/x86/kvm/x86.c:kvm_check_and_inject_events():
>> + *	WARN_ON_ONCE(vcpu->arch.exception.injected &&
>> + *		     vcpu->arch.exception.pending);
>> + */
>
> For comments in selftests, describe what's happening without referencing KVM code;
> things like this in particular will become stale sooner than later.  It's a-ok
> (and encouraged) to put the WARNs and function references in changelogs though,
> as those are explicitly tied to a specific time in history.

Right, I'll try to remember.  Actually, those comments were notes to myself
that I then left in, thinking they couldn't hurt.  But I agree that wasn't the
best idea.

>> +static void race_sync_regs(void *racer, bool poke_mmu)
>> +{
>> +	struct kvm_translation tr;
>> +	struct kvm_vcpu *vcpu;
>> +	struct kvm_run *run;
>> +	struct kvm_vm *vm;
>> +	pthread_t thread;
>> +	time_t t;
>> +
>> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
>> +	run = vcpu->run;
>> +
>> +	run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
>> +	vcpu_run(vcpu);
>> +	TEST_REQUIRE(run->s.regs.sregs.cr4 & X86_CR4_PAE);
>
> This can be an assert, and should also check EFER.LME.  Jump-starting in long mode
> is a property of selftests, i.e. not something that should ever randomly "fail".

Right, sorry for the misuse.

>> +	run->kvm_valid_regs = 0;
>> +
>> +	ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
>> +
>> +	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
>> +		__vcpu_run(vcpu);
>> +
>> +		if (poke_mmu) {
>
> Rather than pass a boolean, I think it makes sense to do
>
> 	if (racer == race_sregs_cr4)
>
> It's arguably just trading ugliness for subtlety, but IMO it's worth avoiding
> the boolean.

Ah, ok.

>> +	/*
>> +	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
>> +	 *
>> +	 * kvm_vm_free()
>> +	 *   __vm_mem_region_delete()
>> +	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
>> +	 *       _vm_ioctl(vm, cmd, #cmd, arg)
>> +	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
>> +	 */
>
> We want the assert, it makes failures explicit.  The signature is a bit unfortunate,
> but the WARN in the kernel log should provide a big clue.

Sure, I get it.  And it's not like there is a way to check whether the VM is
bugged/dead, is there?

> I'll fix up all of the above when applying, and will also split this into three
> patches, mostly so that each splat can be covered in a changelog, i.e. is tied
> to its testcase.

Great, thank you for all the comments and fixes!

Michal
On Thu, Aug 03, 2023, Michal Luczaj wrote:
> On 8/2/23 23:07, Sean Christopherson wrote:
> > On Fri, Jul 28, 2023, Michal Luczaj wrote:
> >> +	/*
> >> +	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
> >> +	 *
> >> +	 * kvm_vm_free()
> >> +	 *   __vm_mem_region_delete()
> >> +	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
> >> +	 *       _vm_ioctl(vm, cmd, #cmd, arg)
> >> +	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
> >> +	 */
> >
> > We want the assert, it makes failures explicit.  The signature is a bit unfortunate,
> > but the WARN in the kernel log should provide a big clue.
>
> Sure, I get it.  And it's not like there is a way to check whether the VM is
> bugged/dead, is there?

KVM doesn't expose the bugged/dead information, though I suppose userspace could
probe for it by issuing an ioctl() that is guaranteed to succeed and looking for
-EIO, e.g. KVM_CHECK_EXTENSION on the VM.

I was going to say that it's not worth trying to detect a bugged/dead VM in
selftests, because it requires having the pointer to the VM, and that's not
typically available when an assert fails, but the obvious solution is to tap
into the VM and vCPU ioctl() helpers.  That's also good motivation to add
helpers and consolidate asserts for ioctls() that return fds, i.e. for which a
positive return is considered success.

With the below (partial conversion), the failing testcase yields this.  Using a
heuristic isn't ideal, but practically speaking I can't see a way for the -EIO
check to go awry, and anything that makes debugging errors easier is definitely
worth doing IMO.

 ==== Test Assertion Failure ====
  lib/kvm_util.c:689: false
  pid=80347 tid=80347 errno=5 - Input/output error
     1	0x00000000004039ab: __vm_mem_region_delete at kvm_util.c:689 (discriminator 5)
     2	0x0000000000404660: kvm_vm_free at kvm_util.c:724 (discriminator 12)
     3	0x0000000000402ac9: race_sync_regs at sync_regs_test.c:193
     4	0x0000000000401cb7: main at sync_regs_test.c:334 (discriminator 6)
     5	0x0000000000418263: __libc_start_call_main at libc-start.o:?
     6	0x00000000004198af: __libc_start_main_impl at ??:?
     7	0x0000000000401d90: _start at ??:?
  KVM killed/bugged the VM, check kernel log for clues

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 07732a157ccd..e48ac57be13a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -258,17 +258,42 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { }
 	kvm_do_ioctl((vm)->fd, cmd, arg);				\
 })
 
+/*
+ * Assert that a VM or vCPU ioctl() succeeded (obviously), with extra magic to
+ * detect if the ioctl() failed because KVM killed/bugged the VM.  To detect a
+ * dead VM, probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM
+ * since before selftests existed and (b) should never outright fail, i.e. is
+ * supposed to return 0 or 1.  If KVM kills a VM, KVM returns -EIO for all
+ * ioctl()s for the VM and its vCPUs, including KVM_CHECK_EXTENSION.
+ */
+#define TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(name, ret, vm)			\
+do {										\
+	int __errno = errno;							\
+										\
+	static_assert_is_vm(vm);						\
+										\
+	if (!ret)								\
+		break;								\
+										\
+	if (errno == EIO &&							\
+	    __vm_ioctl(vm, KVM_CHECK_EXTENSION, (void *)KVM_CAP_USER_MEMORY) < 0) {	\
+		TEST_ASSERT(errno == EIO, "KVM killed the VM, should return -EIO");	\
+		TEST_FAIL("KVM killed/bugged the VM, check kernel log for clues");	\
+	}									\
+	errno = __errno;							\
+	TEST_FAIL(__KVM_IOCTL_ERROR(name, ret));				\
+} while (0)
+
 #define _vm_ioctl(vm, cmd, name, arg)					\
 ({									\
 	int ret = __vm_ioctl(vm, cmd, arg);				\
 									\
-	TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret));		\
+	TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(name, ret, vm);		\
 })
 
 #define vm_ioctl(vm, cmd, arg)						\
 	_vm_ioctl(vm, cmd, #cmd, arg)
 
-
 static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { }
 
 #define __vcpu_ioctl(vcpu, cmd, arg)					\
@@ -281,7 +306,7 @@ static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { }
 ({									\
 	int ret = __vcpu_ioctl(vcpu, cmd, arg);				\
 									\
-	TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret));		\
+	TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(name, ret, vcpu->vm);	\
 })
 
 #define vcpu_ioctl(vcpu, cmd, arg)					\
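[ As an aside, the "consolidate asserts for ioctls() that return fds" idea
could look something like the hypothetical wrapper below; the vm_fd_ioctl()
name and its exact shape are made up for illustration and are not part of the
posted diff. ]

	/*
	 * Hypothetical wrapper for VM-scoped ioctls() that return a file
	 * descriptor: any value >= 0 is the fd (success), while a negative
	 * value is routed through the same dead-VM-aware assert as above.
	 */
	#define vm_fd_ioctl(vm, cmd, arg)					\
	({									\
		int fd = __vm_ioctl(vm, cmd, arg);				\
										\
		TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(#cmd, fd < 0 ? fd : 0, vm);	\
		fd;								\
	})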
On 8/3/23 18:41, Sean Christopherson wrote:
> KVM doesn't expose the bugged/dead information, though I suppose userspace could
> probe for it by issuing an ioctl() that is guaranteed to succeed and looking for
> -EIO, e.g. KVM_CHECK_EXTENSION on the VM.
>
> I was going to say that it's not worth trying to detect a bugged/dead VM in
> selftests, because it requires having the pointer to the VM, and that's not
> typically available when an assert fails, but the obvious solution is to tap
> into the VM and vCPU ioctl() helpers.  That's also good motivation to add
> helpers and consolidate asserts for ioctls() that return fds, i.e. for which a
> positive return is considered success.
>
> With the below (partial conversion), the failing testcase yields this.  Using a
> heuristic isn't ideal, but practically speaking I can't see a way for the -EIO
> check to go awry, and anything that makes debugging errors easier is definitely
> worth doing IMO.
>
>  ==== Test Assertion Failure ====
>   lib/kvm_util.c:689: false
>   pid=80347 tid=80347 errno=5 - Input/output error
>      1	0x00000000004039ab: __vm_mem_region_delete at kvm_util.c:689 (discriminator 5)
>      2	0x0000000000404660: kvm_vm_free at kvm_util.c:724 (discriminator 12)
>      3	0x0000000000402ac9: race_sync_regs at sync_regs_test.c:193
>      4	0x0000000000401cb7: main at sync_regs_test.c:334 (discriminator 6)
>      5	0x0000000000418263: __libc_start_call_main at libc-start.o:?
>      6	0x00000000004198af: __libc_start_main_impl at ??:?
>      7	0x0000000000401d90: _start at ??:?
>   KVM killed/bugged the VM, check kernel log for clues

Yes, such automatic reporting of dead VMs is a really nice feature.

> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 07732a157ccd..e48ac57be13a 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -258,17 +258,42 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { }
>  	kvm_do_ioctl((vm)->fd, cmd, arg);				\
>  })
>
> +/*
> + * Assert that a VM or vCPU ioctl() succeeded (obviously), with extra magic to
> + * detect if the ioctl() failed because KVM killed/bugged the VM.  To detect a
> + * dead VM, probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM
> + * since before selftests existed and (b) should never outright fail, i.e. is
> + * supposed to return 0 or 1.  If KVM kills a VM, KVM returns -EIO for all
> + * ioctl()s for the VM and its vCPUs, including KVM_CHECK_EXTENSION.
> + */

Do you think it's worth mentioning that the ioctl() also always returns -EIO
when kvm->mm != current->mm?  I suppose that's something purely hypothetical
in this context.

thanks,
Michal
On Thu, Aug 03, 2023, Michal Luczaj wrote:
> On 8/3/23 18:41, Sean Christopherson wrote:
> > +/*
> > + * Assert that a VM or vCPU ioctl() succeeded (obviously), with extra magic to
> > + * detect if the ioctl() failed because KVM killed/bugged the VM.  To detect a
> > + * dead VM, probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM
> > + * since before selftests existed and (b) should never outright fail, i.e. is
> > + * supposed to return 0 or 1.  If KVM kills a VM, KVM returns -EIO for all
> > + * ioctl()s for the VM and its vCPUs, including KVM_CHECK_EXTENSION.
> > + */
>
> Do you think it's worth mentioning that the ioctl() also always returns -EIO
> when kvm->mm != current->mm?  I suppose that's something purely hypothetical
> in this context.

Hmm, probably not?  Practically speaking, that scenario should really only ever
happen when someone is developing a new selftest.  Though I suppose a blurb in
the comment wouldn't hurt.
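[ For completeness, the same probing trick works from plain userspace, without
the selftests plumbing.  A minimal sketch, assuming only what the thread
establishes: a killed/bugged VM fails every VM and vCPU ioctl() with -EIO.
The vm_is_dead() helper name is made up for illustration. ]

	#include <errno.h>
	#include <stdbool.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Hypothetical helper: returns true if KVM has killed/bugged the VM. */
	static bool vm_is_dead(int vm_fd)
	{
		/*
		 * KVM_CHECK_EXTENSION(KVM_CAP_USER_MEMORY) returns 0 or 1 on a
		 * healthy VM and is ancient enough to be supported everywhere;
		 * on a dead VM it fails with -EIO like any other VM ioctl().
		 */
		return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) < 0 &&
		       errno == EIO;
	}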
diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
index 2da89fdc2471..feebc7d44c17 100644
--- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
@@ -15,12 +15,14 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/ioctl.h>
+#include <pthread.h>
 
 #include "test_util.h"
 #include "kvm_util.h"
 #include "processor.h"
 
 #define UCALL_PIO_PORT ((uint16_t)0x1000)
+#define TIMEOUT 2 /* seconds, roughly */
 
 struct ucall uc_none = {
 	.cmd = UCALL_NONE,
@@ -80,6 +82,124 @@ static void compare_vcpu_events(struct kvm_vcpu_events *left,
 #define TEST_SYNC_FIELDS   (KVM_SYNC_X86_REGS|KVM_SYNC_X86_SREGS|KVM_SYNC_X86_EVENTS)
 #define INVALID_SYNC_FIELD 0x80000000
 
+/*
+ * WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
+ *
+ * arch/x86/kvm/x86.c:kvm_check_and_inject_events():
+ *	WARN_ON_ONCE(vcpu->arch.exception.injected &&
+ *		     vcpu->arch.exception.pending);
+ */
+static void *race_events_inj_pen(void *arg)
+{
+	struct kvm_run *run = (struct kvm_run *)arg;
+	struct kvm_vcpu_events *events = &run->s.regs.events;
+
+	for (;;) {
+		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
+		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.injected, 1);
+		WRITE_ONCE(events->exception.pending, 1);
+
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+/*
+ * WARNING: CPU: 0 PID: 1107 at arch/x86/kvm/x86.c:547 kvm_check_and_inject_events+0x4a0/0x500 [kvm]
+ *
+ * arch/x86/kvm/x86.c:exception_type():
+ *	WARN_ON(vector > 31 || vector == NMI_VECTOR)
+ */
+static void *race_events_exc(void *arg)
+{
+	struct kvm_run *run = (struct kvm_run *)arg;
+	struct kvm_vcpu_events *events = &run->s.regs.events;
+
+	for (;;) {
+		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
+		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.pending, 1);
+		WRITE_ONCE(events->exception.nr, 255);
+
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+/*
+ * WARNING: CPU: 0 PID: 1142 at arch/x86/kvm/mmu/paging_tmpl.h:358 paging32_walk_addr_generic+0x431/0x8f0 [kvm]
+ *
+ * arch/x86/kvm/mmu/paging_tmpl.h:
+ *	KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm)
+ */
+static void *race_sregs_cr4(void *arg)
+{
+	struct kvm_run *run = (struct kvm_run *)arg;
+	__u64 *cr4 = &run->s.regs.sregs.cr4;
+	__u64 pae_enabled = *cr4;
+	__u64 pae_disabled = *cr4 & ~X86_CR4_PAE;
+
+	for (;;) {
+		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_SREGS);
+		WRITE_ONCE(*cr4, pae_enabled);
+		asm volatile(".rept 512\n\t"
+			     "nop\n\t"
+			     ".endr");
+		WRITE_ONCE(*cr4, pae_disabled);
+
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+static void race_sync_regs(void *racer, bool poke_mmu)
+{
+	struct kvm_translation tr;
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	pthread_t thread;
+	time_t t;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	run = vcpu->run;
+
+	run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
+	vcpu_run(vcpu);
+	TEST_REQUIRE(run->s.regs.sregs.cr4 & X86_CR4_PAE);
+	run->kvm_valid_regs = 0;
+
+	ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
+
+	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
+		__vcpu_run(vcpu);
+
+		if (poke_mmu) {
+			tr = (struct kvm_translation) { .linear_address = 0 };
+			__vcpu_ioctl(vcpu, KVM_TRANSLATE, &tr);
+		}
+	}
+
+	ASSERT_EQ(pthread_cancel(thread), 0);
+	ASSERT_EQ(pthread_join(thread, NULL), 0);
+
+	/*
+	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
+	 *
+	 * kvm_vm_free()
+	 *   __vm_mem_region_delete()
+	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
+	 *       _vm_ioctl(vm, cmd, #cmd, arg)
+	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
+	 */
+	if (!poke_mmu)
+		kvm_vm_free(vm);
+}
+
 int main(int argc, char *argv[])
 {
 	struct kvm_vcpu *vcpu;
@@ -218,5 +338,9 @@ int main(int argc, char *argv[])
 
 	kvm_vm_free(vm);
 
+	race_sync_regs(race_sregs_cr4, true);
+	race_sync_regs(race_events_exc, false);
+	race_sync_regs(race_events_inj_pen, false);
+
 	return 0;
 }
Attempt to modify vcpu->run->s.regs _after_ the sanity checks performed by
KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:sync_regs().  This could lead to some
nonsensical vCPU states accompanied by kernel splats.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 .../selftests/kvm/x86_64/sync_regs_test.c    | 124 ++++++++++++++++++
 1 file changed, 124 insertions(+)