Message ID | 20230728001606.2275586-1-mhal@rbox.co (mailing list archive) |
---|---|
Series | sync_regs() TOCTOU issues |

On Fri, 28 Jul 2023 02:12:56 +0200, Michal Luczaj wrote:
> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> have exclusive rights to structs they operate on. While this is true when
> coming from an ioctl handler (caller makes a local copy of user's data),
> sync_regs() breaks this contract; a pointer to a user-modifiable memory
> (vcpu->run->s.regs) is provided. This can lead to a situation when incoming
> data is checked and/or sanitized only to be re-set by a user thread running
> in parallel.
>
> [...]

Applied to kvm-x86 selftests (there are in-flight reworks for selftests
that will conflict, and I didn't want to split the testcases from the fix).

As mentioned in my reply to patch 2, I split up the selftests patch and
massaged things a bit.  Please holler if you disagree with any of the
changes.

Thanks much!

[1/4] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
      https://github.com/kvm-x86/linux/commit/0d033770d43a
[2/4] KVM: selftests: Extend x86's sync_regs_test to check for CR4 races
      https://github.com/kvm-x86/linux/commit/ae895cbe613a
[3/4] KVM: selftests: Extend x86's sync_regs_test to check for event vector races
      https://github.com/kvm-x86/linux/commit/60c4063b4752
[4/4] KVM: selftests: Extend x86's sync_regs_test to check for exception races
      https://github.com/kvm-x86/linux/commit/0de704d2d6c8

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes
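
The race described in the cover letter can be illustrated outside the kernel. What follows is a minimal user-space sketch, not KVM code: `struct regs`, `set_regs_racy()`, `set_regs_snapshot()`, the `racer_fn` callback, and the bound `MAX_VALID` are all hypothetical names chosen for illustration. The callback stands in for a second thread writing to the shared memory between the check and the use. The sketch contrasts validating through a pointer into user-modifiable memory with validating a local copy, which is the property the ioctl path is described as having.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VALID 31u	/* illustrative upper bound for a "valid" value */

/* Hypothetical stand-in for data living in user-modifiable memory. */
struct regs {
	unsigned int value;
};

/* Hypothetical hook: models a user thread writing between check and use. */
typedef void (*racer_fn)(struct regs *shared);

static bool regs_valid(const struct regs *r)
{
	return r->value <= MAX_VALID;
}

/*
 * Racy variant: checks and consumes through the shared pointer, so the
 * value that was checked is not necessarily the value that gets used.
 */
static int set_regs_racy(struct regs *shared, racer_fn racer, unsigned int *out)
{
	assert(shared && out);

	if (!regs_valid(shared))
		return -1;
	if (racer)
		racer(shared);		/* window between check and use */
	*out = shared->value;		/* may consume an unchecked value */
	return 0;
}

/*
 * Snapshot variant: copies the data once, then checks and consumes only
 * the private copy.  Later writes to the shared struct have no effect.
 */
static int set_regs_snapshot(struct regs *shared, racer_fn racer, unsigned int *out)
{
	struct regs local;

	assert(shared && out);

	local = *shared;		/* single fetch */
	if (!regs_valid(&local))
		return -1;
	if (racer)
		racer(shared);		/* too late: the copy is already taken */
	*out = local.value;
	return 0;
}

/* Example racer: overwrites the shared value with an invalid one. */
static void write_bad_value(struct regs *shared)
{
	shared->value = 255;
}

/* Returns the value consumed by the racy variant, or -1 if rejected. */
static int demo_racy(void)
{
	struct regs shared = { .value = 13 };
	unsigned int out = 0;

	return set_regs_racy(&shared, write_bad_value, &out) ? -1 : (int)out;
}

/* Returns the value consumed by the snapshot variant, or -1 if rejected. */
static int demo_snapshot(void)
{
	struct regs shared = { .value = 13 };
	unsigned int out = 0;

	return set_regs_snapshot(&shared, write_bad_value, &out) ? -1 : (int)out;
}
```

In this model `demo_racy()` ends up consuming the invalid value written after the check, while `demo_snapshot()` consumes the value that was actually validated.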

On Wed, Aug 02, 2023, Sean Christopherson wrote:
> On Fri, 28 Jul 2023 02:12:56 +0200, Michal Luczaj wrote:
> > Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> > have exclusive rights to structs they operate on. While this is true when
> > coming from an ioctl handler (caller makes a local copy of user's data),
> > sync_regs() breaks this contract; a pointer to a user-modifiable memory
> > (vcpu->run->s.regs) is provided. This can lead to a situation when incoming
> > data is checked and/or sanitized only to be re-set by a user thread running
> > in parallel.
> >
> > [...]
>
> Applied to kvm-x86 selftests (there are in-flight reworks for selftests
> that will conflict, and I didn't want to split the testcases from the fix).
>
> As mentioned in my reply to patch 2, I split up the selftests patch and
> massaged things a bit.  Please holler if you disagree with any of the
> changes.
>
> Thanks much!
>
> [1/4] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
>       https://github.com/kvm-x86/linux/commit/0d033770d43a
> [2/4] KVM: selftests: Extend x86's sync_regs_test to check for CR4 races
>       https://github.com/kvm-x86/linux/commit/ae895cbe613a
> [3/4] KVM: selftests: Extend x86's sync_regs_test to check for event vector races
>       https://github.com/kvm-x86/linux/commit/60c4063b4752
> [4/4] KVM: selftests: Extend x86's sync_regs_test to check for exception races
>       https://github.com/kvm-x86/linux/commit/0de704d2d6c8

Argh, apparently I didn't run these on AMD.  The exception injection test hangs
because the vCPU hits triple fault shutdown, and because the VMCB is technically
undefined on shutdown, KVM synthesizes INIT.  That starts the vCPU at the reset
vector and it happily fetches zeroes until being killed.

This fixes the issue, and I confirmed all three testcases repro the KVM bug with
it.  I'll post formally tomorrow.

---
 .../testing/selftests/kvm/x86_64/sync_regs_test.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
index 93fac74ca0a7..55e9b68e6947 100644
--- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
@@ -94,6 +94,7 @@ static void *race_events_inj_pen(void *arg)
 	for (;;) {
 		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
 		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.nr, GP_VECTOR);
 		WRITE_ONCE(events->exception.injected, 1);
 		WRITE_ONCE(events->exception.pending, 1);
 
@@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
 	for (;;) {
 		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
 		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.nr, GP_VECTOR);
 		WRITE_ONCE(events->exception.pending, 1);
 		WRITE_ONCE(events->exception.nr, 255);
 
@@ -152,6 +154,7 @@ static noinline void *race_sregs_cr4(void *arg)
 static void race_sync_regs(void *racer)
 {
 	const time_t TIMEOUT = 2; /* seconds, roughly */
+	struct kvm_x86_state *state;
 	struct kvm_translation tr;
 	struct kvm_vcpu *vcpu;
 	struct kvm_run *run;
@@ -178,8 +181,17 @@ static void race_sync_regs(void *racer)
 
 	TEST_ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
 
+	state = vcpu_save_state(vcpu);
+
 	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
-		__vcpu_run(vcpu);
+		/*
+		 * Reload known good state if the vCPU triple faults, e.g. due
+		 * to the unhandled #GPs being injected.  VMX preserves state
+		 * on shutdown, but SVM synthesizes an INIT as the VMCB state
+		 * is architecturally undefined on triple fault.
+		 */
+		if (!__vcpu_run(vcpu) && run->exit_reason == KVM_EXIT_SHUTDOWN)
+			vcpu_load_state(vcpu, state);
 
 		if (racer == race_sregs_cr4) {
 			tr = (struct kvm_translation) { .linear_address = 0 };
@@ -190,6 +202,7 @@ static void race_sync_regs(void *racer)
 	TEST_ASSERT_EQ(pthread_cancel(thread), 0);
 	TEST_ASSERT_EQ(pthread_join(thread, NULL), 0);
 
+	kvm_x86_state_cleanup(state);
 	kvm_vm_free(vm);
 }
 

base-commit: 722b2afc50abbfaa74accbc52911f9b5e8719c95
--
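
The reload logic in the patch above can be modelled without KVM. The sketch below is a user-space toy, not selftest code: `struct toy_vcpu`, `struct toy_state`, `toy_run()`, `toy_race_loop()`, and the `TOY_*` constants are hypothetical stand-ins for `struct kvm_vcpu`, the saved x86 state, `__vcpu_run()`, the test loop, and `KVM_EXIT_SHUTDOWN`. It assumes a run that ends in shutdown leaves the register state clobbered, as described for SVM, and shows the loop restoring a snapshot taken before the loop so that the next iteration starts from known good state.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_exit { TOY_EXIT_IO, TOY_EXIT_SHUTDOWN };

/* Hypothetical, heavily reduced register state. */
struct toy_state {
	unsigned long rip;
	unsigned long cr4;
};

struct toy_vcpu {
	struct toy_state state;
	enum toy_exit exit_reason;
};

#define TOY_GOOD_RIP	0x1000ul	/* illustrative guest entry point */
#define TOY_RESET_RIP	0xfff0ul	/* illustrative post-INIT value */

/*
 * Toy run: every third call "triple faults" and clobbers the state the
 * way a synthesized INIT would.  Returns 0, like a successful run.
 */
static int toy_run(struct toy_vcpu *vcpu, int iteration)
{
	assert(vcpu);

	if (iteration % 3 == 2) {
		vcpu->state.rip = TOY_RESET_RIP;
		vcpu->state.cr4 = 0;
		vcpu->exit_reason = TOY_EXIT_SHUTDOWN;
	} else {
		vcpu->exit_reason = TOY_EXIT_IO;
	}
	return 0;
}

/*
 * Runs the loop and returns how many iterations started from clobbered
 * state.  With reload_on_shutdown set, the snapshot is restored after
 * every shutdown exit.
 */
static int toy_race_loop(int iterations, bool reload_on_shutdown)
{
	struct toy_vcpu vcpu = { .state = { .rip = TOY_GOOD_RIP, .cr4 = 0x20 } };
	struct toy_state saved = vcpu.state;	/* snapshot before the loop */
	int bad_starts = 0;

	for (int i = 0; i < iterations; i++) {
		if (vcpu.state.rip != TOY_GOOD_RIP)
			bad_starts++;

		if (!toy_run(&vcpu, i) &&
		    vcpu.exit_reason == TOY_EXIT_SHUTDOWN && reload_on_shutdown)
			vcpu.state = saved;
	}
	return bad_starts;
}
```

Without the reload, every iteration after the first shutdown in this model starts from the clobbered state, which corresponds to the hang described above; with the reload, none do.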

On 8/15/23 02:48, Sean Christopherson wrote:
> ...
> Argh, apparently I didn't run these on AMD.  The exception injection test hangs
> because the vCPU hits triple fault shutdown, and because the VMCB is technically
> undefined on shutdown, KVM synthesizes INIT.  That starts the vCPU at the reset
> vector and it happily fetches zeroes until being killed.

Thank you for getting this. I should have mentioned, due to lack of access to
AMD hardware, I've only tested on Intel.

> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
>  	for (;;) {
>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
>  		WRITE_ONCE(events->flags, 0);
> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
>  		WRITE_ONCE(events->exception.pending, 1);
>  		WRITE_ONCE(events->exception.nr, 255);

Here you're setting events->exception.nr twice. Is it deliberate?

Thanks again,
Michal

On Tue, Aug 15, 2023, Michal Luczaj wrote:
> On 8/15/23 02:48, Sean Christopherson wrote:
> > ...
> > Argh, apparently I didn't run these on AMD.  The exception injection test hangs
> > because the vCPU hits triple fault shutdown, and because the VMCB is technically
> > undefined on shutdown, KVM synthesizes INIT.  That starts the vCPU at the reset
> > vector and it happily fetches zeroes until being killed.
>
> Thank you for getting this. I should have mentioned, due to lack of access to
> AMD hardware, I've only tested on Intel.
>
> > @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
> >  	for (;;) {
> >  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
> >  		WRITE_ONCE(events->flags, 0);
> > +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
> >  		WRITE_ONCE(events->exception.pending, 1);
> >  		WRITE_ONCE(events->exception.nr, 255);
>
> Here you're setting events->exception.nr twice. Is it deliberate?

Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
fault in the guest.

However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
vector in every iteration, otherwise only the first iteration will be able to hit
the "check good, consume bad" scenario.

For race_events_inj_pen(), it should be sufficient to set the vector just once,
outside of the loop.  I do think it should be explicitly set, as subtly relying
on '0' being a valid exception is a bit mean (though it does work).

On 8/15/23 17:40, Sean Christopherson wrote:
> On Tue, Aug 15, 2023, Michal Luczaj wrote:
>>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
>>>  	for (;;) {
>>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
>>>  		WRITE_ONCE(events->flags, 0);
>>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
>>>  		WRITE_ONCE(events->exception.pending, 1);
>>>  		WRITE_ONCE(events->exception.nr, 255);
>>
>> Here you're setting events->exception.nr twice. Is it deliberate?
>
> Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
> fault in the guest.
>
> However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
> vector in every iteration, otherwise only the first iteration will be able to hit
> the "check good, consume bad" scenario.

I think I understand what you mean. I see things slightly differently: because

	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
		...
	} else {
		events->exception.pending = 0;
		events->exception_has_payload = 0;
	}

zeroes exception.pending on every iteration, even though exception.nr may
already be > 31, KVM does not necessarily return -EINVAL at

	if ((events->exception.injected || events->exception.pending) &&
	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
		return -EINVAL;

It would if the racer set exception.pending before this check, but if it does it
after the check, then KVM goes

	vcpu->arch.exception.pending = events->exception.pending;
	vcpu->arch.exception.vector = events->exception.nr;

which later triggers the WARN. That said, if you think setting and re-setting
exception.nr is more efficient (as in: racy), I'm all for it.

> For race_events_inj_pen(), it should be sufficient to set the vector just once,
> outside of the loop.  I do think it should be explicitly set, as subtly relying
> on '0' being a valid exception is a bit mean (though it does work).

Sure, I get it.
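
The interleaving described in this message can be replayed deterministically in user space. The sketch below is not the KVM code: `struct toy_events`, `struct toy_arch`, `toy_set_events()`, `toy_replay()`, the `toy_racer_fn` callback, and the `TOY_*` constants are hypothetical, and the callback stands in for the racing thread running between the validity check and the final reads. It mirrors only the three quoted fragments: `pending` is zeroed in the shared struct, the check passes because neither flag is set, and the racer then sets `pending` while `nr` is already 255.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NMI_VECTOR		2
#define TOY_EINVAL		22
#define TOY_VALID_PAYLOAD	0x1u	/* illustrative flag bit */

/* Hypothetical stand-in for the events struct in the shared run area. */
struct toy_events {
	unsigned char injected;
	unsigned char pending;
	unsigned char nr;
	unsigned int flags;
};

/* Hypothetical stand-in for the kernel-private exception state. */
struct toy_arch {
	unsigned char pending;
	unsigned char vector;
};

/* Hypothetical hook: models the racer running after the check. */
typedef void (*toy_racer_fn)(struct toy_events *shared);

static int toy_set_events(struct toy_events *events, struct toy_arch *arch,
			  toy_racer_fn racer)
{
	assert(events && arch);

	/* Step 1: without the payload flag, pending is cleared in place. */
	if (!(events->flags & TOY_VALID_PAYLOAD))
		events->pending = 0;

	/* Step 2: with neither flag set, the vector is not range checked. */
	if ((events->injected || events->pending) &&
	    (events->nr > 31 || events->nr == TOY_NMI_VECTOR))
		return -TOY_EINVAL;

	if (racer)
		racer(events);		/* racer wins the window here */

	/* Step 3: the shared struct is read again and consumed. */
	arch->pending = events->pending;
	arch->vector = events->nr;
	return 0;
}

/* Example racer: re-sets pending after the check has passed. */
static void toy_set_pending(struct toy_events *shared)
{
	shared->pending = 1;
}

/*
 * Replays one call with nr already out of range.  Returns the vector
 * recorded as pending, -1 if nothing is pending, or -TOY_EINVAL if the
 * input was rejected.
 */
static int toy_replay(toy_racer_fn racer)
{
	struct toy_events shared = { .pending = 1, .nr = 255, .flags = 0 };
	struct toy_arch arch = { 0, 0 };
	int ret = toy_set_events(&shared, &arch, racer);

	if (ret)
		return ret;
	return arch.pending ? arch.vector : -1;
}
```

With the racer in place the model records a pending exception with vector 255 even though the range check was executed; without it, nothing is left pending.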

On Tue, Aug 15, 2023, Michal Luczaj wrote:
> On 8/15/23 17:40, Sean Christopherson wrote:
> > On Tue, Aug 15, 2023, Michal Luczaj wrote:
> >>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
> >>>  	for (;;) {
> >>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
> >>>  		WRITE_ONCE(events->flags, 0);
> >>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
> >>>  		WRITE_ONCE(events->exception.pending, 1);
> >>>  		WRITE_ONCE(events->exception.nr, 255);
> >>
> >> Here you're setting events->exception.nr twice. Is it deliberate?
> >
> > Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
> > fault in the guest.
> >
> > However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
> > vector in every iteration, otherwise only the first iteration will be able to hit
> > the "check good, consume bad" scenario.
>
> I think I understand what you mean. I see things slightly differently: because
>
> 	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
> 		...
> 	} else {
> 		events->exception.pending = 0;
> 		events->exception_has_payload = 0;
> 	}
>
> zeroes exception.pending on every iteration, even though exception.nr may
> already be > 31, KVM does not necessarily return -EINVAL at
>
> 	if ((events->exception.injected || events->exception.pending) &&
> 	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
> 		return -EINVAL;
>
> It would if the racer set exception.pending before this check, but if it does it
> after the check, then KVM goes
>
> 	vcpu->arch.exception.pending = events->exception.pending;
> 	vcpu->arch.exception.vector = events->exception.nr;
>
> which later triggers the WARN. That said, if you think setting and re-setting
> exception.nr is more efficient (as in: racy), I'm all for it.

My goal isn't to make it easier to hit the *known* TOCTOU, it's to make the test
more valuable after that known bug has been fixed.  I.e. I don't want to rely on
KVM to update kvm_run (which was arguably a bug even if there weren't a TOCTOU
issue).  It's kinda silly, because realistically this test is likely only ever
going to find TOCTOU bugs, but so long as the test can consistently hit the known
bug, my preference is to make it as "generic" as possible from a coverage
perspective.
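
The case for writing a good vector on every pass can be checked with a small model. This sketch is an illustration only: `struct toy_ev`, `struct toy_write`, `count_racy_iterations()`, and the two helper functions are hypothetical. The model assumes the kernel never writes back to the shared struct, so only the racer's own stores change it. An iteration counts as racy when a state that would pass the validity check (possibly carried over from the end of the previous pass) is later followed by a state with a pending out-of-range vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GP_VECTOR	13
#define TOY_NMI_VECTOR	2

/* Hypothetical reduction of the shared exception fields. */
struct toy_ev {
	unsigned char pending;
	unsigned char nr;
};

enum toy_field { TOY_PENDING, TOY_NR };

/* One store performed by the racer thread. */
struct toy_write {
	enum toy_field field;
	unsigned char val;
};

/* Same shape as the quoted range check, minus the injected flag. */
static bool toy_check_passes(struct toy_ev e)
{
	return !(e.pending && (e.nr > 31 || e.nr == TOY_NMI_VECTOR));
}

/*
 * Replays the racer's store sequence `iterations` times and counts the
 * passes in which a check-passing state is followed by a bad one.
 */
static int count_racy_iterations(const struct toy_write *prog, size_t n,
				 int iterations)
{
	struct toy_ev e = { 0, 0 };	/* vector 0 is a valid exception */
	int hits = 0;

	assert(prog && n > 0);

	for (int it = 0; it < iterations; it++) {
		bool saw_good = toy_check_passes(e);
		bool hit = false;

		for (size_t i = 0; i < n; i++) {
			if (prog[i].field == TOY_PENDING)
				e.pending = prog[i].val;
			else
				e.nr = prog[i].val;

			if (toy_check_passes(e))
				saw_good = true;
			else if (saw_good)
				hit = true;
		}
		hits += hit;
	}
	return hits;
}

/* Racer that only ever writes the bad vector. */
static int racy_iterations_bad_only(int iterations)
{
	static const struct toy_write prog[] = {
		{ TOY_PENDING, 1 }, { TOY_NR, 255 },
	};

	return count_racy_iterations(prog, sizeof(prog) / sizeof(prog[0]),
				     iterations);
}

/* Racer that writes a good vector first, then the bad one. */
static int racy_iterations_good_then_bad(int iterations)
{
	static const struct toy_write prog[] = {
		{ TOY_NR, TOY_GP_VECTOR }, { TOY_PENDING, 1 }, { TOY_NR, 255 },
	};

	return count_racy_iterations(prog, sizeof(prog) / sizeof(prog[0]),
				     iterations);
}
```

Under these assumptions the bad-only racer opens a window in the first pass only, while the racer that writes a good vector first reopens it on every pass, without depending on the kernel to modify the shared struct.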

On 8/15/23 20:15, Sean Christopherson wrote:
> On Tue, Aug 15, 2023, Michal Luczaj wrote:
>> On 8/15/23 17:40, Sean Christopherson wrote:
>>> On Tue, Aug 15, 2023, Michal Luczaj wrote:
>>>>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
>>>>>  	for (;;) {
>>>>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
>>>>>  		WRITE_ONCE(events->flags, 0);
>>>>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
>>>>>  		WRITE_ONCE(events->exception.pending, 1);
>>>>>  		WRITE_ONCE(events->exception.nr, 255);
>>>>
>>>> Here you're setting events->exception.nr twice. Is it deliberate?
>>>
>>> Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
>>> fault in the guest.
>>>
>>> However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
>>> vector in every iteration, otherwise only the first iteration will be able to hit
>>> the "check good, consume bad" scenario.
>>
>> I think I understand what you mean. I see things slightly differently: because
>>
>> 	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
>> 		...
>> 	} else {
>> 		events->exception.pending = 0;
>> 		events->exception_has_payload = 0;
>> 	}
>>
>> zeroes exception.pending on every iteration, even though exception.nr may
>> already be > 31, KVM does not necessarily return -EINVAL at
>>
>> 	if ((events->exception.injected || events->exception.pending) &&
>> 	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
>> 		return -EINVAL;
>>
>> It would if the racer set exception.pending before this check, but if it does it
>> after the check, then KVM goes
>>
>> 	vcpu->arch.exception.pending = events->exception.pending;
>> 	vcpu->arch.exception.vector = events->exception.nr;
>>
>> which later triggers the WARN. That said, if you think setting and re-setting
>> exception.nr is more efficient (as in: racy), I'm all for it.
>
> My goal isn't to make it easier to hit the *known* TOCTOU, it's to make the test
> more valuable after that known bug has been fixed.

Aha! Yup, turns out I did not understand what you meant after all :) Sorry.

> I.e. I don't want to rely on
> KVM to update kvm_run (which was arguably a bug even if there weren't a TOCTOU
> issue).  It's kinda silly, because realistically this test is likely only ever
> going to find TOCTOU bugs, but so long as the test can consistently hit the known
> bug, my preference is to make it as "generic" as possible from a coverage
> perspective.

Sure, that makes sense.