Message ID | 20230728001606.2275586-2-mhal@rbox.co (mailing list archive)
---|---
State | New, archived
Series | sync_regs() TOCTOU issues
On Fri, Jul 28, 2023, Michal Luczaj wrote:
> In the spirit of using a sledgehammer to crack a nut, make sync_regs() feed
> __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with the kernel's own
> copy of the data.
>
> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> have exclusive rights to the structs they operate on. While this is true when
> coming from an ioctl handler (the caller makes a local copy of the user's
> data), sync_regs() breaks this contract; a pointer to user-modifiable memory
> (vcpu->run->s.regs) is provided. This can lead to a situation where incoming
> data is checked and/or sanitized only to be re-set by a user thread running
> in parallel.

LOL, the really hilarious part is that the guilty commit,

  Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS")

also added this comment...

	/* kvm_sync_regs struct included by kvm_run struct */
	struct kvm_sync_regs {
		/* Members of this structure are potentially malicious.
		 * Care must be taken by code reading, esp. interpreting,
		 * data fields from them inside KVM to prevent TOCTOU and
		 * double-fetch types of vulnerabilities.
		 */
		struct kvm_regs regs;
		struct kvm_sregs sregs;
		struct kvm_vcpu_events events;
	};

though Radim did remove something, so maybe the comment isn't as ironic as it
looks.

    [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

Anyways...

> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
> vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
> ok?

I would be amazed if anyone cares. Given the justification and the author,

    This reduces ioctl overhead which is particularly important when userspace
    is making synchronous guest state modifications (e.g. when emulating and/or
    intercepting instructions).

    Signed-off-by: Ken Hofsass <hofsass@google.com>

I am pretty sure this was added to optimize a now-abandoned Google effort to do
emulation in userspace. I bring that up because I was going to suggest that we
might be able to get away with a straight revert, as QEMU doesn't use the flag
and AFAICT neither does our VMM, but there are a non-zero number of hits on
e.g. GitHub, so sadly I think we're stuck with the feature :-(
On 8/1/23 01:49, Sean Christopherson wrote:
> On Fri, Jul 28, 2023, Michal Luczaj wrote:
>> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
>> have exclusive rights to the structs they operate on. While this is true when
>> coming from an ioctl handler (the caller makes a local copy of the user's
>> data), sync_regs() breaks this contract; a pointer to user-modifiable memory
>> (vcpu->run->s.regs) is provided. This can lead to a situation where incoming
>> data is checked and/or sanitized only to be re-set by a user thread running
>> in parallel.
>
> LOL, the really hilarious part is that the guilty commit,
>
>   Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS")
>
> also added this comment...
>
> 	/* kvm_sync_regs struct included by kvm_run struct */
> 	struct kvm_sync_regs {
> 		/* Members of this structure are potentially malicious.
> 		 * Care must be taken by code reading, esp. interpreting,
> 		 * data fields from them inside KVM to prevent TOCTOU and
> 		 * double-fetch types of vulnerabilities.
> 		 */
> 		struct kvm_regs regs;
> 		struct kvm_sregs sregs;
> 		struct kvm_vcpu_events events;
> 	};
>
> though Radim did remove something, so maybe the comment isn't as ironic as it
> looks.
>
>     [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
>     Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>
> Anyways...

Nah, from what I can see, it wasn't Radim's tweak that introduced the
TOCTOUs [1].

[1] https://lore.kernel.org/kvm/20180202210434.GC27896@flask/

>> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
>> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
>> vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
>> ok?
>
> I would be amazed if anyone cares. Given the justification and the author,
>
>     This reduces ioctl overhead which is particularly important when userspace
>     is making synchronous guest state modifications (e.g. when emulating and/or
>     intercepting instructions).
>
>     Signed-off-by: Ken Hofsass <hofsass@google.com>
>
> I am pretty sure this was added to optimize a now-abandoned Google effort to do
> emulation in userspace. I bring that up because I was going to suggest that we
> might be able to get away with a straight revert, as QEMU doesn't use the flag
> and AFAICT neither does our VMM, but there are a non-zero number of hits on
> e.g. GitHub, so sadly I think we're stuck with the feature :-(

All right, so assuming the revert is not happening and the API is not misused
(i.e. unless vcpu->run->kvm_valid_regs is set, no one expects up-to-date
values in vcpu->run->s.regs), is assignment copying

	struct kvm_vcpu_events events = vcpu->run->s.regs.events;

the right approach, or should it be a memcpy(), like in the ioctl handlers?

thanks,
Michal
On Tue, Aug 01, 2023, Michal Luczaj wrote:
> On 8/1/23 01:49, Sean Christopherson wrote:
> >> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
> >> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
> >> vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
> >> ok?
> >
> > I would be amazed if anyone cares. Given the justification and the author,
> >
> >     This reduces ioctl overhead which is particularly important when userspace
> >     is making synchronous guest state modifications (e.g. when emulating and/or
> >     intercepting instructions).
> >
> >     Signed-off-by: Ken Hofsass <hofsass@google.com>
> >
> > I am pretty sure this was added to optimize a now-abandoned Google effort to do
> > emulation in userspace. I bring that up because I was going to suggest that we
> > might be able to get away with a straight revert, as QEMU doesn't use the flag
> > and AFAICT neither does our VMM, but there are a non-zero number of hits on
> > e.g. GitHub, so sadly I think we're stuck with the feature :-(
>
> All right, so assuming the revert is not happening and the API is not misused
> (i.e. unless vcpu->run->kvm_valid_regs is set, no one expects up-to-date
> values in vcpu->run->s.regs), is assignment copying
>
> 	struct kvm_vcpu_events events = vcpu->run->s.regs.events;
>
> the right approach, or should it be a memcpy(), like in the ioctl handlers?

Both approaches are fine, though I am gaining a preference for the
copy-by-value method. With gcc-12, and probably most compilers, the code
generation is identical for both, as the compiler generates a call to memcpy()
to handle the struct assignment.

The advantage of copy-by-value for structs, and why I think I now prefer it, is
that it provides type safety. E.g. this compiles without complaint:

	memcpy(&events, &vcpu->run->s.regs.sregs, sizeof(events));

whereas this

	struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;

yields

  arch/x86/kvm/x86.c: In function ‘sync_regs’:
  arch/x86/kvm/x86.c:11793:49: error: invalid initializer
  11793 |                 struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
        |                                                 ^~~~

The downside is that it's less obvious when reading the code that there is a
large-ish memcpy happening, but IMO it's worth it to gain the type safety.
On 8/2/23 21:18, Sean Christopherson wrote:
> On Tue, Aug 01, 2023, Michal Luczaj wrote:
>> All right, so assuming the revert is not happening and the API is not misused
>> (i.e. unless vcpu->run->kvm_valid_regs is set, no one expects up-to-date
>> values in vcpu->run->s.regs), is assignment copying
>>
>> 	struct kvm_vcpu_events events = vcpu->run->s.regs.events;
>>
>> the right approach, or should it be a memcpy(), like in the ioctl handlers?
>
> Both approaches are fine, though I am gaining a preference for the
> copy-by-value method. With gcc-12, and probably most compilers, the code
> generation is identical for both, as the compiler generates a call to memcpy()
> to handle the struct assignment.
>
> The advantage of copy-by-value for structs, and why I think I now prefer it, is
> that it provides type safety. E.g. this compiles without complaint:
>
> 	memcpy(&events, &vcpu->run->s.regs.sregs, sizeof(events));
>
> whereas this
>
> 	struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
>
> yields
>
>   arch/x86/kvm/x86.c: In function ‘sync_regs’:
>   arch/x86/kvm/x86.c:11793:49: error: invalid initializer
>   11793 |                 struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
>         |                                                 ^~~~
>
> The downside is that it's less obvious when reading the code that there is a
> large-ish memcpy happening, but IMO it's worth it to gain the type safety.

Sure, that makes sense. I was a bit concerned about how padding within a struct
might affect the performance of such a copy-by-value, but (obviously?) there's
no padding in kvm_sregs, nor in kvm_vcpu_events...

Anyway, while there, could you take a look at __set_sregs_common()?

	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
	vcpu->arch.cr0 = sregs->cr0;

That last assignment seems redundant, as both vmx_set_cr0() and svm_set_cr0()
take care of it, but I may be missing something (even if selftests pass with
that line removed).

thanks,
Michal
On 8/3/23 02:13, Michal Luczaj wrote:
> Anyway, while there, could you take a look at __set_sregs_common()?
>
> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
> 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
> 	vcpu->arch.cr0 = sregs->cr0;
>
> That last assignment seems redundant, as both vmx_set_cr0() and svm_set_cr0()
> take care of it, but I may be missing something (even if selftests pass with
> that line removed).

kvm_set_cr0() assumes that the static call sets vcpu->arch.cr0, so indeed it
can be removed:

	static_call(kvm_x86_set_cr0)(vcpu, cr0);
	kvm_post_set_cr0(vcpu, old_cr0, cr0);
	return 0;

Neither __set_sregs_common() nor its callers call kvm_post_set_cr0()... Not
great, even though most uses of KVM_SET_SREGS are probably limited to reset in
most "usual" VMMs. It's probably enough to replace this line:

	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;

with a call to the function just before __set_sregs_common() returns.

Paolo
On 8/3/23 19:48, Paolo Bonzini wrote:
> On 8/3/23 02:13, Michal Luczaj wrote:
>> Anyway, while there, could you take a look at __set_sregs_common()?
>>
>> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>> 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
>> 	vcpu->arch.cr0 = sregs->cr0;
>>
>> That last assignment seems redundant, as both vmx_set_cr0() and svm_set_cr0()
>> take care of it, but I may be missing something (even if selftests pass with
>> that line removed).
>
> kvm_set_cr0() assumes that the static call sets vcpu->arch.cr0, so indeed it
> can be removed:

I guess the same can be done in enter_smm()?

	cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG);
	static_call(kvm_x86_set_cr0)(vcpu, cr0);
	vcpu->arch.cr0 = cr0;

> Neither __set_sregs_common() nor its callers call kvm_post_set_cr0()... Not
> great, even though most uses of KVM_SET_SREGS are probably limited to reset in
> most "usual" VMMs. It's probably enough to replace this line:
>
> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>
> with a call to the function just before __set_sregs_common() returns.

What about kvm_post_set_cr4() then? Should it be introduced into
__set_sregs_common() as well?

thanks,
Michal
On 8/3/23 23:15, Michal Luczaj wrote:
>> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>
>> with a call to the function just before __set_sregs_common() returns.
>
> What about kvm_post_set_cr4() then? Should it be introduced into
> __set_sregs_common() as well?

Yes, indeed, but it starts getting a bit unwieldy.

If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, we can
just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs and __set_sregs2
call kvm_mmu_reset_context().

Paolo
On 8/4/23 11:53, Paolo Bonzini wrote:
> On 8/3/23 23:15, Michal Luczaj wrote:
>>> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>>
>>> with a call to the function just before __set_sregs_common() returns.
>>
>> What about kvm_post_set_cr4() then? Should it be introduced into
>> __set_sregs_common() as well?
>
> Yes, indeed, but it starts getting a bit unwieldy.
>
> If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, we can
> just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs and __set_sregs2
> call kvm_mmu_reset_context().

Something like this?

@@ -11562,8 +11562,10 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (ret)
 		return ret;
 
-	if (mmu_reset_needed)
+	if (mmu_reset_needed) {
 		kvm_mmu_reset_context(vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	}
 
 	max_bits = KVM_NR_INTERRUPTS;
 	pending_vec = find_first_bit(
@@ -11604,8 +11606,10 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 		mmu_reset_needed = 1;
 		vcpu->arch.pdptrs_from_userspace = true;
 	}
-	if (mmu_reset_needed)
+	if (mmu_reset_needed) {
 		kvm_mmu_reset_context(vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	}
 	return 0;
 }
On 8/4/23 19:50, Michal Luczaj wrote:
> On 8/4/23 11:53, Paolo Bonzini wrote:
>> On 8/3/23 23:15, Michal Luczaj wrote:
>>>> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>>>
>>>> with a call to the function just before __set_sregs_common() returns.
>>>
>>> What about kvm_post_set_cr4() then? Should it be introduced into
>>> __set_sregs_common() as well?
>>
>> Yes, indeed, but it starts getting a bit unwieldy.
>>
>> If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, we can
>> just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs and __set_sregs2
>> call kvm_mmu_reset_context().
>
> Something like this?
>
> @@ -11562,8 +11562,10 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>  	if (ret)
>  		return ret;
> 
> -	if (mmu_reset_needed)
> +	if (mmu_reset_needed) {
>  		kvm_mmu_reset_context(vcpu);
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +	}
> 
>  	max_bits = KVM_NR_INTERRUPTS;
>  	pending_vec = find_first_bit(
> @@ -11604,8 +11606,10 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>  		mmu_reset_needed = 1;
>  		vcpu->arch.pdptrs_from_userspace = true;
>  	}
> -	if (mmu_reset_needed)
> +	if (mmu_reset_needed) {
>  		kvm_mmu_reset_context(vcpu);
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +	}
>  	return 0;
>  }

I guess I'll just post a patch then. There it is:
https://lore.kernel.org/kvm/20230814222358.707877-1-mhal@rbox.co/

thanks,
Michal
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7f4246e4255f..eb94081bd7e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11787,15 +11787,22 @@ static int sync_regs(struct kvm_vcpu *vcpu)
 		__set_regs(vcpu, &vcpu->run->s.regs.regs);
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
 	}
+
 	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
-		if (__set_sregs(vcpu, &vcpu->run->s.regs.sregs))
+		struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
+
+		if (__set_sregs(vcpu, &sregs))
 			return -EINVAL;
+
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
 	}
+
 	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
-		if (kvm_vcpu_ioctl_x86_set_vcpu_events(
-				vcpu, &vcpu->run->s.regs.events))
+		struct kvm_vcpu_events events = vcpu->run->s.regs.events;
+
+		if (kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events))
 			return -EINVAL;
+
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_EVENTS;
 	}
In the spirit of using a sledgehammer to crack a nut, make sync_regs() feed
__set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with the kernel's own
copy of the data.

Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they have
exclusive rights to the structs they operate on. While this is true when
coming from an ioctl handler (the caller makes a local copy of the user's
data), sync_regs() breaks this contract; a pointer to user-modifiable memory
(vcpu->run->s.regs) is provided. This can lead to a situation where incoming
data is checked and/or sanitized only to be re-set by a user thread running in
parallel.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
A note: when servicing kvm_run->kvm_dirty_regs, changes made by
__set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this ok?

 arch/x86/kvm/x86.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)