
[1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues

Message ID 20230728001606.2275586-2-mhal@rbox.co (mailing list archive)
State New, archived
Series sync_regs() TOCTOU issues

Commit Message

Michal Luczaj July 28, 2023, 12:12 a.m. UTC
In the spirit of using a sledgehammer to crack a nut, make sync_regs() feed
__set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with the kernel's own
copy of the data.

Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
have exclusive rights to the structs they operate on. While this is true when
coming from an ioctl handler (the caller makes a local copy of the user's
data), sync_regs() breaks this contract; a pointer to user-modifiable memory
(vcpu->run->s.regs) is provided. This can lead to a situation where incoming
data is checked and/or sanitized only to be re-set by a user thread running
in parallel.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
A note: when servicing kvm_run->kvm_dirty_regs, changes made by
__set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
vcpu->run->s.regs will not be reflected back into vcpu->run->s.regs. Is this
ok?

 arch/x86/kvm/x86.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Sean Christopherson July 31, 2023, 11:49 p.m. UTC | #1
On Fri, Jul 28, 2023, Michal Luczaj wrote:
> In the spirit of using a sledgehammer to crack a nut, make sync_regs() feed
> __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with the kernel's own
> copy of the data.
> 
> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> have exclusive rights to the structs they operate on. While this is true when
> coming from an ioctl handler (the caller makes a local copy of the user's
> data), sync_regs() breaks this contract; a pointer to user-modifiable memory
> (vcpu->run->s.regs) is provided. This can lead to a situation where incoming
> data is checked and/or sanitized only to be re-set by a user thread running
> in parallel.

LOL, the really hilarious part is that the guilty,

  Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS")

also added this comment...

  /* kvm_sync_regs struct included by kvm_run struct */
  struct kvm_sync_regs {
	/* Members of this structure are potentially malicious.
	 * Care must be taken by code reading, esp. interpreting,
	 * data fields from them inside KVM to prevent TOCTOU and
	 * double-fetch types of vulnerabilities.
	 */
	struct kvm_regs regs;
	struct kvm_sregs sregs;
	struct kvm_vcpu_events events;
  };

though Radim did remove something so maybe the comment isn't as ironic as it looks.

    [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

Anyways...

> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
> vcpu->run->s.regs will not be reflected back into vcpu->run->s.regs. Is this
> ok?

I would be amazed if anyone cares.  Given the justification and the author,

    This reduces ioctl overhead which is particularly important when userspace
    is making synchronous guest state modifications (e.g. when emulating and/or
    intercepting instructions).
    
    Signed-off-by: Ken Hofsass <hofsass@google.com>

I am pretty sure this was added to optimize a now-abandoned Google effort to do
emulation in userspace.  I bring that up because I was going to suggest that we
might be able to get away with a straight revert, as QEMU doesn't use the flag
and AFAICT neither does our VMM, but there are a non-zero number of hits in e.g.
github, so sadly I think we're stuck with the feature :-(
Michal Luczaj Aug. 1, 2023, 12:37 p.m. UTC | #2
On 8/1/23 01:49, Sean Christopherson wrote:
> On Fri, Jul 28, 2023, Michal Luczaj wrote:
>> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
>> have exclusive rights to the structs they operate on. While this is true when
>> coming from an ioctl handler (the caller makes a local copy of the user's
>> data), sync_regs() breaks this contract; a pointer to user-modifiable memory
>> (vcpu->run->s.regs) is provided. This can lead to a situation where incoming
>> data is checked and/or sanitized only to be re-set by a user thread running
>> in parallel.
> 
> LOL, the really hilarious part is that the guilty,
> 
>   Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS")
> 
> also added this comment...
> 
>   /* kvm_sync_regs struct included by kvm_run struct */
>   struct kvm_sync_regs {
> 	/* Members of this structure are potentially malicious.
> 	 * Care must be taken by code reading, esp. interpreting,
> 	 * data fields from them inside KVM to prevent TOCTOU and
> 	 * double-fetch types of vulnerabilities.
> 	 */
> 	struct kvm_regs regs;
> 	struct kvm_sregs sregs;
> 	struct kvm_vcpu_events events;
>   };
> 
> though Radim did remove something so maybe the comment isn't as ironic as it looks.
> 
>     [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
>     Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> 
> Anyways...

Nah, from what I can see, it wasn't Radim's tweak that introduced the
TOCTOUs[1].

[1] https://lore.kernel.org/kvm/20180202210434.GC27896@flask/

>> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
>> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
>> vcpu->run->s.regs will not be reflected back into vcpu->run->s.regs. Is this
>> ok?
> 
> I would be amazed if anyone cares.  Given the justification and the author,
> 
>     This reduces ioctl overhead which is particularly important when userspace
>     is making synchronous guest state modifications (e.g. when emulating and/or
>     intercepting instructions).
>     
>     Signed-off-by: Ken Hofsass <hofsass@google.com>
> 
> I am pretty sure this was added to optimize a now-abandoned Google effort to do
> emulation in userspace.  I bring that up because I was going to suggest that we
> might be able to get away with a straight revert, as QEMU doesn't use the flag
> and AFAICT neither does our VMM, but there are a non-zero number of hits in e.g.
> github, so sadly I think we're stuck with the feature :-(

All right, so assuming the revert is not happening and the API is not misused
(i.e. unless vcpu->run->kvm_valid_regs is set, no one is expecting up to date
values in vcpu->run->s.regs), is assignment copying

	struct kvm_vcpu_events events = vcpu->run->s.regs.events;

the right approach or should it be a memcpy(), like in ioctl handlers?

thanks,
Michal
Sean Christopherson Aug. 2, 2023, 7:18 p.m. UTC | #3
On Tue, Aug 01, 2023, Michal Luczaj wrote:
> On 8/1/23 01:49, Sean Christopherson wrote:
> >> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
> >> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to the on-stack copies of
> >> vcpu->run->s.regs will not be reflected back into vcpu->run->s.regs. Is this
> >> ok?
> > 
> > I would be amazed if anyone cares.  Given the justification and the author,
> > 
> >     This reduces ioctl overhead which is particularly important when userspace
> >     is making synchronous guest state modifications (e.g. when emulating and/or
> >     intercepting instructions).
> >     
> >     Signed-off-by: Ken Hofsass <hofsass@google.com>
> > 
> > I am pretty sure this was added to optimize a now-abandoned Google effort to do
> > emulation in userspace.  I bring that up because I was going to suggest that we
> > might be able to get away with a straight revert, as QEMU doesn't use the flag
> > and AFAICT neither does our VMM, but there are a non-zero number of hits in e.g.
> > github, so sadly I think we're stuck with the feature :-(
> 
> All right, so assuming the revert is not happening and the API is not misused
> (i.e. unless vcpu->run->kvm_valid_regs is set, no one is expecting up to date
> values in vcpu->run->s.regs), is assignment copying
> 
> 	struct kvm_vcpu_events events = vcpu->run->s.regs.events;
> 
> the right approach or should it be a memcpy(), like in ioctl handlers?

Both approaches are fine, though I am gaining a preference for the copy-by-value
method.  With gcc-12 and probably most compilers, the code generation is identical
for both, as the compiler generates a call to memcpy() to handle the struct
assignment.

The advantage of copy-by-value for structs, and why I think I now prefer it, is
that it provides type safety.  E.g. this compiles without complaint

	memcpy(&events, &vcpu->run->s.regs.sregs, sizeof(events));

whereas this

	struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;

yields

  arch/x86/kvm/x86.c: In function ‘sync_regs’:
  arch/x86/kvm/x86.c:11793:49: error: invalid initializer
  11793 |                 struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
        |                                                 ^~~~

The downside is that it's less obvious when reading the code that there is a
large-ish memcpy happening, but IMO it's worth gaining the type safety.
Michal Luczaj Aug. 3, 2023, 12:13 a.m. UTC | #4
On 8/2/23 21:18, Sean Christopherson wrote:
> On Tue, Aug 01, 2023, Michal Luczaj wrote:
>> All right, so assuming the revert is not happening and the API is not misused
>> (i.e. unless vcpu->run->kvm_valid_regs is set, no one is expecting up to date
>> values in vcpu->run->s.regs), is assignment copying
>>
>> 	struct kvm_vcpu_events events = vcpu->run->s.regs.events;
>>
>> the right approach or should it be a memcpy(), like in ioctl handlers?
> 
> Both approaches are fine, though I am gaining a preference for the copy-by-value
> method.  With gcc-12 and probably most compilers, the code generation is identical
> for both, as the compiler generates a call to memcpy() to handle the struct
> assignment.
> 
> The advantage of copy-by-value for structs, and why I think I now prefer it, is
> that it provides type safety.  E.g. this compiles without complaint
> 
> 	memcpy(&events, &vcpu->run->s.regs.sregs, sizeof(events));
> 
> whereas this
> 
> 	struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
> 
> yields
> 
>   arch/x86/kvm/x86.c: In function ‘sync_regs’:
>   arch/x86/kvm/x86.c:11793:49: error: invalid initializer
>   11793 |                 struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
>         |                                                 ^~~~
> 
> The downside is that it's less obvious when reading the code that there is a
> large-ish memcpy happening, but IMO it's worth gaining the type safety.

Sure, that makes sense. I was a bit concerned about how padding within a struct
might affect the performance of such a copy-by-value, but (obviously?) there's
no padding in kvm_sregs, nor in kvm_vcpu_events...

Anyway, while there, could you take a look at __set_sregs_common()?

	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
	vcpu->arch.cr0 = sregs->cr0;

That last assignment seems redundant as both vmx_set_cr0() and svm_set_cr0()
take care of it, but I may be missing something (even if selftests pass with
that line removed).

thanks,
Michal
Paolo Bonzini Aug. 3, 2023, 5:48 p.m. UTC | #5
On 8/3/23 02:13, Michal Luczaj wrote:
> Anyway, while there, could you take a look at __set_sregs_common()?
> 
> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
> 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
> 	vcpu->arch.cr0 = sregs->cr0;
> 
> That last assignment seems redundant as both vmx_set_cr0() and svm_set_cr0()
> take care of it, but I may be missing something (even if selftests pass with
> that line removed).

kvm_set_cr0 assumes that the static call sets vcpu->arch.cr0, so indeed 
it can be removed:

         static_call(kvm_x86_set_cr0)(vcpu, cr0);
         kvm_post_set_cr0(vcpu, old_cr0, cr0);
         return 0;

Neither __set_sregs_common nor its callers call
kvm_post_set_cr0...  Not great, even though most uses of KVM_SET_SREGS 
are probably limited to reset in most "usual" VMMs.  It's probably 
enough to replace this line:

         *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;

with a call to the function just before __set_sregs_common returns.

Paolo
Michal Luczaj Aug. 3, 2023, 9:15 p.m. UTC | #6
On 8/3/23 19:48, Paolo Bonzini wrote:
> On 8/3/23 02:13, Michal Luczaj wrote:
>> Anyway, while there, could you take a look at __set_sregs_common()?
>>
>> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>> 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
>> 	vcpu->arch.cr0 = sregs->cr0;
>>
>> That last assignment seems redundant as both vmx_set_cr0() and svm_set_cr0()
>> take care of it, but I may be missing something (even if selftests pass with
>> that line removed).
> 
> kvm_set_cr0 assumes that the static call sets vcpu->arch.cr0, so indeed 
> it can be removed:

I guess the same can be done in enter_smm()?

	cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG);
	static_call(kvm_x86_set_cr0)(vcpu, cr0);
	vcpu->arch.cr0 = cr0;

> Neither __set_sregs_common nor its callers call
> kvm_post_set_cr0...  Not great, even though most uses of KVM_SET_SREGS 
> are probably limited to reset in most "usual" VMMs.  It's probably 
> enough to replace this line:
> 
>          *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
> 
> with a call to the function just before __set_sregs_common returns.

What about kvm_post_set_cr4() then? Should it be introduced to
__set_sregs_common() as well?

thanks,
Michal
Paolo Bonzini Aug. 4, 2023, 9:53 a.m. UTC | #7
On 8/3/23 23:15, Michal Luczaj wrote:
>>           *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>
>> with a call to the function just before __set_sregs_common returns.
> What about kvm_post_set_cr4() then? Should it be introduced to
> __set_sregs_common() as well?

Yes, indeed, but it starts getting a bit unwieldy.

If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, 
we can just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs 
and __set_sregs2 call kvm_mmu_reset_context().

Paolo
Michal Luczaj Aug. 4, 2023, 5:50 p.m. UTC | #8
On 8/4/23 11:53, Paolo Bonzini wrote:
> On 8/3/23 23:15, Michal Luczaj wrote:
>>>           *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>>
>>> with a call to the function just before __set_sregs_common returns.
>> What about kvm_post_set_cr4() then? Should it be introduced to
>> __set_sregs_common() as well?
> 
> Yes, indeed, but it starts getting a bit unwieldy.
> 
> If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, 
> we can just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs 
> and __set_sregs2 call kvm_mmu_reset_context().

Something like this?

@@ -11562,8 +11562,10 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
        if (ret)
                return ret;

-       if (mmu_reset_needed)
+       if (mmu_reset_needed) {
                kvm_mmu_reset_context(vcpu);
+               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+       }

        max_bits = KVM_NR_INTERRUPTS;
        pending_vec = find_first_bit(
@@ -11604,8 +11606,10 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
                mmu_reset_needed = 1;
                vcpu->arch.pdptrs_from_userspace = true;
        }
-       if (mmu_reset_needed)
+       if (mmu_reset_needed) {
                kvm_mmu_reset_context(vcpu);
+               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+       }
        return 0;
 }
Michal Luczaj Aug. 14, 2023, 10:29 p.m. UTC | #9
On 8/4/23 19:50, Michal Luczaj wrote:
> On 8/4/23 11:53, Paolo Bonzini wrote:
>> On 8/3/23 23:15, Michal Luczaj wrote:
>>>>           *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>>>
>>>> with a call to the function just before __set_sregs_common returns.
>>> What about kvm_post_set_cr4() then? Should it be introduced to
>>> __set_sregs_common() as well?
>>
>> Yes, indeed, but it starts getting a bit unwieldy.
>>
>> If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, 
>> we can just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs 
>> and __set_sregs2 call kvm_mmu_reset_context().
> 
> Something like this?
> 
> @@ -11562,8 +11562,10 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>         if (ret)
>                 return ret;
> 
> -       if (mmu_reset_needed)
> +       if (mmu_reset_needed) {
>                 kvm_mmu_reset_context(vcpu);
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +       }
> 
>         max_bits = KVM_NR_INTERRUPTS;
>         pending_vec = find_first_bit(
> @@ -11604,8 +11606,10 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>                 mmu_reset_needed = 1;
>                 vcpu->arch.pdptrs_from_userspace = true;
>         }
> -       if (mmu_reset_needed)
> +       if (mmu_reset_needed) {
>                 kvm_mmu_reset_context(vcpu);
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +       }
>         return 0;
>  }

I guess I'll just post a patch then. There it is:
https://lore.kernel.org/kvm/20230814222358.707877-1-mhal@rbox.co/

thanks,
Michal

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7f4246e4255f..eb94081bd7e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11787,15 +11787,22 @@  static int sync_regs(struct kvm_vcpu *vcpu)
 		__set_regs(vcpu, &vcpu->run->s.regs.regs);
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
 	}
+
 	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
-		if (__set_sregs(vcpu, &vcpu->run->s.regs.sregs))
+		struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
+
+		if (__set_sregs(vcpu, &sregs))
 			return -EINVAL;
+
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
 	}
+
 	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
-		if (kvm_vcpu_ioctl_x86_set_vcpu_events(
-				vcpu, &vcpu->run->s.regs.events))
+		struct kvm_vcpu_events events = vcpu->run->s.regs.events;
+
+		if (kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events))
 			return -EINVAL;
+
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_EVENTS;
 	}