[1/2] KVM: Reject overly excessive IDs in KVM_CREATE_VCPU

Message ID	20240605220504.2941958-2-minipli@grsecurity.net (mailing list archive)
State	New, archived
Headers	show Received: from mail-ej1-f45.google.com (mail-ej1-f45.google.com [209.85.218.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 928673B1AB for <kvm@vger.kernel.org>; Wed, 5 Jun 2024 22:05:24 +0000 (UTC) From: Mathias Krause <minipli@grsecurity.net> To: Paolo Bonzini <pbonzini@redhat.com>, Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org, Mathias Krause <minipli@grsecurity.net>, Emese Revfy <re.emese@gmail.com>, PaX Team <pageexec@freemail.hu> Subject: [PATCH 1/2] KVM: Reject overly excessive IDs in KVM_CREATE_VCPU Date: Thu, 6 Jun 2024 00:05:03 +0200 Message-Id: <20240605220504.2941958-2-minipli@grsecurity.net> In-Reply-To: <20240605220504.2941958-1-minipli@grsecurity.net> References: <20240605220504.2941958-1-minipli@grsecurity.net> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	KVM: Reject vCPU IDs above 2^32 \| expand [0/2] KVM: Reject vCPU IDs above 2^32 [1/2] KVM: Reject overly excessive IDs in KVM_CREATE_VCPU [2/2] KVM: selftests: Test vCPU IDs above 2^32

Mathias Krause June 5, 2024, 10:05 p.m. UTC

If, on a 64 bit system, a vCPU ID is provided that has the upper 32 bits
set to a non-zero value, it may get accepted if the truncated to 32 bits
integer value is below KVM_MAX_VCPU_IDS and 'max_vcpus'. This feels very
wrong and triggered the reporting logic of PaX's SIZE_OVERFLOW plugin.

Instead of silently truncating and accepting such values, pass the full
value to kvm_vm_ioctl_create_vcpu() and make the existing limit checks
return an error.

Even if this is a userland ABI breaking change, no sane userland could
have ever relied on that behaviour.

Reported-by: PaX's SIZE_OVERFLOW plugin running on grsecurity's syzkaller
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Sean Christopherson June 5, 2024, 10:31 p.m. UTC | #1

On Thu, Jun 06, 2024, Mathias Krause wrote:
> If, on a 64 bit system, a vCPU ID is provided that has the upper 32 bits
> set to a non-zero value, it may get accepted if the truncated to 32 bits
> integer value is below KVM_MAX_VCPU_IDS and 'max_vcpus'. This feels very
> wrong and triggered the reporting logic of PaX's SIZE_OVERFLOW plugin.
> 
> Instead of silently truncating and accepting such values, pass the full
> value to kvm_vm_ioctl_create_vcpu() and make the existing limit checks
> return an error.
> 
> Even if this is a userland ABI breaking change, no sane userland could
> have ever relied on that behaviour.
> 
> Reported-by: PaX's SIZE_OVERFLOW plugin running on grsecurity's syzkaller
> Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
> Cc: Emese Revfy <re.emese@gmail.com>
> Cc: PaX Team <pageexec@freemail.hu>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 14841acb8b95..9f18fc42f018 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4200,7 +4200,7 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
>  /*
>   * Creates some virtual cpus.  Good luck creating more than one.
>   */
> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)

Hmm, I don't love that KVM subtly relies on the KVM_MAX_VCPU_IDS check to guard
against truncation when passing @id to kvm_arch_vcpu_precreate(), kvm_vcpu_init(),
etc.  I doubt that it will ever be problematic, but it _looks_ like a bug.

If we really care enough to fix this, my vote is for something like so:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4965196cad58..08adfdb2817e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4200,13 +4200,14 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
 /*
  * Creates some virtual cpus.  Good luck creating more than one.
  */
-static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
+static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long __id)
 {
        int r;
        struct kvm_vcpu *vcpu;
        struct page *page;
+       u32 id = __id;
 
-       if (id >= KVM_MAX_VCPU_IDS)
+       if (id != __id || id >= KVM_MAX_VCPU_IDS)
                return -EINVAL;
 
        mutex_lock(&kvm->lock);

Mathias Krause June 6, 2024, 7:20 a.m. UTC | #2

On 06.06.24 00:31, Sean Christopherson wrote:
> On Thu, Jun 06, 2024, Mathias Krause wrote:
>> If, on a 64 bit system, a vCPU ID is provided that has the upper 32 bits
>> set to a non-zero value, it may get accepted if the truncated to 32 bits
>> integer value is below KVM_MAX_VCPU_IDS and 'max_vcpus'. This feels very
>> wrong and triggered the reporting logic of PaX's SIZE_OVERFLOW plugin.
>>
>> Instead of silently truncating and accepting such values, pass the full
>> value to kvm_vm_ioctl_create_vcpu() and make the existing limit checks
>> return an error.
>>
>> Even if this is a userland ABI breaking change, no sane userland could
>> have ever relied on that behaviour.
>>
>> Reported-by: PaX's SIZE_OVERFLOW plugin running on grsecurity's syzkaller
>> Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
>> Cc: Emese Revfy <re.emese@gmail.com>
>> Cc: PaX Team <pageexec@freemail.hu>
>> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
>> ---
>>  virt/kvm/kvm_main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 14841acb8b95..9f18fc42f018 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -4200,7 +4200,7 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
>>  /*
>>   * Creates some virtual cpus.  Good luck creating more than one.
>>   */
>> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> 
> Hmm, I don't love that KVM subtly relies on the KVM_MAX_VCPU_IDS check to guard
> against truncation when passing @id to kvm_arch_vcpu_precreate(), kvm_vcpu_init(),
> etc.  I doubt that it will ever be problematic, but it _looks_ like a bug.

It's not subtle but very explicit. KVM_MAX_VCPU_IDS is a small positive
number, depending on some arch specific #define, but with x86 allowing
for the largest value of 4 * 4096. That value, for sure, cannot exceed
U32_MAX, so an explicit truncation isn't needed as the upper bits will
already be zero if the limit check passes.

While subtile integer truncation is the bug that my patch is actually
fixing, it is for the *userland* facing part of it, as in clarifying the
ABI to work on "machine-sized words", i.e. a ulong, and doing the limit
checks on these.

*In-kernel* APIs truncate / sign extend / mix signed/unsigned values all
the time. The kernel is full of these. Trying to "fix" them all is an
uphill battle not worth fighting, imho.

> 
> If we really care enough to fix this, my vote is for something like so:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4965196cad58..08adfdb2817e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4200,13 +4200,14 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
>  /*
>   * Creates some virtual cpus.  Good luck creating more than one.
>   */
> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long __id)
>  {
>         int r;
>         struct kvm_vcpu *vcpu;
>         struct page *page;
> +       u32 id = __id;
>  
> -       if (id >= KVM_MAX_VCPU_IDS)
> +       if (id != __id || id >= KVM_MAX_VCPU_IDS)
>                 return -EINVAL;
>  
>         mutex_lock(&kvm->lock);

I'd rather suggest to add a build time assert instead, as the existing
runtime check is sufficient (with my u32->ulong change). Something like
this:

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4200,12 +4200,13 @@ static void kvm_create_vcpu_debugfs(struct
kvm_vcpu *vcpu)
 /*
  * Creates some virtual cpus.  Good luck creating more than one.
  */
-static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
+static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 {
        int r;
        struct kvm_vcpu *vcpu;
        struct page *page;

+       BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX);
        if (id >= KVM_MAX_VCPU_IDS)
                return -EINVAL;

Thanks,
Mathias

Sean Christopherson June 6, 2024, 2:55 p.m. UTC | #3

On Thu, Jun 06, 2024, Mathias Krause wrote:
> On 06.06.24 00:31, Sean Christopherson wrote:
> > On Thu, Jun 06, 2024, Mathias Krause wrote:
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index 14841acb8b95..9f18fc42f018 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -4200,7 +4200,7 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
> >>  /*
> >>   * Creates some virtual cpus.  Good luck creating more than one.
> >>   */
> >> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> >> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> > 
> > Hmm, I don't love that KVM subtly relies on the KVM_MAX_VCPU_IDS check to guard
> > against truncation when passing @id to kvm_arch_vcpu_precreate(), kvm_vcpu_init(),
> > etc.  I doubt that it will ever be problematic, but it _looks_ like a bug.
> 
> It's not subtle but very explicit. KVM_MAX_VCPU_IDS is a small positive
> number, depending on some arch specific #define, but with x86 allowing
> for the largest value of 4 * 4096. That value, for sure, cannot exceed
> U32_MAX, so an explicit truncation isn't needed as the upper bits will
> already be zero if the limit check passes.
> 
> While subtile integer truncation is the bug that my patch is actually
> fixing, it is for the *userland* facing part of it, as in clarifying the
> ABI to work on "machine-sized words", i.e. a ulong, and doing the limit
> checks on these.
> 
> *In-kernel* APIs truncate / sign extend / mix signed/unsigned values all
> the time. The kernel is full of these. Trying to "fix" them all is an
> uphill battle not worth fighting, imho.

Oh, I'm not worry about something going wrong with the actual truncation.

What I don't like is the primary in-kernal API, kvm_vm_ioctl_create_vcpu(), taking
an unsigned long, but everything underneath converting that to an unsigned int,
without much of anything to give the reader a clue that the truncation is
deliberate.  

Similarly, without the context of the changelog, it's not at all obvious why
kvm_vm_ioctl_create_vcpu() takes an unsigned long.

E.g. x86 has another potentially more restrictive check on @id, and it looks
quite odd to check @id against KVM_MAX_VCPU_IDS as an "unsigned long" in flow
flow, but as an "unsigned int" in another.

int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
	if (kvm_check_tsc_unstable() && kvm->created_vcpus)
		pr_warn_once("SMP vm created on host with unstable TSC; "
			     "guest TSC will not be reliable\n");

	if (!kvm->arch.max_vcpu_ids)
		kvm->arch.max_vcpu_ids = KVM_MAX_VCPU_IDS;

	if (id >= kvm->arch.max_vcpu_ids)
		return -EINVAL;

> I'd rather suggest to add a build time assert instead, as the existing
> runtime check is sufficient (with my u32->ulong change). Something like
> this:
> 
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4200,12 +4200,13 @@ static void kvm_create_vcpu_debugfs(struct
> kvm_vcpu *vcpu)
>  /*
>   * Creates some virtual cpus.  Good luck creating more than one.
>   */
> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  {
>         int r;
>         struct kvm_vcpu *vcpu;
>         struct page *page;
> 
> +       BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX);

This should be UINT_MAX, no?  Regardless, the "need" for an explicit BUILD_BUG_ON()
is another reason I dislike relying on the KVM_MAX_VCPU_IDS check to detect
truncation.  If @id is checked as a 32-bit value, and we somehow screw up and
define KVM_MAX_VCPU_IDS to be a 64-bit value, clang will rightly complain that
the check is useless, e.g. given "#define KVM_MAX_VCPU_ID_TEST	BIT(32)"

arch/x86/kvm/x86.c:12171:9: error: result of comparison of constant 4294967296 with
expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
        if (id > KVM_MAX_VCPU_ID_TEST)
            ~~ ^ ~~~~~~~~~~~~~~~~~~~~
1 error generated.

>         if (id >= KVM_MAX_VCPU_IDS)
>                 return -EINVAL;

What if we do an explicit check before calling kvm_vm_ioctl_create_vcpu()?  That
would avoid the weird __id param, and provide a convenient location to document
exactly why KVM checks for truncation.

We could also move the "if (id >= KVM_MAX_VCPU_IDS)" check to kvm_vm_ioctl(),
but I don't love that, because again IMO it makes the code less readable overall,
loses clang's tuautological constant check, and the cost of the extra check against
UINT_MAX is completely negligible.  

Though if I had to choose, I'd prefer moving the check to kvm_vm_ioctl() over
taking an "unsigned long" in kvm_vm_ioctl_create_vcpu().

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4965196cad58..8155146b16cd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5083,6 +5083,13 @@ static long kvm_vm_ioctl(struct file *filp,
                return -EIO;
        switch (ioctl) {
        case KVM_CREATE_VCPU:
+               /*
+                * KVM tracks vCPU ID as a 32-bit value, be kind to userspace
+                * and reject too-large values instead of silently truncating.
+                */
+               if (arg > UINT_MAX)
+                       return -EINVAL;
+
                r = kvm_vm_ioctl_create_vcpu(kvm, arg);
                break;
        case KVM_ENABLE_CAP: {

Mathias Krause June 6, 2024, 6:11 p.m. UTC | #4

On 06.06.24 16:55, Sean Christopherson wrote:
> On Thu, Jun 06, 2024, Mathias Krause wrote:
>> On 06.06.24 00:31, Sean Christopherson wrote:
>>> On Thu, Jun 06, 2024, Mathias Krause wrote:
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 14841acb8b95..9f18fc42f018 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -4200,7 +4200,7 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
>>>>  /*
>>>>   * Creates some virtual cpus.  Good luck creating more than one.
>>>>   */
>>>> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>>>> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>>>
>>> Hmm, I don't love that KVM subtly relies on the KVM_MAX_VCPU_IDS check to guard
>>> against truncation when passing @id to kvm_arch_vcpu_precreate(), kvm_vcpu_init(),
>>> etc.  I doubt that it will ever be problematic, but it _looks_ like a bug.
>>
>> It's not subtle but very explicit. KVM_MAX_VCPU_IDS is a small positive
>> number, depending on some arch specific #define, but with x86 allowing
>> for the largest value of 4 * 4096. That value, for sure, cannot exceed
>> U32_MAX, so an explicit truncation isn't needed as the upper bits will
>> already be zero if the limit check passes.
>>
>> While subtile integer truncation is the bug that my patch is actually
>> fixing, it is for the *userland* facing part of it, as in clarifying the
>> ABI to work on "machine-sized words", i.e. a ulong, and doing the limit
>> checks on these.
>>
>> *In-kernel* APIs truncate / sign extend / mix signed/unsigned values all
>> the time. The kernel is full of these. Trying to "fix" them all is an
>> uphill battle not worth fighting, imho.
> 
> Oh, I'm not worry about something going wrong with the actual truncation.

Well, I do ;)

> 
> What I don't like is the primary in-kernal API, kvm_vm_ioctl_create_vcpu(), taking
> an unsigned long, but everything underneath converting that to an unsigned int,
> without much of anything to give the reader a clue that the truncation is
> deliberate.

Well, again, it's clear to me, at least. kvm_vm_ioctl_create_vcpu() is
the barrier from converting a userland provided "raw" value and checking
it to be within bounds that are sensible for in-kernel use. After that
check it's fine to use a more narrow type that still fits these bounds
and use that for in-kernel use.

The first part is _completely_ handled by the 'id >= KVM_MAX_VCPU_IDS'
test, as 'id' is still the "raw" value userland provided. Testing it
against KVM_MAX_VCPU_IDS does the "sensible for in-kernel use" check
and, on passing that check, allows to limit the storage type for 'id' to
the bounds of [0,KVM_MAX_VCPU_IDS) which fits within u32, unsigned int
or, as actually used for the vcpu_id member, an int.¹

> 
> Similarly, without the context of the changelog, it's not at all obvious why
> kvm_vm_ioctl_create_vcpu() takes an unsigned long.

Well, it's the ioctl() entry path. Passing on the UABI value shouldn't
be all that surprising. But I agree, a comment explaining the type- and
value handling to avoid early truncation thereof may not be such a bad idea.

> 
> E.g. x86 has another potentially more restrictive check on @id, and it looks
> quite odd to check @id against KVM_MAX_VCPU_IDS as an "unsigned long" in flow
> flow, but as an "unsigned int" in another.

Again, that's two distinct things, even if looking similar. The first
check against KVM_MAX_VCPU_IDS does actually two things:

1/ Ensure the full user ABI provided value (ulong) is sane and
2/ ensure it's within the hard limits KVM expects (fits unsigned int).

Now we do both with only a single compare and maybe that's what's so
hard to grasp -- that a single check can do both things. But why not
make use of simple things when we can do so?

Regarding x86's kvm_arch_vcpu_precreate(), it can use an 'unsigned int'
because the KVM_MAX_VCPU_IDS check ensured that the value of 'id' cannot
exceed that (limited) type. It's after the "UABI raw value" to "fits
kernel-internal type" conversion has happened, so using the narrower
type is just fine.

> 
> int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
> {
> 	if (kvm_check_tsc_unstable() && kvm->created_vcpus)
> 		pr_warn_once("SMP vm created on host with unstable TSC; "
> 			     "guest TSC will not be reliable\n");
> 
> 	if (!kvm->arch.max_vcpu_ids)
> 		kvm->arch.max_vcpu_ids = KVM_MAX_VCPU_IDS;
> 
> 	if (id >= kvm->arch.max_vcpu_ids)
> 		return -EINVAL;
> 

Above is completely fine, as this code only executes after the narrower
type bounds check.

>> I'd rather suggest to add a build time assert instead, as the existing
>> runtime check is sufficient (with my u32->ulong change). Something like
>> this:
>>
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -4200,12 +4200,13 @@ static void kvm_create_vcpu_debugfs(struct
>> kvm_vcpu *vcpu)
>>  /*
>>   * Creates some virtual cpus.  Good luck creating more than one.
>>   */
>> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>>  {
>>         int r;
>>         struct kvm_vcpu *vcpu;
>>         struct page *page;
>>
>> +       BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX);
> 
> This should be UINT_MAX, no?

No, I chose INT_MAX very intentional, as the underlying type of
'vcpu_id' is actually an int. Trying to store a value that's greater
than INT_MAX (but smaller than UINT_MAX) will make it negative and
that's definitely something we need to prevent as otherwise array
indexed accesses like in the IOAPIC code would try to access memory
before the allocated storage. Not good!

>                               Regardless, the "need" for an explicit BUILD_BUG_ON()
> is another reason I dislike relying on the KVM_MAX_VCPU_IDS check to detect
> truncation.

There's no "need" for the BUILD_BUG_ON(). It's just a cheap (compile
time only, no runtime "overhead") assert that the code won't allow
truncated values which may lead to follow-up bugs because of unintended
truncation. And, after all, you suggested something like that (a
truncation check) yourself. I just tried to provide it as something that
doesn't need the odd '__id' argument and an explicit truncation check
which would do the wrong thing if we would like to push KVM_MAX_VCPU_IDS
above UINT_MAX (failing only at runtime, not at compile time).

>              If @id is checked as a 32-bit value, and we somehow screw up and
> define KVM_MAX_VCPU_IDS to be a 64-bit value, clang will rightly complain that
> the check is useless, e.g. given "#define KVM_MAX_VCPU_ID_TEST	BIT(32)"
> 
> arch/x86/kvm/x86.c:12171:9: error: result of comparison of constant 4294967296 with
> expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
>         if (id > KVM_MAX_VCPU_ID_TEST)
>             ~~ ^ ~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
  ^^^^^^^^^^^^^^^^^^
Perfect! So this breaks the build. How much better can we prevent this
bug from going unnoticed?

Same for the BUILD_BUG_ON(). I don't see how it is somehow hiding things
or making future changes silently fail. They won't. Neither will the x86
specific code compile, nor will the BUILD_BUG_ON() in
kvm_vm_ioctl_create_vcpu(). So what you're trying to say?...

> 
> 
>>         if (id >= KVM_MAX_VCPU_IDS)
>>                 return -EINVAL;
> 
> What if we do an explicit check before calling kvm_vm_ioctl_create_vcpu()?  That
> would avoid the weird __id param, and provide a convenient location to document
> exactly why KVM checks for truncation.

My version has no "weird __id param" ;) And, honestly, the current u32
argument type makes no sense either. It's neither the final type used
for 'vcpu_id' nor does it have to be explicitly 32 bits. So, IMHO, the
argument type should be fixed in any case and using the type that
preserves the UABI value just seems like a natural fit.

kvm_vm_ioctl() is "only" the command multiplexer, deferring concrete
implementation to the individual functions. It currently does no input
value sanity checks itself, beside some user copy error checks. So why
should KVM_CREATE_VCPU be special and have its truncation check be
outside of the handler function? It just scatters the code and hurts
readability, IMHO.

> 
> We could also move the "if (id >= KVM_MAX_VCPU_IDS)" check to kvm_vm_ioctl(),
> but I don't love that, because again IMO it makes the code less readable overall,
> loses clang's tuautological constant check, and the cost of the extra check against
> UINT_MAX is completely negligible.  
> 
> Though if I had to choose, I'd prefer moving the check to kvm_vm_ioctl() over
> taking an "unsigned long" in kvm_vm_ioctl_create_vcpu().

Looks like we disagree again. I'd always choose the ulong argument over
scattering the checks over multiple functions and possibly missing it if
a new call site appears.

> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4965196cad58..8155146b16cd 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5083,6 +5083,13 @@ static long kvm_vm_ioctl(struct file *filp,
>                 return -EIO;
>         switch (ioctl) {
>         case KVM_CREATE_VCPU:
> +               /*
> +                * KVM tracks vCPU ID as a 32-bit value, be kind to userspace
> +                * and reject too-large values instead of silently truncating.
> +                */
> +               if (arg > UINT_MAX)
> +                       return -EINVAL;

Unfortunately, that check is a tautology for 32 bit systems and wrong
for reasons I explained in the beginning of my Email. But we can move
the comment to kvm_vm_ioctl_create_vcpu() and explain the rational over
there (with taking an 'unsigned long id' argument):

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14841acb8b95..b04e87f6568f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4200,12 +4200,20 @@ static void kvm_create_vcpu_debugfs(struct
kvm_vcpu *vcpu)
 /*
  * Creates some virtual cpus.  Good luck creating more than one.
  */
-static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
+static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 {
    int r;
    struct kvm_vcpu *vcpu;
    struct page *page;

+   /*
+    * KVM tracks vCPU IDs as 'int', be kind to userspace and reject
+    * too-large values instead of silently truncating.
+    *
+    * Also ensure we're not breaking this assumption by accidentally
+    * pushing KVM_MAX_VCPU_IDS above INT_MAX.
+    */
+   BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX);
    if (id >= KVM_MAX_VCPU_IDS)
        return -EINVAL;

> +
>                 r = kvm_vm_ioctl_create_vcpu(kvm, arg);
>                 break;
>         case KVM_ENABLE_CAP: {

Cheers,
Mathias

¹ IMHO, using 'int' for vcpu_id is actually *very* *wrong*, as it's used
as an index in certain constructs and having a signed type doesn't feel
right at all. But that's just a side matter, as, according to the checks
on the ioctl() path, the actual value of vcpu_id can never be negative.
So lets not distract.

Mathias Krause June 6, 2024, 6:23 p.m. UTC | #5

On 06.06.24 20:11, Mathias Krause wrote:
> +   /*
> +    * KVM tracks vCPU IDs as 'int', be kind to userspace and reject
> +    * too-large values instead of silently truncating.
> +    *
> +    * Also ensure we're not breaking this assumption by accidentally
> +    * pushing KVM_MAX_VCPU_IDS above INT_MAX.
> +    */
> +   BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX);

And yes, I'd rather like to write the above as something like below to
account for type changes:

    BUILD_BUG_ON(KVM_MAX_VCPU_IDS > typeof(vcpu->vcpu_id)::max());

But, unfortunately, C is not there yet. One could try to implement
something based on _Generic() but, meh..

>     if (id >= KVM_MAX_VCPU_IDS)
>         return -EINVAL;
>

Sean Christopherson June 7, 2024, 12:12 a.m. UTC | #6

On Thu, Jun 06, 2024, Mathias Krause wrote:
> On 06.06.24 16:55, Sean Christopherson wrote:
> > On Thu, Jun 06, 2024, Mathias Krause wrote:
> The first part is _completely_ handled by the 'id >= KVM_MAX_VCPU_IDS'
> test, as 'id' is still the "raw" value userland provided. 

I'm not arguing it doesn't, all I'm saying is that I don't like overloading a
check against an arbitrary limit to also protect against truncation issues.

> > E.g. x86 has another potentially more restrictive check on @id, and it looks
> > quite odd to check @id against KVM_MAX_VCPU_IDS as an "unsigned long" in flow
> > flow, but as an "unsigned int" in another.
> 
> Again, that's two distinct things, even if looking similar. The first
> check against KVM_MAX_VCPU_IDS does actually two things:
> 
> 1/ Ensure the full user ABI provided value (ulong) is sane and
> 2/ ensure it's within the hard limits KVM expects (fits unsigned int).
> 
> Now we do both with only a single compare and maybe that's what's so
> hard to grasp -- that a single check can do both things. But why not
> make use of simple things when we can do so?

Because it's unnecessarily clever.  I've dealt with waaaay too much legacy KVM
code that probably seemed super obvious at the time, but 8+ years and lots of
code churn later was completely nonsensical and all but impossible to decipher.

That's less likely to happen these days, as we have better tracking via lore, and
I like to think we have better changelogs.  But I still dislike doing unnecessarily
clever things because it tends to set the next generation up to fail.

> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -4200,12 +4200,13 @@ static void kvm_create_vcpu_debugfs(struct
> >> kvm_vcpu *vcpu)
> >>  /*
> >>   * Creates some virtual cpus.  Good luck creating more than one.
> >>   */
> >> -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> >> +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> >>  {
> >>         int r;
> >>         struct kvm_vcpu *vcpu;
> >>         struct page *page;
> >>
> >> +       BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX);
> > 
> > This should be UINT_MAX, no?
> 
> No, I chose INT_MAX very intentional, as the underlying type of
> 'vcpu_id' is actually an int.

Oof, I didn't realize (or more likely, simply forgot) that vcpu_id and vcpu_idx
are tracked as "int".

> There's no "need" for the BUILD_BUG_ON(). It's just a cheap (compile
> time only, no runtime "overhead") assert that the code won't allow
> truncated values which may lead to follow-up bugs because of unintended
> truncation. And, after all, you suggested something like that (a
> truncation check) yourself. I just tried to provide it as something that
> doesn't need the odd '__id' argument and an explicit truncation check
> which would do the wrong thing if we would like to push KVM_MAX_VCPU_IDS
> above UINT_MAX (failing only at runtime, not at compile time).

Yeah, I know all that.  I'm not arguing the actual cost of the code is at all
meaningful.  I'm purely concerned about the long-term maintenance cost.  This is
obviously a small thing that is unlikely to ever cause problems, but again, I
suspect that past KVM developers said exactly that about things that have pegged
my WTF-o-meter.

> >              If @id is checked as a 32-bit value, and we somehow screw up and
> > define KVM_MAX_VCPU_IDS to be a 64-bit value, clang will rightly complain that
> > the check is useless, e.g. given "#define KVM_MAX_VCPU_ID_TEST	BIT(32)"
> > 
> > arch/x86/kvm/x86.c:12171:9: error: result of comparison of constant 4294967296 with
> > expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
> >         if (id > KVM_MAX_VCPU_ID_TEST)
> >             ~~ ^ ~~~~~~~~~~~~~~~~~~~~
> > 1 error generated.
>   ^^^^^^^^^^^^^^^^^^
> Perfect! So this breaks the build. How much better can we prevent this
> bug from going unnoticed?

Yes, but iff @id is a 32-bit value, i.e. this trick doesn't work on 64-bit kernels
if the comparison is done with @id is an unsigned long (and I'm hoping that we
can kill off 32-bit KVM support in the not too distant future).

> ¹ IMHO, using 'int' for vcpu_id is actually *very* *wrong*, as it's used
> as an index in certain constructs and having a signed type doesn't feel
> right at all. But that's just a side matter, as, according to the checks
> on the ioctl() path, the actual value of vcpu_id can never be negative.
> So lets not distract.

Hmm, I 100% agree that it's horrific, but I disagree that it's a distraction.
I think we should fix that at the same time as we harden the trunction stuff, so
that it's (hopefully) clear what KVM _intends_ to support, as opposed to what the
code happens to allow.

In the end, I'm ok relying on the KVM_MAX_VCPU_IDS check, so long as there's
a BUILD_BUG_ON() and a comment.

Mathias Krause June 12, 2024, 9:34 p.m. UTC | #7

On 07.06.24 02:12, Sean Christopherson wrote:
> On Thu, Jun 06, 2024, Mathias Krause wrote:
>> On 06.06.24 16:55, Sean Christopherson wrote:
>>> On Thu, Jun 06, 2024, Mathias Krause wrote:
>>>> [snip]
>>>              If @id is checked as a 32-bit value, and we somehow screw up and
>>> define KVM_MAX_VCPU_IDS to be a 64-bit value, clang will rightly complain that
>>> the check is useless, e.g. given "#define KVM_MAX_VCPU_ID_TEST	BIT(32)"
>>>
>>> arch/x86/kvm/x86.c:12171:9: error: result of comparison of constant 4294967296 with
>>> expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
>>>         if (id > KVM_MAX_VCPU_ID_TEST)
>>>             ~~ ^ ~~~~~~~~~~~~~~~~~~~~
>>> 1 error generated.
>>   ^^^^^^^^^^^^^^^^^^
>> Perfect! So this breaks the build. How much better can we prevent this
>> bug from going unnoticed?
> 
> Yes, but iff @id is a 32-bit value, i.e. this trick doesn't work on 64-bit kernels
> if the comparison is done with @id is an unsigned long (and I'm hoping that we
> can kill off 32-bit KVM support in the not too distant future).

Fortunately, the BUILD_BUG_ON(KVM_MAX_VCPU_IDS > INT_MAX) will still
catch it.

> 
>> ¹ IMHO, using 'int' for vcpu_id is actually *very* *wrong*, as it's used
>> as an index in certain constructs and having a signed type doesn't feel
>> right at all. But that's just a side matter, as, according to the checks
>> on the ioctl() path, the actual value of vcpu_id can never be negative.
>> So lets not distract.
> 
> Hmm, I 100% agree that it's horrific, but I disagree that it's a distraction.
> I think we should fix that at the same time as we harden the trunction stuff, so
> that it's (hopefully) clear what KVM _intends_ to support, as opposed to what the
> code happens to allow.

I looked into it a little closer and it's even a bigger mess than what I
initially thought. 'vcpu_id' values get passed around as int, unsigned
int, u32, u16 even (S390) and unsigned long in the various architectures
implementing KVM support. A change touching all of these clearly needs
quite some coordination among multiple maintainers and testing to rule
out issues caused by changing the sign of the underlying type -- even if
it'll be just new warnings popping up. I don't even have cross-compilers
for all of these, less so setups to actually do some tests. So,
unfortunately, I'm not up to do that change.

> 
> In the end, I'm ok relying on the KVM_MAX_VCPU_IDS check, so long as there's
> a BUILD_BUG_ON() and a comment.

Ok, will send a v2 series soon, covering KVM_SET_BOOT_CPU_ID, too.

Thanks,
Mathias

[1/2] KVM: Reject overly excessive IDs in KVM_CREATE_VCPU

Commit Message

Comments

Patch