[v3] KVM: remove buggy vcpu id check on vcpu creation

Message ID	146116689259.20666.15860134511726195550.stgit@bahia.huguette.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <kvm@vger.kernel.org> from <gkurz@linux.vnet.ibm.com>; Wed, 20 Apr 2016 16:44:59 +0100 Gateway: Authorized Use Only! Violators will be prosecuted; Wed, 20 Apr 2016 16:44:58 +0100 Subject: [PATCH v3] KVM: remove buggy vcpu id check on vcpu creation From: Greg Kurz <gkurz@linux.vnet.ibm.com> To: Paolo Bonzini <pbonzini@redhat.com>, james.hogan@imgtec.com, mingo@redhat.com Cc: linux-mips@linux-mips.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-ppc@nongnu.org, Cornelia Huck <cornelia.huck@de.ibm.com>, Paul Mackerras <paulus@samba.org>, David Gibson <david@gibson.dropbear.id.au> Date: Wed, 20 Apr 2016 17:44:54 +0200 Message-ID: <146116689259.20666.15860134511726195550.stgit@bahia.huguette.org> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: kvm-owner@vger.kernel.org Precedence: bulk

Greg Kurz April 20, 2016, 3:44 p.m. UTC

Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
introduced a check to prevent potential kernel memory corruption in case
the vcpu id is too great.

Unfortunately this check assumes vcpu ids grow in sequence with a common
difference of 1, which is wrong: archs are free to use vcpu id as they fit.
For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
1024, guests may be limited down to 128 vcpus on POWER8.

This means the check does not belong here and should be moved to some arch
specific function: kvm_arch_vcpu_create() looks like a good candidate.

ARM and s390 already have such a check.

I could not spot any path in the PowerPC or common KVM code where a vcpu
id is used as described in the above commit: I believe PowerPC can live
without this check.

In the end, this patch simply moves the check to MIPS and x86.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
v3: use ERR_PTR()

 arch/mips/kvm/mips.c |    7 ++++++-
 arch/x86/kvm/x86.c   |    3 +++
 virt/kvm/kvm_main.c  |    3 ---
 3 files changed, 9 insertions(+), 4 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

James Hogan April 20, 2016, 4:10 p.m. UTC | #1

On Wed, Apr 20, 2016 at 05:44:54PM +0200, Greg Kurz wrote:
> Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> introduced a check to prevent potential kernel memory corruption in case
> the vcpu id is too great.
> 
> Unfortunately this check assumes vcpu ids grow in sequence with a common
> difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> 1024, guests may be limited down to 128 vcpus on POWER8.
> 
> This means the check does not belong here and should be moved to some arch
> specific function: kvm_arch_vcpu_create() looks like a good candidate.
> 
> ARM and s390 already have such a check.
> 
> I could not spot any path in the PowerPC or common KVM code where a vcpu
> id is used as described in the above commit: I believe PowerPC can live
> without this check.
> 
> In the end, this patch simply moves the check to MIPS and x86.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> v3: use ERR_PTR()
> 
>  arch/mips/kvm/mips.c |    7 ++++++-

Acked-by: James Hogan <james.hogan@imgtec.com> [MIPS]

Cheers
James

>  arch/x86/kvm/x86.c   |    3 +++
>  virt/kvm/kvm_main.c  |    3 ---
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 70ef1a43c114..0278ea146db5 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>  	int err, size, offset;
>  	void *gebase;
>  	int i;
> +	struct kvm_vcpu *vcpu;
>  
> -	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> +	if (id >= KVM_MAX_VCPUS) {
> +		err = -EINVAL;
> +		goto out;
> +	}
>  
> +	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>  	if (!vcpu) {
>  		err = -ENOMEM;
>  		goto out;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b7798c7b210..7738202edcce 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>  {
>  	struct kvm_vcpu *vcpu;
>  
> +	if (id >= KVM_MAX_VCPUS)
> +		return ERR_PTR(-EINVAL);
> +
>  	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
>  		printk_once(KERN_WARNING
>  		"kvm: SMP vm created on host with unstable TSC; "
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4fd482fb9260..6b6cca3cb488 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  	int r;
>  	struct kvm_vcpu *vcpu;
>  
> -	if (id >= KVM_MAX_VCPUS)
> -		return -EINVAL;
> -
>  	vcpu = kvm_arch_vcpu_create(kvm, id);
>  	if (IS_ERR(vcpu))
>  		return PTR_ERR(vcpu);
>

Cornelia Huck April 20, 2016, 4:48 p.m. UTC | #2

On Wed, 20 Apr 2016 17:44:54 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> introduced a check to prevent potential kernel memory corruption in case
> the vcpu id is too great.
> 
> Unfortunately this check assumes vcpu ids grow in sequence with a common
> difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> 1024, guests may be limited down to 128 vcpus on POWER8.
> 
> This means the check does not belong here and should be moved to some arch
> specific function: kvm_arch_vcpu_create() looks like a good candidate.
> 
> ARM and s390 already have such a check.
> 
> I could not spot any path in the PowerPC or common KVM code where a vcpu
> id is used as described in the above commit: I believe PowerPC can live
> without this check.
> 
> In the end, this patch simply moves the check to MIPS and x86.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> v3: use ERR_PTR()
> 
>  arch/mips/kvm/mips.c |    7 ++++++-
>  arch/x86/kvm/x86.c   |    3 +++
>  virt/kvm/kvm_main.c  |    3 ---
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 20, 2016, 5:02 p.m. UTC | #3

2016-04-20 17:44+0200, Greg Kurz:
> Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> introduced a check to prevent potential kernel memory corruption in case
> the vcpu id is too great.
> 
> Unfortunately this check assumes vcpu ids grow in sequence with a common
> difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> 1024, guests may be limited down to 128 vcpus on POWER8.
> 
> This means the check does not belong here and should be moved to some arch
> specific function: kvm_arch_vcpu_create() looks like a good candidate.
> 
> ARM and s390 already have such a check.
> 
> I could not spot any path in the PowerPC or common KVM code where a vcpu
> id is used as described in the above commit: I believe PowerPC can live
> without this check.
> 
> In the end, this patch simply moves the check to MIPS and x86.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> v3: use ERR_PTR()
> 
>  arch/mips/kvm/mips.c |    7 ++++++-
>  arch/x86/kvm/x86.c   |    3 +++
>  virt/kvm/kvm_main.c  |    3 ---
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 70ef1a43c114..0278ea146db5 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>  	int err, size, offset;
>  	void *gebase;
>  	int i;
> +	struct kvm_vcpu *vcpu;
>  
> -	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> +	if (id >= KVM_MAX_VCPUS) {
> +		err = -EINVAL;
> +		goto out;

'vcpu' looks undefined at this point, so kfree in 'out:' may bug.
Just 'return ERR_PTR(-EINVAL)'?

> +	}
>  
> +	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>  	if (!vcpu) {
>  		err = -ENOMEM;
>  		goto out;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

James Hogan April 20, 2016, 5:09 p.m. UTC | #4

On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote:
> 2016-04-20 17:44+0200, Greg Kurz:
> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > introduced a check to prevent potential kernel memory corruption in case
> > the vcpu id is too great.
> > 
> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > 
> > This means the check does not belong here and should be moved to some arch
> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > 
> > ARM and s390 already have such a check.
> > 
> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > id is used as described in the above commit: I believe PowerPC can live
> > without this check.
> > 
> > In the end, this patch simply moves the check to MIPS and x86.
> > 
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> > v3: use ERR_PTR()
> > 
> >  arch/mips/kvm/mips.c |    7 ++++++-
> >  arch/x86/kvm/x86.c   |    3 +++
> >  virt/kvm/kvm_main.c  |    3 ---
> >  3 files changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> > index 70ef1a43c114..0278ea146db5 100644
> > --- a/arch/mips/kvm/mips.c
> > +++ b/arch/mips/kvm/mips.c
> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> >  	int err, size, offset;
> >  	void *gebase;
> >  	int i;
> > +	struct kvm_vcpu *vcpu;
> >  
> > -	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> > +	if (id >= KVM_MAX_VCPUS) {
> > +		err = -EINVAL;
> > +		goto out;
> 
> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug.

Thats out_free_cpu I think?

Cheers
James

> Just 'return ERR_PTR(-EINVAL)'?
> 
> > +	}
> >  
> > +	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> >  	if (!vcpu) {
> >  		err = -ENOMEM;
> >  		goto out;

Radim Krčmář April 20, 2016, 5:27 p.m. UTC | #5

2016-04-20 18:09+0100, James Hogan:
> On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote:
>> 2016-04-20 17:44+0200, Greg Kurz:
>> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>> > index 70ef1a43c114..0278ea146db5 100644
>> > --- a/arch/mips/kvm/mips.c
>> > +++ b/arch/mips/kvm/mips.c
>> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>> >  	int err, size, offset;
>> >  	void *gebase;
>> >  	int i;
>> > +	struct kvm_vcpu *vcpu;
>> >  
>> > -	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>> > +	if (id >= KVM_MAX_VCPUS) {
>> > +		err = -EINVAL;
>> > +		goto out;
>> 
>> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug.
> 
> Thats out_free_cpu I think?

My bad, it is.  Thank you!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 20, 2016, 5:53 p.m. UTC | #6

On Wed, 20 Apr 2016 19:27:06 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-20 18:09+0100, James Hogan:
> > On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote:  
> >> 2016-04-20 17:44+0200, Greg Kurz:  
> >> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> >> > index 70ef1a43c114..0278ea146db5 100644
> >> > --- a/arch/mips/kvm/mips.c
> >> > +++ b/arch/mips/kvm/mips.c
> >> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> >> >  	int err, size, offset;
> >> >  	void *gebase;
> >> >  	int i;
> >> > +	struct kvm_vcpu *vcpu;
> >> >  
> >> > -	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> >> > +	if (id >= KVM_MAX_VCPUS) {
> >> > +		err = -EINVAL;
> >> > +		goto out;  
> >> 
> >> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug.  
> > 
> > Thats out_free_cpu I think?  
> 
> My bad, it is.  Thank you!
> 

I kept the goto based construct because it was done this way for kzalloc().
but I agree that 'return ERR_PTR(-EINVAL)' may look more explicit.

Worth a v4 ?

Cheers.

--
Greg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 20, 2016, 6:29 p.m. UTC | #7

2016-04-20 17:44+0200, Greg Kurz:
> Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> introduced a check to prevent potential kernel memory corruption in case
> the vcpu id is too great.
> 
> Unfortunately this check assumes vcpu ids grow in sequence with a common
> difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> 1024, guests may be limited down to 128 vcpus on POWER8.
> 
> This means the check does not belong here and should be moved to some arch
> specific function: kvm_arch_vcpu_create() looks like a good candidate.
> 
> ARM and s390 already have such a check.
> 
> I could not spot any path in the PowerPC or common KVM code where a vcpu
> id is used as described in the above commit: I believe PowerPC can live
> without this check.

The only problematic path I see is kvm_get_vcpu_by_id(), which returns
NULL for any id above KVM_MAX_VCPUS.
kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for
duplicate ids, so PowerPC could end up with many VCPUs of the same id.
I'm not sure what could fail, but code doesn't expect this situation.
Patching kvm_get_vcpu_by_id() is easy, though.

Second issue is that Documentation/virtual/kvm/api.txt says
  4.7 KVM_CREATE_VCPU
  [...]
  This API adds a vcpu to a virtual machine.  The vcpu id is a small
  integer in the range [0, max_vcpus).

so we'd remove those two lines and change the API too.  The change would
be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
just because KVM is lacking an API to set DT ID?
x86 (APIC ID) is affected by this and ARM (MP ID) probably too.

(Maybe it is time to decouple VCPU ID used in KVM interfaces from
 architecture dependent CPU ID that the guest uses ...
 Mostly for future architectures that won't fit into 32 bits, but
 clarity of the code could go up as well.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 20, 2016, 6:31 p.m. UTC | #8

2016-04-20 19:53+0200, Greg Kurz:
> On Wed, 20 Apr 2016 19:27:06 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> 2016-04-20 18:09+0100, James Hogan:
>> > On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote:  
>> >> 2016-04-20 17:44+0200, Greg Kurz:  
>> >> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>> >> > index 70ef1a43c114..0278ea146db5 100644
>> >> > --- a/arch/mips/kvm/mips.c
>> >> > +++ b/arch/mips/kvm/mips.c
>> >> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>> >> >  	int err, size, offset;
>> >> >  	void *gebase;
>> >> >  	int i;
>> >> > +	struct kvm_vcpu *vcpu;
>> >> >  
>> >> > -	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>> >> > +	if (id >= KVM_MAX_VCPUS) {
>> >> > +		err = -EINVAL;
>> >> > +		goto out;  
>> >> 
>> >> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug.  
>> > 
>> > Thats out_free_cpu I think?  
>> 
>> My bad, it is.  Thank you!
>> 
> 
> I kept the goto based construct because it was done this way for kzalloc().
> but I agree that 'return ERR_PTR(-EINVAL)' may look more explicit.
> 
> Worth a v4 ?

No, it is consistent with kzalloc fault handling this way.

I was going to queue it, but found an issue with kvm_get_vcpu_by_id()
that would allow the guest to create multiple VCPUs with the same id,
which led to an unfortunate discourse on KVM API.
(Please see a new thread.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 21, 2016, 11:29 a.m. UTC | #9

On Wed, 20 Apr 2016 20:29:09 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-20 17:44+0200, Greg Kurz:
> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > introduced a check to prevent potential kernel memory corruption in case
> > the vcpu id is too great.
> > 
> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > 
> > This means the check does not belong here and should be moved to some arch
> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > 
> > ARM and s390 already have such a check.
> > 
> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > id is used as described in the above commit: I believe PowerPC can live
> > without this check.  
> 
> The only problematic path I see is kvm_get_vcpu_by_id(), which returns
> NULL for any id above KVM_MAX_VCPUS.

Oops my bad, I started to work on a 4.4 tree and I missed this check brought
by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).

But again, I believe the check is wrong there also: the changelog just mentions
this is a fastpath for the usual case where "VCPU ids match the array index"...
why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?

> kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for
> duplicate ids, so PowerPC could end up with many VCPUs of the same id.
> I'm not sure what could fail, but code doesn't expect this situation.
> Patching kvm_get_vcpu_by_id() is easy, though.
> 

Something like this ?

	if (id < 0)
		return NULL;
	if (id < KVM_MAX_VCPUS)
		vcpu = kvm_get_vcpu(kvm, id);

In the same patch ?

> Second issue is that Documentation/virtual/kvm/api.txt says
>   4.7 KVM_CREATE_VCPU
>   [...]
>   This API adds a vcpu to a virtual machine.  The vcpu id is a small
>   integer in the range [0, max_vcpus).
> 

Yeah and I find the meaning of max_vcpus is unclear.

Here it is considered as a limit for vcpu id, but if you look at the code,
KVM_MAX_VCPUS is also used as a limit for the number of vcpus:

virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {

> so we'd remove those two lines and change the API too.  The change would
> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
> just because KVM is lacking an API to set DT ID?

This is related to a limitation when running in book3s_hv mode with cpus
that support SMT (multiple HW threads): all HW threads within a core
cannot be running in different guests at the same time. 

We solve this by using a vcpu numbering scheme as follows:

vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest

where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
that can be scheduled to run on the same real core.

So, in the "worst" case where we want to run a guest with 1 thread/core and the host
has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.

> x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
> 

x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
patch kvm_get_vcpu_by_id() like suggested above.

Depending on the platform, ARM can be limited to VGIC_V3_MAX_CPUS (== 255) or
VGIC_V8_MAX_CPUS (== 8). I guess it won't be affected either.

> (Maybe it is time to decouple VCPU ID used in KVM interfaces from
>  architecture dependent CPU ID that the guest uses ...

Maybe... I did not get that far.

>  Mostly for future architectures that won't fit into 32 bits, but
>  clarity of the code could go up as well.)
> 

Thanks for your remarks !

Cheers.

--
Greg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cornelia Huck April 21, 2016, 12:26 p.m. UTC | #10

On Thu, 21 Apr 2016 13:29:58 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> On Wed, 20 Apr 2016 20:29:09 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 
> > 2016-04-20 17:44+0200, Greg Kurz:
> > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > > introduced a check to prevent potential kernel memory corruption in case
> > > the vcpu id is too great.
> > > 
> > > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > > 1024, guests may be limited down to 128 vcpus on POWER8.
> > > 
> > > This means the check does not belong here and should be moved to some arch
> > > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > > 
> > > ARM and s390 already have such a check.
> > > 
> > > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > > id is used as described in the above commit: I believe PowerPC can live
> > > without this check.  
> > 
> > The only problematic path I see is kvm_get_vcpu_by_id(), which returns
> > NULL for any id above KVM_MAX_VCPUS.
> 
> Oops my bad, I started to work on a 4.4 tree and I missed this check brought
> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
> 
> But again, I believe the check is wrong there also: the changelog just mentions
> this is a fastpath for the usual case where "VCPU ids match the array index"...
> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?

Probably because noone considered power :)

> 
> > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for
> > duplicate ids, so PowerPC could end up with many VCPUs of the same id.
> > I'm not sure what could fail, but code doesn't expect this situation.
> > Patching kvm_get_vcpu_by_id() is easy, though.
> > 
> 
> Something like this ?
> 
> 	if (id < 0)
> 		return NULL;
> 	if (id < KVM_MAX_VCPUS)
> 		vcpu = kvm_get_vcpu(kvm, id);
> 
> In the same patch ?
> 
> > Second issue is that Documentation/virtual/kvm/api.txt says
> >   4.7 KVM_CREATE_VCPU
> >   [...]
> >   This API adds a vcpu to a virtual machine.  The vcpu id is a small
> >   integer in the range [0, max_vcpus).
> > 
> 
> Yeah and I find the meaning of max_vcpus is unclear.
> 
> Here it is considered as a limit for vcpu id, but if you look at the code,
> KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
> 
> virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
> 
> > so we'd remove those two lines and change the API too.  The change would
> > be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
> > just because KVM is lacking an API to set DT ID?
> 
> This is related to a limitation when running in book3s_hv mode with cpus
> that support SMT (multiple HW threads): all HW threads within a core
> cannot be running in different guests at the same time. 
> 
> We solve this by using a vcpu numbering scheme as follows:
> 
> vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
> 
> where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
> that can be scheduled to run on the same real core.
> 
> So, in the "worst" case where we want to run a guest with 1 thread/core and the host
> has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.
> 
> > x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
> > 
> 
> x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
> patch kvm_get_vcpu_by_id() like suggested above.
> 
> Depending on the platform, ARM can be limited to VGIC_V3_MAX_CPUS (== 255) or
> VGIC_V8_MAX_CPUS (== 8). I guess it won't be affected either.

For s390, it's either 64 (no esca) or 248 (esca).

> 
> > (Maybe it is time to decouple VCPU ID used in KVM interfaces from
> >  architecture dependent CPU ID that the guest uses ...
> 
> Maybe... I did not get that far.

It seems that the various architectures are more different than I
thought... wasn't aware of the complicated situation on power, for
example.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 21, 2016, 1:05 p.m. UTC | #11

On Thu, 21 Apr 2016 14:26:19 +0200
Cornelia Huck <cornelia.huck@de.ibm.com> wrote:

> On Thu, 21 Apr 2016 13:29:58 +0200
> Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
> 
> > On Wed, 20 Apr 2016 20:29:09 +0200
> > Radim Kr?má? <rkrcmar@redhat.com> wrote:
> >   
> > > 2016-04-20 17:44+0200, Greg Kurz:  
> > > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > > > introduced a check to prevent potential kernel memory corruption in case
> > > > the vcpu id is too great.
> > > > 
> > > > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > > > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > > > 1024, guests may be limited down to 128 vcpus on POWER8.
> > > > 
> > > > This means the check does not belong here and should be moved to some arch
> > > > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > > > 
> > > > ARM and s390 already have such a check.
> > > > 
> > > > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > > > id is used as described in the above commit: I believe PowerPC can live
> > > > without this check.    
> > > 
> > > The only problematic path I see is kvm_get_vcpu_by_id(), which returns
> > > NULL for any id above KVM_MAX_VCPUS.  
> > 
> > Oops my bad, I started to work on a 4.4 tree and I missed this check brought
> > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
> > 
> > But again, I believe the check is wrong there also: the changelog just mentions
> > this is a fastpath for the usual case where "VCPU ids match the array index"...
> > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?  
> 
> Probably because noone considered power :)
> 

No surprise but the return path is a bit overkill anyway :)

> >   
> > > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for
> > > duplicate ids, so PowerPC could end up with many VCPUs of the same id.
> > > I'm not sure what could fail, but code doesn't expect this situation.
> > > Patching kvm_get_vcpu_by_id() is easy, though.
> > >   
> > 
> > Something like this ?
> > 
> > 	if (id < 0)
> > 		return NULL;
> > 	if (id < KVM_MAX_VCPUS)
> > 		vcpu = kvm_get_vcpu(kvm, id);
> > 
> > In the same patch ?
> >   
> > > Second issue is that Documentation/virtual/kvm/api.txt says
> > >   4.7 KVM_CREATE_VCPU
> > >   [...]
> > >   This API adds a vcpu to a virtual machine.  The vcpu id is a small
> > >   integer in the range [0, max_vcpus).
> > >   
> > 
> > Yeah and I find the meaning of max_vcpus is unclear.
> > 
> > Here it is considered as a limit for vcpu id, but if you look at the code,
> > KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
> > 
> > virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
> >   
> > > so we'd remove those two lines and change the API too.  The change would
> > > be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
> > > just because KVM is lacking an API to set DT ID?  
> > 
> > This is related to a limitation when running in book3s_hv mode with cpus
> > that support SMT (multiple HW threads): all HW threads within a core
> > cannot be running in different guests at the same time. 
> > 
> > We solve this by using a vcpu numbering scheme as follows:
> > 
> > vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
> > 
> > where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
> > that can be scheduled to run on the same real core.
> > 
> > So, in the "worst" case where we want to run a guest with 1 thread/core and the host
> > has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.
> >   
> > > x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
> > >   
> > 
> > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
> > patch kvm_get_vcpu_by_id() like suggested above.
> > 
> > Depending on the platform, ARM can be limited to VGIC_V3_MAX_CPUS (== 255) or
> > VGIC_V8_MAX_CPUS (== 8). I guess it won't be affected either.  
> 
> For s390, it's either 64 (no esca) or 248 (esca).
> 

And it is CONFIG_NR_CPUS for powerpc, i.e. 2048 per default on powernv.

But the problem here is more: can we compare the number of vcpus with vcpu ids ?

> >   
> > > (Maybe it is time to decouple VCPU ID used in KVM interfaces from
> > >  architecture dependent CPU ID that the guest uses ...  
> > 
> > Maybe... I did not get that far.  
> 
> It seems that the various architectures are more different than I
> thought... wasn't aware of the complicated situation on power, for
> example.

Yeah, and I think moving these vcpu id checks to the archs allows to
solve the problem and confine the complexity to the powerpc code.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Hildenbrand April 21, 2016, 1:22 p.m. UTC | #12

> On Wed, 20 Apr 2016 20:29:09 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 
> > 2016-04-20 17:44+0200, Greg Kurz:  
> > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > > introduced a check to prevent potential kernel memory corruption in case
> > > the vcpu id is too great.
> > > 
> > > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > > 1024, guests may be limited down to 128 vcpus on POWER8.
> > > 
> > > This means the check does not belong here and should be moved to some arch
> > > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > > 
> > > ARM and s390 already have such a check.
> > > 
> > > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > > id is used as described in the above commit: I believe PowerPC can live
> > > without this check.    
> > 
> > The only problematic path I see is kvm_get_vcpu_by_id(), which returns
> > NULL for any id above KVM_MAX_VCPUS.  
> 
> Oops my bad, I started to work on a 4.4 tree and I missed this check brought
> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
> 
> But again, I believe the check is wrong there also: the changelog just mentions
> this is a fastpath for the usual case where "VCPU ids match the array index"...
> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?
> 
> > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for
> > duplicate ids, so PowerPC could end up with many VCPUs of the same id.
> > I'm not sure what could fail, but code doesn't expect this situation.
> > Patching kvm_get_vcpu_by_id() is easy, though.
> >   
> 
> Something like this ?
> 
> 	if (id < 0)
> 		return NULL;
> 	if (id < KVM_MAX_VCPUS)
> 		vcpu = kvm_get_vcpu(kvm, id);
> 
> In the same patch ?

So the heuristic would only trigger if id < KVM_MAX_VCPUS.
By initializing vcpu to NULL this would work.

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Hildenbrand April 21, 2016, 1:24 p.m. UTC | #13

> On Wed, 20 Apr 2016 17:44:54 +0200
> Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
> 
> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > introduced a check to prevent potential kernel memory corruption in case
> > the vcpu id is too great.
> > 
> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > 
> > This means the check does not belong here and should be moved to some arch
> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > 
> > ARM and s390 already have such a check.
> > 
> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > id is used as described in the above commit: I believe PowerPC can live
> > without this check.
> > 
> > In the end, this patch simply moves the check to MIPS and x86.
> > 
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> > v3: use ERR_PTR()
> > 
> >  arch/mips/kvm/mips.c |    7 ++++++-
> >  arch/x86/kvm/x86.c   |    3 +++
> >  virt/kvm/kvm_main.c  |    3 ---
> >  3 files changed, 9 insertions(+), 4 deletions(-)
> >   
> 
> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> 

The existing s390 check looks sane to me, too!

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 21, 2016, 3:29 p.m. UTC | #14

2016-04-21 13:29+0200, Greg Kurz:
> On Wed, 20 Apr 2016 20:29:09 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> 2016-04-20 17:44+0200, Greg Kurz:
>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>> > introduced a check to prevent potential kernel memory corruption in case
>> > the vcpu id is too great.
>> > 
>> > Unfortunately this check assumes vcpu ids grow in sequence with a common
>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>> > 1024, guests may be limited down to 128 vcpus on POWER8.
>> > 
>> > This means the check does not belong here and should be moved to some arch
>> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
>> > 
>> > ARM and s390 already have such a check.
>> > 
>> > I could not spot any path in the PowerPC or common KVM code where a vcpu
>> > id is used as described in the above commit: I believe PowerPC can live
>> > without this check.  
>> 
>> The only problematic path I see is kvm_get_vcpu_by_id(), which returns
>> NULL for any id above KVM_MAX_VCPUS.
> 
> Oops my bad, I started to work on a 4.4 tree and I missed this check brought
> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
> 
> But again, I believe the check is wrong there also: the changelog just mentions
> this is a fastpath for the usual case where "VCPU ids match the array index"...
> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?

(The patch had to check id >= KVM_MAX_VCPUS for sanity and there could
 not be a VCPU with that index according to the spec, so it made a
 shortcut to the correct NULL result ...)

>> Second issue is that Documentation/virtual/kvm/api.txt says
>>   4.7 KVM_CREATE_VCPU
>>   [...]
>>   This API adds a vcpu to a virtual machine.  The vcpu id is a small
>>   integer in the range [0, max_vcpus).
>> 
> 
> Yeah and I find the meaning of max_vcpus is unclear.
> 
> Here it is considered as a limit for vcpu id, but if you look at the code,
> KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
> 
> virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {

I agree.  Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make
you think that online_vcpus limit interpretation is the correct one, but
the code is conflicted.

>> so we'd remove those two lines and change the API too.  The change would
>> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
>> just because KVM is lacking an API to set DT ID?
> 
> This is related to a limitation when running in book3s_hv mode with cpus
> that support SMT (multiple HW threads): all HW threads within a core
> cannot be running in different guests at the same time. 
> 
> We solve this by using a vcpu numbering scheme as follows:
> 
> vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
> 
> where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
> that can be scheduled to run on the same real core.
> 
> So, in the "worst" case where we want to run a guest with 1 thread/core and the host
> has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.

I see, thanks.  Accommodating existing users seems like an acceptable
excuse to change the API.

>> x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
>> 
> 
> x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
> patch kvm_get_vcpu_by_id() like suggested above.

x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
reserving blocks of bits for socket/core/thread, so if core or thread
count isn't a power of two, then the set of valid APIC IDs is sparse,
but max id is still limited by 255, so the effective maximum VCPU count
is lower.

x86 doesn't support APIC ID over 255 yet, though, so this change
wouldn't change a thing in practice. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 21, 2016, 3:49 p.m. UTC | #15

On Thu, 21 Apr 2016 17:29:16 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-21 13:29+0200, Greg Kurz:
> > On Wed, 20 Apr 2016 20:29:09 +0200
> > Radim Kr?má? <rkrcmar@redhat.com> wrote:  
> >> 2016-04-20 17:44+0200, Greg Kurz:  
> >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> >> > introduced a check to prevent potential kernel memory corruption in case
> >> > the vcpu id is too great.
> >> > 
> >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> >> > 
> >> > This means the check does not belong here and should be moved to some arch
> >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> >> > 
> >> > ARM and s390 already have such a check.
> >> > 
> >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> >> > id is used as described in the above commit: I believe PowerPC can live
> >> > without this check.    
> >> 
> >> The only problematic path I see is kvm_get_vcpu_by_id(), which returns
> >> NULL for any id above KVM_MAX_VCPUS.  
> > 
> > Oops my bad, I started to work on a 4.4 tree and I missed this check brought
> > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
> > 
> > But again, I believe the check is wrong there also: the changelog just mentions
> > this is a fastpath for the usual case where "VCPU ids match the array index"...
> > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?  
> 
> (The patch had to check id >= KVM_MAX_VCPUS for sanity and there could
>  not be a VCPU with that index according to the spec, so it made a
>  shortcut to the correct NULL result ...)
> 

With the spec in mind, you're right... the confusion comes from the fact
that powerpc decided to use bigger vcpu ids a long time ago but nobody
cared to document that.

> >> Second issue is that Documentation/virtual/kvm/api.txt says
> >>   4.7 KVM_CREATE_VCPU
> >>   [...]
> >>   This API adds a vcpu to a virtual machine.  The vcpu id is a small
> >>   integer in the range [0, max_vcpus).
> >>   
> > 
> > Yeah and I find the meaning of max_vcpus is unclear.
> > 
> > Here it is considered as a limit for vcpu id, but if you look at the code,
> > KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
> > 
> > virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {  
> 
> I agree.  Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make
> you think that online_vcpus limit interpretation is the correct one, but
> the code is conflicted.
> 
> >> so we'd remove those two lines and change the API too.  The change would
> >> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
> >> just because KVM is lacking an API to set DT ID?  
> > 
> > This is related to a limitation when running in book3s_hv mode with cpus
> > that support SMT (multiple HW threads): all HW threads within a core
> > cannot be running in different guests at the same time. 
> > 
> > We solve this by using a vcpu numbering scheme as follows:
> > 
> > vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
> > 
> > where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
> > that can be scheduled to run on the same real core.
> > 
> > So, in the "worst" case where we want to run a guest with 1 thread/core and the host
> > has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.  
> 
> I see, thanks.  Accommodating existing users seems like an acceptable
> excuse to change the API.
> 
> >> x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
> >>   
> > 
> > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
> > patch kvm_get_vcpu_by_id() like suggested above.  
> 
> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
> reserving blocks of bits for socket/core/thread, so if core or thread
> count isn't a power of two, then the set of valid APIC IDs is sparse,
> but max id is still limited by 255, so the effective maximum VCPU count
> is lower.
> 
> x86 doesn't support APIC ID over 255 yet, though, so this change
> wouldn't change a thing in practice. :)
> 

Thanks for the details !

So we're good ? Whose tree can carry these patches ?

Cheers.

--
Greg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 21, 2016, 4:08 p.m. UTC | #16

2016-04-21 17:49+0200, Greg Kurz:
> So we're good ?

I support the change, just had a nit about API design for v2.

>                 Whose tree can carry these patches ?

(PowerPC is the only immediately affected arch, so I'd it there.)

What do you think is best?  My experience in this regard is pretty low.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 21, 2016, 5:18 p.m. UTC | #17

On Thu, 21 Apr 2016 18:08:41 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-21 17:49+0200, Greg Kurz:
> > So we're good ?  
> 
> I support the change, just had a nit about API design for v2.
> 

As I said in my other mail, I'm not sure we should do more... if
that's okay for you and you still support the change, maybe you
can give an Acked-by ?

> >                 Whose tree can carry these patches ?  
> 
> (PowerPC is the only immediately affected arch, so I'd it there.)
> 
> What do you think is best?  My experience in this regard is pretty low.
> 

Maybe Paolo's tree but I guess we'd need some more acks from x86, ARM and
PowerPC :) KVM maintainers...

Thanks anyway for your valuable help !

Cheers.

--
Greg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 21, 2016, 5:39 p.m. UTC | #18

2016-04-21 19:18+0200, Greg Kurz:
> On Thu, 21 Apr 2016 18:08:41 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> 2016-04-21 17:49+0200, Greg Kurz:
>> > So we're good ?  
>> 
>> I support the change, just had a nit about API design for v2.
>> 
> 
> As I said in my other mail, I'm not sure we should do more... if
> that's okay for you and you still support the change, maybe you
> can give an Acked-by ?

I'm evil when it comes to APIs, so bear it a bit longer. :)

>> >                 Whose tree can carry these patches ?  
>> 
>> (PowerPC is the only immediately affected arch, so I'd it there.)
>> 
>> What do you think is best?  My experience in this regard is pretty low.
>> 
> 
> Maybe Paolo's tree but I guess we'd need some more acks from x86, ARM and
> PowerPC :) KVM maintainers...

Ok.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 21, 2016, 6:08 p.m. UTC | #19

On Thu, 21 Apr 2016 19:39:31 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-21 19:18+0200, Greg Kurz:
> > On Thu, 21 Apr 2016 18:08:41 +0200
> > Radim Kr?má? <rkrcmar@redhat.com> wrote:  
> >> 2016-04-21 17:49+0200, Greg Kurz:  
> >> > So we're good ?    
> >> 
> >> I support the change, just had a nit about API design for v2.
> >>   
> > 
> > As I said in my other mail, I'm not sure we should do more... if
> > that's okay for you and you still support the change, maybe you
> > can give an Acked-by ?  
> 
> I'm evil when it comes to APIs, so bear it a bit longer. :)
> 

Fair enough :)

> >> >                 Whose tree can carry these patches ?    
> >> 
> >> (PowerPC is the only immediately affected arch, so I'd it there.)
> >> 
> >> What do you think is best?  My experience in this regard is pretty low.
> >>   
> > 
> > Maybe Paolo's tree but I guess we'd need some more acks from x86, ARM and
> > PowerPC :) KVM maintainers...  
> 
> Ok.
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wanpeng Li April 22, 2016, 1:40 a.m. UTC | #20

2016-04-21 23:29 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>:
> 2016-04-21 13:29+0200, Greg Kurz:
>> On Wed, 20 Apr 2016 20:29:09 +0200
>> Radim Kr?má? <rkrcmar@redhat.com> wrote:
>>> 2016-04-20 17:44+0200, Greg Kurz:
>>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>>> > introduced a check to prevent potential kernel memory corruption in case
>>> > the vcpu id is too great.
>>> >
>>> > Unfortunately this check assumes vcpu ids grow in sequence with a common
>>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>>> > 1024, guests may be limited down to 128 vcpus on POWER8.
>>> >
>>> > This means the check does not belong here and should be moved to some arch
>>> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
>>> >
>>> > ARM and s390 already have such a check.
>>> >
>>> > I could not spot any path in the PowerPC or common KVM code where a vcpu
>>> > id is used as described in the above commit: I believe PowerPC can live
>>> > without this check.
>>>
>>> The only problematic path I see is kvm_get_vcpu_by_id(), which returns
>>> NULL for any id above KVM_MAX_VCPUS.
>>
>> Oops my bad, I started to work on a 4.4 tree and I missed this check brought
>> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
>>
>> But again, I believe the check is wrong there also: the changelog just mentions
>> this is a fastpath for the usual case where "VCPU ids match the array index"...
>> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?
>
> (The patch had to check id >= KVM_MAX_VCPUS for sanity and there could
>  not be a VCPU with that index according to the spec, so it made a
>  shortcut to the correct NULL result ...)
>
>>> Second issue is that Documentation/virtual/kvm/api.txt says
>>>   4.7 KVM_CREATE_VCPU
>>>   [...]
>>>   This API adds a vcpu to a virtual machine.  The vcpu id is a small
>>>   integer in the range [0, max_vcpus).
>>>
>>
>> Yeah and I find the meaning of max_vcpus is unclear.
>>
>> Here it is considered as a limit for vcpu id, but if you look at the code,
>> KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
>>
>> virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
>
> I agree.  Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make
> you think that online_vcpus limit interpretation is the correct one, but
> the code is conflicted.
>
>>> so we'd remove those two lines and change the API too.  The change would
>>> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
>>> just because KVM is lacking an API to set DT ID?
>>
>> This is related to a limitation when running in book3s_hv mode with cpus
>> that support SMT (multiple HW threads): all HW threads within a core
>> cannot be running in different guests at the same time.
>>
>> We solve this by using a vcpu numbering scheme as follows:
>>
>> vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
>>
>> where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
>> that can be scheduled to run on the same real core.
>>
>> So, in the "worst" case where we want to run a guest with 1 thread/core and the host
>> has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.
>
> I see, thanks.  Accommodating existing users seems like an acceptable
> excuse to change the API.
>
>>> x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
>>>
>>
>> x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
>> patch kvm_get_vcpu_by_id() like suggested above.
>
> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
> reserving blocks of bits for socket/core/thread, so if core or thread
> count isn't a power of two, then the set of valid APIC IDs is sparse,

             ^^^^^^^^^^^^^^^^^^^
             ^^^^^^^
Is this the root reason why recommand max vCPUs per vm is 160 and the
KVM_MAX_VCPUS is 255 instead of due to perforamnce concern?

Regards,
Wanpeng Li

> but max id is still limited by 255, so the effective maximum VCPU count
> is lower.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 22, 2016, 1:07 p.m. UTC | #21

2016-04-22 09:40+0800, Wanpeng Li:
> 2016-04-21 23:29 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>:
>> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
>> reserving blocks of bits for socket/core/thread, so if core or thread
>> count isn't a power of two, then the set of valid APIC IDs is sparse,
> 
>              ^^^^^^^^^^^^^^^^^^^
>              ^^^^^^^
> Is this the root reason why recommand max vCPUs per vm is 160 and the
> KVM_MAX_VCPUS is 255 instead of due to perforamnce concern?

No, the recommended amout of VCPUs is 160 because I didn't bump it after
PLE stopped killing big guests. :/

You can get full 255 VCPU guest with a proper configuration, e.g.
"-smp 255" or "-smp 255,cores=8" and the only problem is scalability,
but I don't know of anything that doesn't scale to that point.

(Scaling up to 2^32 is harder, because you don't want O(N) search, nor
 full allocation on smaller guests.  Neither is a big problem now.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wanpeng Li April 23, 2016, 10:54 p.m. UTC | #22

2016-04-22 21:07 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>:
> 2016-04-22 09:40+0800, Wanpeng Li:
>> 2016-04-21 23:29 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>:
>>> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
>>> reserving blocks of bits for socket/core/thread, so if core or thread
>>> count isn't a power of two, then the set of valid APIC IDs is sparse,
>>
>>              ^^^^^^^^^^^^^^^^^^^
>>              ^^^^^^^
>> Is this the root reason why recommand max vCPUs per vm is 160 and the
>> KVM_MAX_VCPUS is 255 instead of due to perforamnce concern?
>
> No, the recommended amout of VCPUs is 160 because I didn't bump it after
> PLE stopped killing big guests. :/
>
> You can get full 255 VCPU guest with a proper configuration, e.g.
> "-smp 255" or "-smp 255,cores=8" and the only problem is scalability,
> but I don't know of anything that doesn't scale to that point.
>
> (Scaling up to 2^32 is harder, because you don't want O(N) search, nor
>  full allocation on smaller guests.  Neither is a big problem now.)

I see, thanks Radim.

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[v3] KVM: remove buggy vcpu id check on vcpu creation

Commit Message

Comments

Patch