[v4,2/2] KVM: move vcpu id checking to archs

Message ID	146124811255.32509.17679765789502091772.stgit@bahia.huguette.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <kvm@vger.kernel.org> from <gkurz@linux.vnet.ibm.com>; Thu, 21 Apr 2016 15:21:08 +0100 Gateway: Authorized Use Only! Violators will be prosecuted; Thu, 21 Apr 2016 15:20:57 +0100 Subject: [PATCH v4 2/2] KVM: move vcpu id checking to archs From: Greg Kurz <gkurz@linux.vnet.ibm.com> To: Paolo Bonzini <pbonzini@redhat.com>, james.hogan@imgtec.com, mingo@redhat.com Cc: linux-mips@linux-mips.org, kvm@vger.kernel.org, rkrcmar@redhat.com, linux-kernel@vger.kernel.org, David Hildenbrand <dahi@linux.vnet.ibm.com>, qemu-ppc@nongnu.org, Cornelia Huck <cornelia.huck@de.ibm.com>, Paul Mackerras <paulus@samba.org>, David Gibson <david@gibson.dropbear.id.au> Date: Thu, 21 Apr 2016 16:20:53 +0200 Message-ID: <146124811255.32509.17679765789502091772.stgit@bahia.huguette.org> In-Reply-To: <146124809455.32509.15232948272580716135.stgit@bahia.huguette.org> References: <146124809455.32509.15232948272580716135.stgit@bahia.huguette.org> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: kvm-owner@vger.kernel.org Precedence: bulk

Greg Kurz April 21, 2016, 2:20 p.m. UTC

Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
introduced a check to prevent potential kernel memory corruption in case
the vcpu id is too great.

Unfortunately this check assumes vcpu ids grow in sequence with a common
difference of 1, which is wrong: archs are free to use vcpu id as they fit.
For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
1024, guests may be limited down to 128 vcpus on POWER8.

This means the check does not belong here and should be moved to some arch
specific function: kvm_arch_vcpu_create() looks like a good candidate.

ARM and s390 already have such a check.

I could not spot any path in the PowerPC or common KVM code where a vcpu
id is used as described in the above commit: I believe PowerPC can live
without this check.

In the end, this patch simply moves the check to MIPS and x86. While here,
we also update the documentation to dissociate vcpu ids from the maximum
number of vcpus per virtual machine.

Acked-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
v4: - updated subject for more clarity on what the patch does
    - added James's and Connie's A-b tags
    - updated documentation

 Documentation/virtual/kvm/api.txt |    7 +++----
 arch/mips/kvm/mips.c              |    7 ++++++-
 arch/x86/kvm/x86.c                |    3 +++
 virt/kvm/kvm_main.c               |    3 ---
 4 files changed, 12 insertions(+), 8 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 21, 2016, 4 p.m. UTC | #1

2016-04-21 16:20+0200, Greg Kurz:
> Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> introduced a check to prevent potential kernel memory corruption in case
> the vcpu id is too great.
> 
> Unfortunately this check assumes vcpu ids grow in sequence with a common
> difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> 1024, guests may be limited down to 128 vcpus on POWER8.
> 
> This means the check does not belong here and should be moved to some arch
> specific function: kvm_arch_vcpu_create() looks like a good candidate.
> 
> ARM and s390 already have such a check.
> 
> I could not spot any path in the PowerPC or common KVM code where a vcpu
> id is used as described in the above commit: I believe PowerPC can live
> without this check.
> 
> In the end, this patch simply moves the check to MIPS and x86. While here,
> we also update the documentation to dissociate vcpu ids from the maximum
> number of vcpus per virtual machine.
> 
> Acked-by: James Hogan <james.hogan@imgtec.com>
> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> v4: - updated subject for more clarity on what the patch does
>     - added James's and Connie's A-b tags
>     - updated documentation
> 
>  Documentation/virtual/kvm/api.txt |    7 +++----
>  arch/mips/kvm/mips.c              |    7 ++++++-
>  arch/x86/kvm/x86.c                |    3 +++
>  virt/kvm/kvm_main.c               |    3 ---
>  4 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 4d0542c5206b..486a1d783b82 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -199,11 +199,10 @@ Type: vm ioctl
>  Parameters: vcpu id (apic id on x86)
>  Returns: vcpu fd on success, -1 on error
>  
> -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> -in the range [0, max_vcpus).
> +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.

Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
provided too high vcpu_id to an old KVM or because new KVM failed in
other areas.  Not a huge problem (because I expect that userspace will
die on both), but a new KVM_CAP would be able to disambiguate it.

Toggleable capability doesn't seem necessary and only PowerPC changes,
so the capability could be arch specific ... I think that a generic one
makes more sense, though.

Userspace also doesn't know the vcpu id limit anymore, and it might
care.  What do you think about returning the arch-specific limit (or the
highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?

I think this would also clarify the connection between VCPU limit and
VCPU_ID limit.  Or is a boolean cap better?

> -The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
> -the KVM_CHECK_EXTENSION ioctl() at run-time.
> +The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
> +KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
>  The maximum possible value for max_vcpus can be retrieved using the
>  KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 21, 2016, 4:45 p.m. UTC | #2

On Thu, 21 Apr 2016 18:00:19 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-21 16:20+0200, Greg Kurz:
> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > introduced a check to prevent potential kernel memory corruption in case
> > the vcpu id is too great.
> > 
> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > 
> > This means the check does not belong here and should be moved to some arch
> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > 
> > ARM and s390 already have such a check.
> > 
> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > id is used as described in the above commit: I believe PowerPC can live
> > without this check.
> > 
> > In the end, this patch simply moves the check to MIPS and x86. While here,
> > we also update the documentation to dissociate vcpu ids from the maximum
> > number of vcpus per virtual machine.
> > 
> > Acked-by: James Hogan <james.hogan@imgtec.com>
> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> > v4: - updated subject for more clarity on what the patch does
> >     - added James's and Connie's A-b tags
> >     - updated documentation
> > 
> >  Documentation/virtual/kvm/api.txt |    7 +++----
> >  arch/mips/kvm/mips.c              |    7 ++++++-
> >  arch/x86/kvm/x86.c                |    3 +++
> >  virt/kvm/kvm_main.c               |    3 ---
> >  4 files changed, 12 insertions(+), 8 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 4d0542c5206b..486a1d783b82 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -199,11 +199,10 @@ Type: vm ioctl
> >  Parameters: vcpu id (apic id on x86)
> >  Returns: vcpu fd on success, -1 on error
> >  
> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> > -in the range [0, max_vcpus).
> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.  
> 
> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> provided too high vcpu_id to an old KVM or because new KVM failed in
> other areas.  Not a huge problem (because I expect that userspace will
> die on both), but a new KVM_CAP would be able to disambiguate it.
> 
> Toggleable capability doesn't seem necessary and only PowerPC changes,
> so the capability could be arch specific ... I think that a generic one
> makes more sense, though.
>

I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.

So I understand your concern but would we have a user for this ?

> Userspace also doesn't know the vcpu id limit anymore, and it might
> care.  What do you think about returning the arch-specific limit (or the
> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> 

This is partly true: only arch agnostic code would be lost.

Moreover this is a problem for powerpc only at the moment and userspace code
can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.

For other architectures, it is simply KVM_MAX_VCPUS.

> I think this would also clarify the connection between VCPU limit and
> VCPU_ID limit.  Or is a boolean cap better?
> 

Well, I'm not fan of adding a generic API to handle a corner case... maybe later
if we have other scenarios where vcpu ids need to cross the limit ?

> > -The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
> > -the KVM_CHECK_EXTENSION ioctl() at run-time.
> > +The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
> > +KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> >  The maximum possible value for max_vcpus can be retrieved using the
> >  KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.  
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 21, 2016, 5:36 p.m. UTC | #3

2016-04-21 18:45+0200, Greg Kurz:
> On Thu, 21 Apr 2016 18:00:19 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> 2016-04-21 16:20+0200, Greg Kurz:
>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>> > introduced a check to prevent potential kernel memory corruption in case
>> > the vcpu id is too great.
>> > 
>> > Unfortunately this check assumes vcpu ids grow in sequence with a common
>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>> > 1024, guests may be limited down to 128 vcpus on POWER8.
>> > 
>> > This means the check does not belong here and should be moved to some arch
>> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
>> > 
>> > ARM and s390 already have such a check.
>> > 
>> > I could not spot any path in the PowerPC or common KVM code where a vcpu
>> > id is used as described in the above commit: I believe PowerPC can live
>> > without this check.
>> > 
>> > In the end, this patch simply moves the check to MIPS and x86. While here,
>> > we also update the documentation to dissociate vcpu ids from the maximum
>> > number of vcpus per virtual machine.
>> > 
>> > Acked-by: James Hogan <james.hogan@imgtec.com>
>> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>> > ---
>> > v4: - updated subject for more clarity on what the patch does
>> >     - added James's and Connie's A-b tags
>> >     - updated documentation
>> > 
>> >  Documentation/virtual/kvm/api.txt |    7 +++----
>> >  arch/mips/kvm/mips.c              |    7 ++++++-
>> >  arch/x86/kvm/x86.c                |    3 +++
>> >  virt/kvm/kvm_main.c               |    3 ---
>> >  4 files changed, 12 insertions(+), 8 deletions(-)
>> > 
>> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> > index 4d0542c5206b..486a1d783b82 100644
>> > --- a/Documentation/virtual/kvm/api.txt
>> > +++ b/Documentation/virtual/kvm/api.txt
>> > @@ -199,11 +199,10 @@ Type: vm ioctl
>> >  Parameters: vcpu id (apic id on x86)
>> >  Returns: vcpu fd on success, -1 on error
>> >  
>> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
>> > -in the range [0, max_vcpus).
>> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.  
>> 
>> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
>> provided too high vcpu_id to an old KVM or because new KVM failed in
>> other areas.  Not a huge problem (because I expect that userspace will
>> die on both), but a new KVM_CAP would be able to disambiguate it.
>> 
>> Toggleable capability doesn't seem necessary and only PowerPC changes,
>> so the capability could be arch specific ... I think that a generic one
>> makes more sense, though.
>>
> 
> I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.

Yes, userspace cannot disambiguate, but would have the option of not
doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.

> So I understand your concern but would we have a user for this ?

I think so, new userspace on pre-patch KVM is the most likely one.

Userspace cannot tell that KVM doesn't support the extension and
behaving like on patched KVM would result in a failure with cryptic
error message, because KVM only returns EINVAL.

Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
recognizing that the user wanted too much?

>> Userspace also doesn't know the vcpu id limit anymore, and it might
>> care.  What do you think about returning the arch-specific limit (or the
>> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
>> 
> 
> This is partly true: only arch agnostic code would be lost.
> 
> Moreover this is a problem for powerpc only at the moment and userspace code
> can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.

How would that work on KVM without this patch?

> For other architectures, it is simply KVM_MAX_VCPUS.

(Other architectures would not implement the capability.)

>> I think this would also clarify the connection between VCPU limit and
>> VCPU_ID limit.  Or is a boolean cap better?
>> 
> 
> Well, I'm not fan of adding a generic API to handle a corner case...

I don't like it either, but I think that introducing the capability is
worth avoided problems.

>                                                                      maybe later
> if we have other scenarios where vcpu ids need to cross the limit ?

x86 is going to have that soon too -- vcpu_id will be able to range from
0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
won't be improved to actually scale, so MAX_CPUS will remain lower.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wei Yang April 22, 2016, 9:21 a.m. UTC | #4

Hi, Greg

One confusion.

There are 5 kvm_arch_vcpu_create() while in this patch you changed 2 of them.
Some particular reason?

On Thu, Apr 21, 2016 at 04:20:53PM +0200, Greg Kurz wrote:
>Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>introduced a check to prevent potential kernel memory corruption in case
>the vcpu id is too great.
>
>Unfortunately this check assumes vcpu ids grow in sequence with a common
>difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>1024, guests may be limited down to 128 vcpus on POWER8.
>
>This means the check does not belong here and should be moved to some arch
>specific function: kvm_arch_vcpu_create() looks like a good candidate.
>
>ARM and s390 already have such a check.
>
>I could not spot any path in the PowerPC or common KVM code where a vcpu
>id is used as described in the above commit: I believe PowerPC can live
>without this check.
>
>In the end, this patch simply moves the check to MIPS and x86. While here,
>we also update the documentation to dissociate vcpu ids from the maximum
>number of vcpus per virtual machine.
>
>Acked-by: James Hogan <james.hogan@imgtec.com>
>Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>---
>v4: - updated subject for more clarity on what the patch does
>    - added James's and Connie's A-b tags
>    - updated documentation
>
> Documentation/virtual/kvm/api.txt |    7 +++----
> arch/mips/kvm/mips.c              |    7 ++++++-
> arch/x86/kvm/x86.c                |    3 +++
> virt/kvm/kvm_main.c               |    3 ---
> 4 files changed, 12 insertions(+), 8 deletions(-)
>
>diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>index 4d0542c5206b..486a1d783b82 100644
>--- a/Documentation/virtual/kvm/api.txt
>+++ b/Documentation/virtual/kvm/api.txt
>@@ -199,11 +199,10 @@ Type: vm ioctl
> Parameters: vcpu id (apic id on x86)
> Returns: vcpu fd on success, -1 on error
> 
>-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
>-in the range [0, max_vcpus).
>+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
> 
>-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
>-the KVM_CHECK_EXTENSION ioctl() at run-time.
>+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
>+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> The maximum possible value for max_vcpus can be retrieved using the
> KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> 
>diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>index 70ef1a43c114..0278ea146db5 100644
>--- a/arch/mips/kvm/mips.c
>+++ b/arch/mips/kvm/mips.c
>@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> 	int err, size, offset;
> 	void *gebase;
> 	int i;
>+	struct kvm_vcpu *vcpu;
> 
>-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>+	if (id >= KVM_MAX_VCPUS) {
>+		err = -EINVAL;
>+		goto out;
>+	}
> 
>+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> 	if (!vcpu) {
> 		err = -ENOMEM;
> 		goto out;
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 9b7798c7b210..7738202edcce 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
> {
> 	struct kvm_vcpu *vcpu;
> 
>+	if (id >= KVM_MAX_VCPUS)
>+		return ERR_PTR(-EINVAL);
>+
> 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
> 		printk_once(KERN_WARNING
> 		"kvm: SMP vm created on host with unstable TSC; "
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index 4fd482fb9260..6b6cca3cb488 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> 	int r;
> 	struct kvm_vcpu *vcpu;
> 
>-	if (id >= KVM_MAX_VCPUS)
>-		return -EINVAL;
>-
> 	vcpu = kvm_arch_vcpu_create(kvm, id);
> 	if (IS_ERR(vcpu))
> 		return PTR_ERR(vcpu);
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 22, 2016, 9:25 a.m. UTC | #5

Hi Radim !

On Thu, 21 Apr 2016 19:36:11 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-21 18:45+0200, Greg Kurz:
> > On Thu, 21 Apr 2016 18:00:19 +0200
> > Radim Kr?má? <rkrcmar@redhat.com> wrote:  
> >> 2016-04-21 16:20+0200, Greg Kurz:  
> >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> >> > introduced a check to prevent potential kernel memory corruption in case
> >> > the vcpu id is too great.
> >> > 
> >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> >> > 
> >> > This means the check does not belong here and should be moved to some arch
> >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> >> > 
> >> > ARM and s390 already have such a check.
> >> > 
> >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> >> > id is used as described in the above commit: I believe PowerPC can live
> >> > without this check.
> >> > 
> >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> >> > we also update the documentation to dissociate vcpu ids from the maximum
> >> > number of vcpus per virtual machine.
> >> > 
> >> > Acked-by: James Hogan <james.hogan@imgtec.com>
> >> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> >> > ---
> >> > v4: - updated subject for more clarity on what the patch does
> >> >     - added James's and Connie's A-b tags
> >> >     - updated documentation
> >> > 
> >> >  Documentation/virtual/kvm/api.txt |    7 +++----
> >> >  arch/mips/kvm/mips.c              |    7 ++++++-
> >> >  arch/x86/kvm/x86.c                |    3 +++
> >> >  virt/kvm/kvm_main.c               |    3 ---
> >> >  4 files changed, 12 insertions(+), 8 deletions(-)
> >> > 
> >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> > index 4d0542c5206b..486a1d783b82 100644
> >> > --- a/Documentation/virtual/kvm/api.txt
> >> > +++ b/Documentation/virtual/kvm/api.txt
> >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> >> >  Parameters: vcpu id (apic id on x86)
> >> >  Returns: vcpu fd on success, -1 on error
> >> >  
> >> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> >> > -in the range [0, max_vcpus).
> >> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.    
> >> 
> >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> >> provided too high vcpu_id to an old KVM or because new KVM failed in
> >> other areas.  Not a huge problem (because I expect that userspace will
> >> die on both), but a new KVM_CAP would be able to disambiguate it.
> >> 
> >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> >> so the capability could be arch specific ... I think that a generic one
> >> makes more sense, though.
> >>  
> > 
> > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.  
> 
> Yes, userspace cannot disambiguate, but would have the option of not
> doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
> 

It makes sense indeed.

> > So I understand your concern but would we have a user for this ?  
> 
> I think so, new userspace on pre-patch KVM is the most likely one.
> 
> Userspace cannot tell that KVM doesn't support the extension and
> behaving like on patched KVM would result in a failure with cryptic
> error message, because KVM only returns EINVAL.
> 

This is already the case with or without the patch... which only changes
things for PowerPC userspace. And in the case of QEMU, we're already
violating the spec with the way we compute vcpu ids.

> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
> recognizing that the user wanted too much?
> 

No. The error is caught in generic code and QEMU exits for all archs.

And BTW, how would QEMU guess that vcpu id is too high ? I see at
least three paths that can return EINVAL...

> >> Userspace also doesn't know the vcpu id limit anymore, and it might
> >> care.  What do you think about returning the arch-specific limit (or the
> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> >>   
> > 
> > This is partly true: only arch agnostic code would be lost.
> > 
> > Moreover this is a problem for powerpc only at the moment and userspace code
> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.  
> 
> How would that work on KVM without this patch?
> 

It doesn't work for PowerPC :)

KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
KVM_CAP_PPC_SMT indicates the host has 8 threads per core.

KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
it has vcpu_id == 128 * 8 == 1024.

Of course we can patch QEMU to restrict the maximum number of vcpus
to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
KVM for PowerPC is sized to run MAX_VCPUS... :\

> > For other architectures, it is simply KVM_MAX_VCPUS.  
> 
> (Other architectures would not implement the capability.)
> 

So this would be KVM_CAP_PPC_MAX_VCPU_ID ?

> >> I think this would also clarify the connection between VCPU limit and
> >> VCPU_ID limit.  Or is a boolean cap better?
> >>   
> > 
> > Well, I'm not fan of adding a generic API to handle a corner case...  
> 
> I don't like it either, but I think that introducing the capability is
> worth avoided problems.
> 

I admit that having separate capabilities for the number of vcpus and the
maximum vcpu id fixes the confusion once and for all.

> >                                                                      maybe later
> > if we have other scenarios where vcpu ids need to cross the limit ?  
> 
> x86 is going to have that soon too -- vcpu_id will be able to range from
> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> won't be improved to actually scale, so MAX_CPUS will remain lower.
> 

Do you have some pointers to share so that we can see the broader picture ?

Thanks !

--
Greg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 22, 2016, 9:30 a.m. UTC | #6

On Fri, 22 Apr 2016 17:21:03 +0800
Wei Yang <richard.weiyang@huawei.com> wrote:

> Hi, Greg
> 

Hi Wei !

> One confusion.
> 
> There are 5 kvm_arch_vcpu_create() while in this patch you changed 2 of them.
> Some particular reason?
> 

Yes and the reason is given in the changelog:
- ARM and s390 already have such a check
- PowerPC can live without this check
- this patch simply moves the check to MIPS and x86

Does it clarify ?

Cheers.

--
Greg

> On Thu, Apr 21, 2016 at 04:20:53PM +0200, Greg Kurz wrote:
> >Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> >introduced a check to prevent potential kernel memory corruption in case
> >the vcpu id is too great.
> >
> >Unfortunately this check assumes vcpu ids grow in sequence with a common
> >difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> >For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> >mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> >1024, guests may be limited down to 128 vcpus on POWER8.
> >
> >This means the check does not belong here and should be moved to some arch
> >specific function: kvm_arch_vcpu_create() looks like a good candidate.
> >
> >ARM and s390 already have such a check.
> >
> >I could not spot any path in the PowerPC or common KVM code where a vcpu
> >id is used as described in the above commit: I believe PowerPC can live
> >without this check.
> >
> >In the end, this patch simply moves the check to MIPS and x86. While here,
> >we also update the documentation to dissociate vcpu ids from the maximum
> >number of vcpus per virtual machine.
> >
> >Acked-by: James Hogan <james.hogan@imgtec.com>
> >Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> >---
> >v4: - updated subject for more clarity on what the patch does
> >    - added James's and Connie's A-b tags
> >    - updated documentation
> >
> > Documentation/virtual/kvm/api.txt |    7 +++----
> > arch/mips/kvm/mips.c              |    7 ++++++-
> > arch/x86/kvm/x86.c                |    3 +++
> > virt/kvm/kvm_main.c               |    3 ---
> > 4 files changed, 12 insertions(+), 8 deletions(-)
> >
> >diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >index 4d0542c5206b..486a1d783b82 100644
> >--- a/Documentation/virtual/kvm/api.txt
> >+++ b/Documentation/virtual/kvm/api.txt
> >@@ -199,11 +199,10 @@ Type: vm ioctl
> > Parameters: vcpu id (apic id on x86)
> > Returns: vcpu fd on success, -1 on error
> > 
> >-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> >-in the range [0, max_vcpus).
> >+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
> > 
> >-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
> >-the KVM_CHECK_EXTENSION ioctl() at run-time.
> >+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
> >+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> > The maximum possible value for max_vcpus can be retrieved using the
> > KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> > 
> >diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> >index 70ef1a43c114..0278ea146db5 100644
> >--- a/arch/mips/kvm/mips.c
> >+++ b/arch/mips/kvm/mips.c
> >@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> > 	int err, size, offset;
> > 	void *gebase;
> > 	int i;
> >+	struct kvm_vcpu *vcpu;
> > 
> >-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> >+	if (id >= KVM_MAX_VCPUS) {
> >+		err = -EINVAL;
> >+		goto out;
> >+	}
> > 
> >+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> > 	if (!vcpu) {
> > 		err = -ENOMEM;
> > 		goto out;
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index 9b7798c7b210..7738202edcce 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
> > {
> > 	struct kvm_vcpu *vcpu;
> > 
> >+	if (id >= KVM_MAX_VCPUS)
> >+		return ERR_PTR(-EINVAL);
> >+
> > 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
> > 		printk_once(KERN_WARNING
> > 		"kvm: SMP vm created on host with unstable TSC; "
> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >index 4fd482fb9260..6b6cca3cb488 100644
> >--- a/virt/kvm/kvm_main.c
> >+++ b/virt/kvm/kvm_main.c
> >@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> > 	int r;
> > 	struct kvm_vcpu *vcpu;
> > 
> >-	if (id >= KVM_MAX_VCPUS)
> >-		return -EINVAL;
> >-
> > 	vcpu = kvm_arch_vcpu_create(kvm, id);
> > 	if (IS_ERR(vcpu))
> > 		return PTR_ERR(vcpu);
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe kvm" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html  
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cornelia Huck April 22, 2016, 10:22 a.m. UTC | #7

On Fri, 22 Apr 2016 11:25:38 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> On Thu, 21 Apr 2016 19:36:11 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:

> > > For other architectures, it is simply KVM_MAX_VCPUS.  
> > 
> > (Other architectures would not implement the capability.)
> > 
> 
> So this would be KVM_CAP_PPC_MAX_VCPU_ID ?
> 
> > >> I think this would also clarify the connection between VCPU limit and
> > >> VCPU_ID limit.  Or is a boolean cap better?
> > >>   
> > > 
> > > Well, I'm not fan of adding a generic API to handle a corner case...  
> > 
> > I don't like it either, but I think that introducing the capability is
> > worth avoided problems.
> > 
> 
> I admit that having separate capabilities for the number of vcpus and the
> maximum vcpu id fixes the confusion once and for all.

Yes, and I think that the new max_vpcu_id cap should be generic for
that reason.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Igor Mammedov April 22, 2016, 11:19 a.m. UTC | #8

On Fri, 22 Apr 2016 11:25:38 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> Hi Radim !
> 
> On Thu, 21 Apr 2016 19:36:11 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 
> > 2016-04-21 18:45+0200, Greg Kurz:  
> > > On Thu, 21 Apr 2016 18:00:19 +0200
> > > Radim Kr?má? <rkrcmar@redhat.com> wrote:    
> > >> 2016-04-21 16:20+0200, Greg Kurz:    
[...]
> > >                                                                      maybe later
> > > if we have other scenarios where vcpu ids need to cross the limit ?    
> > 
> > x86 is going to have that soon too -- vcpu_id will be able to range from
> > 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> > won't be improved to actually scale, so MAX_CPUS will remain lower.
> >  
That's not true, x86 is going to stick with KVM_MAX_VCPUS/qemu's max_cpus,
the only thing that is going to change is that max supported APIC ID
value will be in range 0 to 2^32-1 vs current 8bit one
and since APIC ID is not vcpu_id so it won't affect vcpu_id.
 
> 
> Do you have some pointers to share so that we can see the broader picture ?
> 
> Thanks !
> 
> --
> Greg
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 22, 2016, 1:40 p.m. UTC | #9

2016-04-22 11:25+0200, Greg Kurz:
> Hi Radim !
> 
> On Thu, 21 Apr 2016 19:36:11 +0200
> Radim Kr?má? <rkrcmar@redhat.com> wrote:
> 
> > 2016-04-21 18:45+0200, Greg Kurz:
> > > On Thu, 21 Apr 2016 18:00:19 +0200
> > > Radim Kr?má? <rkrcmar@redhat.com> wrote:  
> > >> 2016-04-21 16:20+0200, Greg Kurz:  
> > >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > >> > introduced a check to prevent potential kernel memory corruption in case
> > >> > the vcpu id is too great.
> > >> > 
> > >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > >> > 
> > >> > This means the check does not belong here and should be moved to some arch
> > >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > >> > 
> > >> > ARM and s390 already have such a check.
> > >> > 
> > >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > >> > id is used as described in the above commit: I believe PowerPC can live
> > >> > without this check.
> > >> > 
> > >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> > >> > we also update the documentation to dissociate vcpu ids from the maximum
> > >> > number of vcpus per virtual machine.
> > >> > 
> > >> > Acked-by: James Hogan <james.hogan@imgtec.com>
> > >> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > >> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > >> > ---
> > >> > v4: - updated subject for more clarity on what the patch does
> > >> >     - added James's and Connie's A-b tags
> > >> >     - updated documentation
> > >> > 
> > >> >  Documentation/virtual/kvm/api.txt |    7 +++----
> > >> >  arch/mips/kvm/mips.c              |    7 ++++++-
> > >> >  arch/x86/kvm/x86.c                |    3 +++
> > >> >  virt/kvm/kvm_main.c               |    3 ---
> > >> >  4 files changed, 12 insertions(+), 8 deletions(-)
> > >> > 
> > >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > >> > index 4d0542c5206b..486a1d783b82 100644
> > >> > --- a/Documentation/virtual/kvm/api.txt
> > >> > +++ b/Documentation/virtual/kvm/api.txt
> > >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> > >> >  Parameters: vcpu id (apic id on x86)
> > >> >  Returns: vcpu fd on success, -1 on error
> > >> >  
> > >> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> > >> > -in the range [0, max_vcpus).
> > >> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.    
> > >> 
> > >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> > >> provided too high vcpu_id to an old KVM or because new KVM failed in
> > >> other areas.  Not a huge problem (because I expect that userspace will
> > >> die on both), but a new KVM_CAP would be able to disambiguate it.
> > >> 
> > >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> > >> so the capability could be arch specific ... I think that a generic one
> > >> makes more sense, though.
> > >>  
> > > 
> > > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.  
> > 
> > Yes, userspace cannot disambiguate, but would have the option of not
> > doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
> > 
> 
> It makes sense indeed.
> 
>> > So I understand your concern but would we have a user for this ?  
>> 
>> I think so, new userspace on pre-patch KVM is the most likely one.
>> 
>> Userspace cannot tell that KVM doesn't support the extension and
>> behaving like on patched KVM would result in a failure with cryptic
>> error message, because KVM only returns EINVAL.
>> 
> 
> This is already the case with or without the patch... which only changes
> things for PowerPC userspace.

I guess that the error message from QEMU should be improved then ...
The spec is quite clear that KVM is going to fail because of invalid
vcpu_id.

x86 QEMU does this when the amount of CPUs doesn't fit APIC constraints:
  # qemu-kvm -smp 160,cores=9
  qemu-system-x86_64: max_cpus is too large. APIC ID of last CPU is 278

Not perfectly understandable to the uninitiated, but it gets the point
across.

>                               And in the case of QEMU, we're already
> violating the spec with the way we compute vcpu ids.

The numbering strategy is valid.  KVM spec never said that numbering
cannot be sparse/arbitrary, just that vcpu_id must not exceed MAX_VCPUS.

>> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
>> recognizing that the user wanted too much?
>> 
> 
> No. The error is caught in generic code and QEMU exits for all archs.
> 
> And BTW, how would QEMU guess that vcpu id is too high ? I see at
> least three paths that can return EINVAL...

I agree, -EINVAL is the problem. :)
(Returning -ERANGE would have made more sense.)

The userspace has to guess if it tries fails, but if it was aware of the
actual limit, it could assume that the creation failed because of high
vcpu_id and report it as a user-amendable error ...
Userspace would better do some sanity checks beforehand instead of
trying and failing, though.

>> >> Userspace also doesn't know the vcpu id limit anymore, and it might
>> >> care.  What do you think about returning the arch-specific limit (or the
>> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
>> >>   
>> > 
>> > This is partly true: only arch agnostic code would be lost.
>> > 
>> > Moreover this is a problem for powerpc only at the moment and userspace code
>> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.  
>> 
>> How would that work on KVM without this patch?
>> 
> 
> It doesn't work for PowerPC :)
> 
> KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
> KVM_CAP_PPC_SMT indicates the host has 8 threads per core.
> 
> KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
> it has vcpu_id == 128 * 8 == 1024.
> 
> Of course we can patch QEMU to restrict the maximum number of vcpus
> to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
> KVM for PowerPC is sized to run MAX_VCPUS... :\

Yeah.  If we introduced the capability, then QEMU would restrict vcpu_id
to MAX_VCPUS or MAX_VCPU_ID and the number of VCPUS always to MAX_VCPUS.

>> > For other architectures, it is simply KVM_MAX_VCPUS.  
>> 
>> (Other architectures would not implement the capability.)
>> 
> 
> So this would be KVM_CAP_PPC_MAX_VCPU_ID ?

No, need.  The capability could be generic, because the concept is
arch-neutral (and it looks like x86 will use it too).

Unimplemented capabilities return 0, which is why other architectures
don't need to implement it.  Nothing can break:  userspace on old KVM
will get 0 for the capability, so it's going to behave as if we didn't
have this patch and this patch made sure that nothing actually changed
for other architectures.

>> >                                                                      maybe later
>> > if we have other scenarios where vcpu ids need to cross the limit ?  
>> 
>> x86 is going to have that soon too -- vcpu_id will be able to range from
>> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
>> won't be improved to actually scale, so MAX_CPUS will remain lower.
>> 
> 
> Do you have some pointers to share so that we can see the broader picture ?

At the moment, the most concrete one is x2APIC spec, sorry.

KVM currently supports only xAPIC, which has 8 bit APIC ID.
(The x2APIC interface that you might see is paravirtualized.)

vcpu_id is APIC ID on x86 and x2APIC still has sparse allocation that
encodes topology.  KVM is going to support standard x2APIC (when QEMU
supports interrupt remapping), which opens a possibility where we'd end
up in exactly the same situation where PowerPC is now -- architectural
code would handle vcpu_id bigger than MAX_CPUS, but userspace couldn't
make us of it, because of API limitations.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Radim Krčmář April 22, 2016, 1:48 p.m. UTC | #10

2016-04-22 13:19+0200, Igor Mammedov:
> On Fri, 22 Apr 2016 11:25:38 +0200
> Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
>> On Thu, 21 Apr 2016 19:36:11 +0200
>> Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> > 2016-04-21 18:45+0200, Greg Kurz:  
>> > > On Thu, 21 Apr 2016 18:00:19 +0200
>> > > Radim Kr?má? <rkrcmar@redhat.com> wrote:    
>> > >> 2016-04-21 16:20+0200, Greg Kurz:    
>[...]
>> > >                                                                      maybe later
>> > > if we have other scenarios where vcpu ids need to cross the limit ?    
>> > 
>> > x86 is going to have that soon too -- vcpu_id will be able to range from
>> > 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
>> > won't be improved to actually scale, so MAX_CPUS will remain lower.
>> >  
> That's not true, x86 is going to stick with KVM_MAX_VCPUS/qemu's max_cpus,
> the only thing that is going to change is that max supported APIC ID
> value will be in range 0 to 2^32-1 vs current 8bit one
> and since APIC ID is not vcpu_id so it won't affect vcpu_id.

I wish it wasn't.  vcpu_id is the initial APIC ID -- at least the spec
says so and KVM code behaves like that (QEMU does too).

It doesn't have to be so, though, KVM_SET_LAPIC provides the interface
to set APIC ID.  We'd decouple these two and change some related things.
(And add yet another cap for that? :])

I'll see what would be really needed,

thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 22, 2016, 2:50 p.m. UTC | #11

On Fri, 22 Apr 2016 15:40:30 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-22 11:25+0200, Greg Kurz:
> > Hi Radim !
> > 
> > On Thu, 21 Apr 2016 19:36:11 +0200
> > Radim Kr?má? <rkrcmar@redhat.com> wrote:
> >   
> > > 2016-04-21 18:45+0200, Greg Kurz:  
> > > > On Thu, 21 Apr 2016 18:00:19 +0200
> > > > Radim Kr?má? <rkrcmar@redhat.com> wrote:    
> > > >> 2016-04-21 16:20+0200, Greg Kurz:    
> > > >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > > >> > introduced a check to prevent potential kernel memory corruption in case
> > > >> > the vcpu id is too great.
> > > >> > 
> > > >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > > >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > > >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > > >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > > >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > > >> > 
> > > >> > This means the check does not belong here and should be moved to some arch
> > > >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > > >> > 
> > > >> > ARM and s390 already have such a check.
> > > >> > 
> > > >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > > >> > id is used as described in the above commit: I believe PowerPC can live
> > > >> > without this check.
> > > >> > 
> > > >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> > > >> > we also update the documentation to dissociate vcpu ids from the maximum
> > > >> > number of vcpus per virtual machine.
> > > >> > 
> > > >> > Acked-by: James Hogan <james.hogan@imgtec.com>
> > > >> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > >> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > >> > ---
> > > >> > v4: - updated subject for more clarity on what the patch does
> > > >> >     - added James's and Connie's A-b tags
> > > >> >     - updated documentation
> > > >> > 
> > > >> >  Documentation/virtual/kvm/api.txt |    7 +++----
> > > >> >  arch/mips/kvm/mips.c              |    7 ++++++-
> > > >> >  arch/x86/kvm/x86.c                |    3 +++
> > > >> >  virt/kvm/kvm_main.c               |    3 ---
> > > >> >  4 files changed, 12 insertions(+), 8 deletions(-)
> > > >> > 
> > > >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > >> > index 4d0542c5206b..486a1d783b82 100644
> > > >> > --- a/Documentation/virtual/kvm/api.txt
> > > >> > +++ b/Documentation/virtual/kvm/api.txt
> > > >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> > > >> >  Parameters: vcpu id (apic id on x86)
> > > >> >  Returns: vcpu fd on success, -1 on error
> > > >> >  
> > > >> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> > > >> > -in the range [0, max_vcpus).
> > > >> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.      
> > > >> 
> > > >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> > > >> provided too high vcpu_id to an old KVM or because new KVM failed in
> > > >> other areas.  Not a huge problem (because I expect that userspace will
> > > >> die on both), but a new KVM_CAP would be able to disambiguate it.
> > > >> 
> > > >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> > > >> so the capability could be arch specific ... I think that a generic one
> > > >> makes more sense, though.
> > > >>    
> > > > 
> > > > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > > > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.    
> > > 
> > > Yes, userspace cannot disambiguate, but would have the option of not
> > > doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
> > >   
> > 
> > It makes sense indeed.
> >   
> >> > So I understand your concern but would we have a user for this ?    
> >> 
> >> I think so, new userspace on pre-patch KVM is the most likely one.
> >> 
> >> Userspace cannot tell that KVM doesn't support the extension and
> >> behaving like on patched KVM would result in a failure with cryptic
> >> error message, because KVM only returns EINVAL.
> >>   
> > 
> > This is already the case with or without the patch... which only changes
> > things for PowerPC userspace.  
> 
> I guess that the error message from QEMU should be improved then ...

Definitely.

> The spec is quite clear that KVM is going to fail because of invalid
> vcpu_id.
> 

Agreed.

> x86 QEMU does this when the amount of CPUs doesn't fit APIC constraints:
>   # qemu-kvm -smp 160,cores=9
>   qemu-system-x86_64: max_cpus is too large. APIC ID of last CPU is 278
> 
> Not perfectly understandable to the uninitiated, but it gets the point
> across.
> 
> >                               And in the case of QEMU, we're already
> > violating the spec with the way we compute vcpu ids.  
> 
> The numbering strategy is valid.  KVM spec never said that numbering
> cannot be sparse/arbitrary, just that vcpu_id must not exceed MAX_VCPUS.
> 

You're right... it's just conflicting with the API when trying to run guests with
less threads per core than the host (the situation where we get higher vcpu ids).

This should probably be documented somewhere.

> >> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
> >> recognizing that the user wanted too much?
> >>   
> > 
> > No. The error is caught in generic code and QEMU exits for all archs.
> > 
> > And BTW, how would QEMU guess that vcpu id is too high ? I see at
> > least three paths that can return EINVAL...  
> 
> I agree, -EINVAL is the problem. :)
> (Returning -ERANGE would have made more sense.)
> 
> The userspace has to guess if it tries fails, but if it was aware of the
> actual limit, it could assume that the creation failed because of high
> vcpu_id and report it as a user-amendable error ...
> Userspace would better do some sanity checks beforehand instead of
> trying and failing, though.
> 

I'm convinced. :)

> >> >> Userspace also doesn't know the vcpu id limit anymore, and it might
> >> >> care.  What do you think about returning the arch-specific limit (or the
> >> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> >> >>     
> >> > 
> >> > This is partly true: only arch agnostic code would be lost.
> >> > 
> >> > Moreover this is a problem for powerpc only at the moment and userspace code
> >> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.    
> >> 
> >> How would that work on KVM without this patch?
> >>   
> > 
> > It doesn't work for PowerPC :)
> > 
> > KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
> > KVM_CAP_PPC_SMT indicates the host has 8 threads per core.
> > 
> > KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
> > it has vcpu_id == 128 * 8 == 1024.
> > 
> > Of course we can patch QEMU to restrict the maximum number of vcpus
> > to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
> > KVM for PowerPC is sized to run MAX_VCPUS... :\  
> 
> Yeah.  If we introduced the capability, then QEMU would restrict vcpu_id
> to MAX_VCPUS or MAX_VCPU_ID and the number of VCPUS always to MAX_VCPUS.
> 

Ok. I'll give a try.

Just to be sure I haven't missed something:
- change the spec to introduce the MAX_VCPU_ID concept
- update all related checks in KVM
- provide a KVM_CAP_MAX_VCPU_ID for userspace

> >> > For other architectures, it is simply KVM_MAX_VCPUS.    
> >> 
> >> (Other architectures would not implement the capability.)
> >>   
> > 
> > So this would be KVM_CAP_PPC_MAX_VCPU_ID ?  
> 
> No, need.  The capability could be generic, because the concept is
> arch-neutral (and it looks like x86 will use it too).
> 
> Unimplemented capabilities return 0, which is why other architectures
> don't need to implement it.  Nothing can break:  userspace on old KVM
> will get 0 for the capability, so it's going to behave as if we didn't
> have this patch and this patch made sure that nothing actually changed
> for other architectures.
> 
> >> >                                                                      maybe later
> >> > if we have other scenarios where vcpu ids need to cross the limit ?    
> >> 
> >> x86 is going to have that soon too -- vcpu_id will be able to range from
> >> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> >> won't be improved to actually scale, so MAX_CPUS will remain lower.
> >>   
> > 
> > Do you have some pointers to share so that we can see the broader picture ?  
> 
> At the moment, the most concrete one is x2APIC spec, sorry.
> 
> KVM currently supports only xAPIC, which has 8 bit APIC ID.
> (The x2APIC interface that you might see is paravirtualized.)
> 
> vcpu_id is APIC ID on x86 and x2APIC still has sparse allocation that
> encodes topology.  KVM is going to support standard x2APIC (when QEMU
> supports interrupt remapping), which opens a possibility where we'd end
> up in exactly the same situation where PowerPC is now -- architectural
> code would handle vcpu_id bigger than MAX_CPUS, but userspace couldn't
> make us of it, because of API limitations.
> 

Thanks for the clarification !

Cheers.

--
Greg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wei Yang April 23, 2016, 12:51 a.m. UTC | #12

On Fri, Apr 22, 2016 at 11:30:45AM +0200, Greg Kurz wrote:
>On Fri, 22 Apr 2016 17:21:03 +0800
>Wei Yang <richard.weiyang@huawei.com> wrote:
>
>> Hi, Greg
>> 
>
>Hi Wei !
>
>> One confusion.
>> 
>> There are 5 kvm_arch_vcpu_create() while in this patch you changed 2 of them.
>> Some particular reason?
>> 
>
>Yes and the reason is given in the changelog:
>- ARM and s390 already have such a check
>- PowerPC can live without this check
>- this patch simply moves the check to MIPS and x86
>
>Does it clarify ?
>

Sure, thanks :-)

>Cheers.
>
>--
>Greg
>
>> On Thu, Apr 21, 2016 at 04:20:53PM +0200, Greg Kurz wrote:
>> >Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>> >introduced a check to prevent potential kernel memory corruption in case
>> >the vcpu id is too great.
>> >
>> >Unfortunately this check assumes vcpu ids grow in sequence with a common
>> >difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>> >For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>> >mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>> >1024, guests may be limited down to 128 vcpus on POWER8.
>> >
>> >This means the check does not belong here and should be moved to some arch
>> >specific function: kvm_arch_vcpu_create() looks like a good candidate.
>> >
>> >ARM and s390 already have such a check.
>> >
>> >I could not spot any path in the PowerPC or common KVM code where a vcpu
>> >id is used as described in the above commit: I believe PowerPC can live
>> >without this check.
>> >
>> >In the end, this patch simply moves the check to MIPS and x86. While here,
>> >we also update the documentation to dissociate vcpu ids from the maximum
>> >number of vcpus per virtual machine.
>> >
>> >Acked-by: James Hogan <james.hogan@imgtec.com>
>> >Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> >Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>> >---
>> >v4: - updated subject for more clarity on what the patch does
>> >    - added James's and Connie's A-b tags
>> >    - updated documentation
>> >
>> > Documentation/virtual/kvm/api.txt |    7 +++----
>> > arch/mips/kvm/mips.c              |    7 ++++++-
>> > arch/x86/kvm/x86.c                |    3 +++
>> > virt/kvm/kvm_main.c               |    3 ---
>> > 4 files changed, 12 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> >index 4d0542c5206b..486a1d783b82 100644
>> >--- a/Documentation/virtual/kvm/api.txt
>> >+++ b/Documentation/virtual/kvm/api.txt
>> >@@ -199,11 +199,10 @@ Type: vm ioctl
>> > Parameters: vcpu id (apic id on x86)
>> > Returns: vcpu fd on success, -1 on error
>> > 
>> >-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
>> >-in the range [0, max_vcpus).
>> >+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
>> > 
>> >-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
>> >-the KVM_CHECK_EXTENSION ioctl() at run-time.
>> >+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
>> >+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
>> > The maximum possible value for max_vcpus can be retrieved using the
>> > KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
>> > 
>> >diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>> >index 70ef1a43c114..0278ea146db5 100644
>> >--- a/arch/mips/kvm/mips.c
>> >+++ b/arch/mips/kvm/mips.c
>> >@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>> > 	int err, size, offset;
>> > 	void *gebase;
>> > 	int i;
>> >+	struct kvm_vcpu *vcpu;
>> > 
>> >-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>> >+	if (id >= KVM_MAX_VCPUS) {
>> >+		err = -EINVAL;
>> >+		goto out;
>> >+	}
>> > 
>> >+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>> > 	if (!vcpu) {
>> > 		err = -ENOMEM;
>> > 		goto out;
>> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >index 9b7798c7b210..7738202edcce 100644
>> >--- a/arch/x86/kvm/x86.c
>> >+++ b/arch/x86/kvm/x86.c
>> >@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>> > {
>> > 	struct kvm_vcpu *vcpu;
>> > 
>> >+	if (id >= KVM_MAX_VCPUS)
>> >+		return ERR_PTR(-EINVAL);
>> >+
>> > 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
>> > 		printk_once(KERN_WARNING
>> > 		"kvm: SMP vm created on host with unstable TSC; "
>> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> >index 4fd482fb9260..6b6cca3cb488 100644
>> >--- a/virt/kvm/kvm_main.c
>> >+++ b/virt/kvm/kvm_main.c
>> >@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>> > 	int r;
>> > 	struct kvm_vcpu *vcpu;
>> > 
>> >-	if (id >= KVM_MAX_VCPUS)
>> >-		return -EINVAL;
>> >-
>> > 	vcpu = kvm_arch_vcpu_create(kvm, id);
>> > 	if (IS_ERR(vcpu))
>> > 		return PTR_ERR(vcpu);
>> >
>> >--
>> >To unsubscribe from this list: send the line "unsubscribe kvm" in
>> >the body of a message to majordomo@vger.kernel.org
>> >More majordomo info at  http://vger.kernel.org/majordomo-info.html  
>>

Radim Krčmář April 25, 2016, 2:15 p.m. UTC | #13

2016-04-22 16:50+0200, Greg Kurz:
> Just to be sure I haven't missed something:
> - change the spec to introduce the MAX_VCPU_ID concept
> - update all related checks in KVM
> - provide a KVM_CAP_MAX_VCPU_ID for userspace

That is it, thanks a lot!

(From nitpicks that come to my mind ... MAX_VCPU_ID should not be able
 to lower the VCPU_ID limit below MAX_VCPUS.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Greg Kurz April 25, 2016, 2:30 p.m. UTC | #14

On Mon, 25 Apr 2016 16:15:22 +0200
Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2016-04-22 16:50+0200, Greg Kurz:
> > Just to be sure I haven't missed something:
> > - change the spec to introduce the MAX_VCPU_ID concept
> > - update all related checks in KVM
> > - provide a KVM_CAP_MAX_VCPU_ID for userspace  
> 
> That is it, thanks a lot!
> 
> (From nitpicks that come to my mind ... MAX_VCPU_ID should not be able
>  to lower the VCPU_ID limit below MAX_VCPUS.)
> 

Indeed it shouldn't. Thanks for the tip !

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[v4,2/2] KVM: move vcpu id checking to archs

Commit Message

Comments

Patch