Message ID | 146116689259.20666.15860134511726195550.stgit@bahia.huguette.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 20, 2016 at 05:44:54PM +0200, Greg Kurz wrote: > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > introduced a check to prevent potential kernel memory corruption in case > the vcpu id is too great. > > Unfortunately this check assumes vcpu ids grow in sequence with a common > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > 1024, guests may be limited down to 128 vcpus on POWER8. > > This means the check does not belong here and should be moved to some arch > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > ARM and s390 already have such a check. > > I could not spot any path in the PowerPC or common KVM code where a vcpu > id is used as described in the above commit: I believe PowerPC can live > without this check. > > In the end, this patch simply moves the check to MIPS and x86. > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > --- > v3: use ERR_PTR() > > arch/mips/kvm/mips.c | 7 ++++++- Acked-by: James Hogan <james.hogan@imgtec.com> [MIPS] Cheers James > arch/x86/kvm/x86.c | 3 +++ > virt/kvm/kvm_main.c | 3 --- > 3 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c > index 70ef1a43c114..0278ea146db5 100644 > --- a/arch/mips/kvm/mips.c > +++ b/arch/mips/kvm/mips.c > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) > int err, size, offset; > void *gebase; > int i; > + struct kvm_vcpu *vcpu; > > - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > + if (id >= KVM_MAX_VCPUS) { > + err = -EINVAL; > + goto out; > + } > > + vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > if (!vcpu) { > err = -ENOMEM; > goto out; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 9b7798c7b210..7738202edcce 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, > { > struct kvm_vcpu *vcpu; > > + if (id >= KVM_MAX_VCPUS) > + return ERR_PTR(-EINVAL); > + > if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0) > printk_once(KERN_WARNING > "kvm: SMP vm created on host with unstable TSC; " > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 4fd482fb9260..6b6cca3cb488 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) > int r; > struct kvm_vcpu *vcpu; > > - if (id >= KVM_MAX_VCPUS) > - return -EINVAL; > - > vcpu = kvm_arch_vcpu_create(kvm, id); > if (IS_ERR(vcpu)) > return PTR_ERR(vcpu); >
On Wed, 20 Apr 2016 17:44:54 +0200 Greg Kurz <gkurz@linux.vnet.ibm.com> wrote: > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > introduced a check to prevent potential kernel memory corruption in case > the vcpu id is too great. > > Unfortunately this check assumes vcpu ids grow in sequence with a common > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > 1024, guests may be limited down to 128 vcpus on POWER8. > > This means the check does not belong here and should be moved to some arch > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > ARM and s390 already have such a check. > > I could not spot any path in the PowerPC or common KVM code where a vcpu > id is used as described in the above commit: I believe PowerPC can live > without this check. > > In the end, this patch simply moves the check to MIPS and x86. > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > --- > v3: use ERR_PTR() > > arch/mips/kvm/mips.c | 7 ++++++- > arch/x86/kvm/x86.c | 3 +++ > virt/kvm/kvm_main.c | 3 --- > 3 files changed, 9 insertions(+), 4 deletions(-) > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-20 17:44+0200, Greg Kurz: > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > introduced a check to prevent potential kernel memory corruption in case > the vcpu id is too great. > > Unfortunately this check assumes vcpu ids grow in sequence with a common > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > 1024, guests may be limited down to 128 vcpus on POWER8. > > This means the check does not belong here and should be moved to some arch > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > ARM and s390 already have such a check. > > I could not spot any path in the PowerPC or common KVM code where a vcpu > id is used as described in the above commit: I believe PowerPC can live > without this check. > > In the end, this patch simply moves the check to MIPS and x86. > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > --- > v3: use ERR_PTR() > > arch/mips/kvm/mips.c | 7 ++++++- > arch/x86/kvm/x86.c | 3 +++ > virt/kvm/kvm_main.c | 3 --- > 3 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c > index 70ef1a43c114..0278ea146db5 100644 > --- a/arch/mips/kvm/mips.c > +++ b/arch/mips/kvm/mips.c > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) > int err, size, offset; > void *gebase; > int i; > + struct kvm_vcpu *vcpu; > > - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > + if (id >= KVM_MAX_VCPUS) { > + err = -EINVAL; > + goto out; 'vcpu' looks undefined at this point, so kfree in 'out:' may bug. Just 'return ERR_PTR(-EINVAL)'? > + } > > + vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > if (!vcpu) { > err = -ENOMEM; > goto out; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote: > 2016-04-20 17:44+0200, Greg Kurz: > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > > introduced a check to prevent potential kernel memory corruption in case > > the vcpu id is too great. > > > > Unfortunately this check assumes vcpu ids grow in sequence with a common > > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > > 1024, guests may be limited down to 128 vcpus on POWER8. > > > > This means the check does not belong here and should be moved to some arch > > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > > > ARM and s390 already have such a check. > > > > I could not spot any path in the PowerPC or common KVM code where a vcpu > > id is used as described in the above commit: I believe PowerPC can live > > without this check. > > > > In the end, this patch simply moves the check to MIPS and x86. > > > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > > --- > > v3: use ERR_PTR() > > > > arch/mips/kvm/mips.c | 7 ++++++- > > arch/x86/kvm/x86.c | 3 +++ > > virt/kvm/kvm_main.c | 3 --- > > 3 files changed, 9 insertions(+), 4 deletions(-) > > > > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c > > index 70ef1a43c114..0278ea146db5 100644 > > --- a/arch/mips/kvm/mips.c > > +++ b/arch/mips/kvm/mips.c > > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) > > int err, size, offset; > > void *gebase; > > int i; > > + struct kvm_vcpu *vcpu; > > > > - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > > + if (id >= KVM_MAX_VCPUS) { > > + err = -EINVAL; > > + goto out; > > 'vcpu' looks undefined at this point, so kfree in 'out:' may bug. Thats out_free_cpu I think? Cheers James > Just 'return ERR_PTR(-EINVAL)'? > > > + } > > > > + vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > > if (!vcpu) { > > err = -ENOMEM; > > goto out;
2016-04-20 18:09+0100, James Hogan: > On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote: >> 2016-04-20 17:44+0200, Greg Kurz: >> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c >> > index 70ef1a43c114..0278ea146db5 100644 >> > --- a/arch/mips/kvm/mips.c >> > +++ b/arch/mips/kvm/mips.c >> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) >> > int err, size, offset; >> > void *gebase; >> > int i; >> > + struct kvm_vcpu *vcpu; >> > >> > - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); >> > + if (id >= KVM_MAX_VCPUS) { >> > + err = -EINVAL; >> > + goto out; >> >> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug. > > Thats out_free_cpu I think? My bad, it is. Thank you! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2016 19:27:06 +0200 Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2016-04-20 18:09+0100, James Hogan: > > On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote: > >> 2016-04-20 17:44+0200, Greg Kurz: > >> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c > >> > index 70ef1a43c114..0278ea146db5 100644 > >> > --- a/arch/mips/kvm/mips.c > >> > +++ b/arch/mips/kvm/mips.c > >> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) > >> > int err, size, offset; > >> > void *gebase; > >> > int i; > >> > + struct kvm_vcpu *vcpu; > >> > > >> > - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); > >> > + if (id >= KVM_MAX_VCPUS) { > >> > + err = -EINVAL; > >> > + goto out; > >> > >> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug. > > > > Thats out_free_cpu I think? > > My bad, it is. Thank you! > I kept the goto based construct because it was done this way for kzalloc(). but I agree that 'return ERR_PTR(-EINVAL)' may look more explicit. Worth a v4 ? Cheers. -- Greg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-20 17:44+0200, Greg Kurz: > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > introduced a check to prevent potential kernel memory corruption in case > the vcpu id is too great. > > Unfortunately this check assumes vcpu ids grow in sequence with a common > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > 1024, guests may be limited down to 128 vcpus on POWER8. > > This means the check does not belong here and should be moved to some arch > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > ARM and s390 already have such a check. > > I could not spot any path in the PowerPC or common KVM code where a vcpu > id is used as described in the above commit: I believe PowerPC can live > without this check. The only problematic path I see is kvm_get_vcpu_by_id(), which returns NULL for any id above KVM_MAX_VCPUS. kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for duplicate ids, so PowerPC could end up with many VCPUs of the same id. I'm not sure what could fail, but code doesn't expect this situation. Patching kvm_get_vcpu_by_id() is easy, though. Second issue is that Documentation/virtual/kvm/api.txt says 4.7 KVM_CREATE_VCPU [...] This API adds a vcpu to a virtual machine. The vcpu id is a small integer in the range [0, max_vcpus). so we'd remove those two lines and change the API too. The change would be somewhat backward compatible, but doesn't PowerPC use high vcpu_id just because KVM is lacking an API to set DT ID? x86 (APIC ID) is affected by this and ARM (MP ID) probably too. (Maybe it is time to decouple VCPU ID used in KVM interfaces from architecture dependent CPU ID that the guest uses ... Mostly for future architectures that won't fit into 32 bits, but clarity of the code could go up as well.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-20 19:53+0200, Greg Kurz: > On Wed, 20 Apr 2016 19:27:06 +0200 > Radim Kr?má? <rkrcmar@redhat.com> wrote: >> 2016-04-20 18:09+0100, James Hogan: >> > On Wed, Apr 20, 2016 at 07:02:10PM +0200, Radim Kr?má? wrote: >> >> 2016-04-20 17:44+0200, Greg Kurz: >> >> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c >> >> > index 70ef1a43c114..0278ea146db5 100644 >> >> > --- a/arch/mips/kvm/mips.c >> >> > +++ b/arch/mips/kvm/mips.c >> >> > @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) >> >> > int err, size, offset; >> >> > void *gebase; >> >> > int i; >> >> > + struct kvm_vcpu *vcpu; >> >> > >> >> > - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); >> >> > + if (id >= KVM_MAX_VCPUS) { >> >> > + err = -EINVAL; >> >> > + goto out; >> >> >> >> 'vcpu' looks undefined at this point, so kfree in 'out:' may bug. >> > >> > Thats out_free_cpu I think? >> >> My bad, it is. Thank you! >> > > I kept the goto based construct because it was done this way for kzalloc(). > but I agree that 'return ERR_PTR(-EINVAL)' may look more explicit. > > Worth a v4 ? No, it is consistent with kzalloc fault handling this way. I was going to queue it, but found an issue with kvm_get_vcpu_by_id() that would allow the guest to create multiple VCPUs with the same id, which led to an unfortunate discourse on KVM API. (Please see a new thread.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2016 20:29:09 +0200 Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2016-04-20 17:44+0200, Greg Kurz: > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > > introduced a check to prevent potential kernel memory corruption in case > > the vcpu id is too great. > > > > Unfortunately this check assumes vcpu ids grow in sequence with a common > > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > > 1024, guests may be limited down to 128 vcpus on POWER8. > > > > This means the check does not belong here and should be moved to some arch > > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > > > ARM and s390 already have such a check. > > > > I could not spot any path in the PowerPC or common KVM code where a vcpu > > id is used as described in the above commit: I believe PowerPC can live > > without this check. > > The only problematic path I see is kvm_get_vcpu_by_id(), which returns > NULL for any id above KVM_MAX_VCPUS. Oops my bad, I started to work on a 4.4 tree and I missed this check brought by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). But again, I believe the check is wrong there also: the changelog just mentions this is a fastpath for the usual case where "VCPU ids match the array index"... why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for > duplicate ids, so PowerPC could end up with many VCPUs of the same id. > I'm not sure what could fail, but code doesn't expect this situation. > Patching kvm_get_vcpu_by_id() is easy, though. > Something like this ? if (id < 0) return NULL; if (id < KVM_MAX_VCPUS) vcpu = kvm_get_vcpu(kvm, id); In the same patch ? > Second issue is that Documentation/virtual/kvm/api.txt says > 4.7 KVM_CREATE_VCPU > [...] > This API adds a vcpu to a virtual machine. The vcpu id is a small > integer in the range [0, max_vcpus). > Yeah and I find the meaning of max_vcpus is unclear. Here it is considered as a limit for vcpu id, but if you look at the code, KVM_MAX_VCPUS is also used as a limit for the number of vcpus: virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { > so we'd remove those two lines and change the API too. The change would > be somewhat backward compatible, but doesn't PowerPC use high vcpu_id > just because KVM is lacking an API to set DT ID? This is related to a limitation when running in book3s_hv mode with cpus that support SMT (multiple HW threads): all HW threads within a core cannot be running in different guests at the same time. We solve this by using a vcpu numbering scheme as follows: vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus that can be scheduled to run on the same real core. So, in the "worst" case where we want to run a guest with 1 thread/core and the host has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS. > x86 (APIC ID) is affected by this and ARM (MP ID) probably too. > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also patch kvm_get_vcpu_by_id() like suggested above. Depending on the platform, ARM can be limited to VGIC_V3_MAX_CPUS (== 255) or VGIC_V8_MAX_CPUS (== 8). I guess it won't be affected either. > (Maybe it is time to decouple VCPU ID used in KVM interfaces from > architecture dependent CPU ID that the guest uses ... Maybe... I did not get that far. > Mostly for future architectures that won't fit into 32 bits, but > clarity of the code could go up as well.) > Thanks for your remarks ! Cheers. -- Greg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 21 Apr 2016 13:29:58 +0200 Greg Kurz <gkurz@linux.vnet.ibm.com> wrote: > On Wed, 20 Apr 2016 20:29:09 +0200 > Radim Kr?má? <rkrcmar@redhat.com> wrote: > > > 2016-04-20 17:44+0200, Greg Kurz: > > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > > > introduced a check to prevent potential kernel memory corruption in case > > > the vcpu id is too great. > > > > > > Unfortunately this check assumes vcpu ids grow in sequence with a common > > > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > > > 1024, guests may be limited down to 128 vcpus on POWER8. > > > > > > This means the check does not belong here and should be moved to some arch > > > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > > > > > ARM and s390 already have such a check. > > > > > > I could not spot any path in the PowerPC or common KVM code where a vcpu > > > id is used as described in the above commit: I believe PowerPC can live > > > without this check. > > > > The only problematic path I see is kvm_get_vcpu_by_id(), which returns > > NULL for any id above KVM_MAX_VCPUS. > > Oops my bad, I started to work on a 4.4 tree and I missed this check brought > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). > > But again, I believe the check is wrong there also: the changelog just mentions > this is a fastpath for the usual case where "VCPU ids match the array index"... > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? Probably because noone considered power :) > > > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for > > duplicate ids, so PowerPC could end up with many VCPUs of the same id. > > I'm not sure what could fail, but code doesn't expect this situation. > > Patching kvm_get_vcpu_by_id() is easy, though. > > > > Something like this ? > > if (id < 0) > return NULL; > if (id < KVM_MAX_VCPUS) > vcpu = kvm_get_vcpu(kvm, id); > > In the same patch ? > > > Second issue is that Documentation/virtual/kvm/api.txt says > > 4.7 KVM_CREATE_VCPU > > [...] > > This API adds a vcpu to a virtual machine. The vcpu id is a small > > integer in the range [0, max_vcpus). > > > > Yeah and I find the meaning of max_vcpus is unclear. > > Here it is considered as a limit for vcpu id, but if you look at the code, > KVM_MAX_VCPUS is also used as a limit for the number of vcpus: > > virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { > > > so we'd remove those two lines and change the API too. The change would > > be somewhat backward compatible, but doesn't PowerPC use high vcpu_id > > just because KVM is lacking an API to set DT ID? > > This is related to a limitation when running in book3s_hv mode with cpus > that support SMT (multiple HW threads): all HW threads within a core > cannot be running in different guests at the same time. > > We solve this by using a vcpu numbering scheme as follows: > > vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest > > where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus > that can be scheduled to run on the same real core. > > So, in the "worst" case where we want to run a guest with 1 thread/core and the host > has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS. > > > x86 (APIC ID) is affected by this and ARM (MP ID) probably too. > > > > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also > patch kvm_get_vcpu_by_id() like suggested above. > > Depending on the platform, ARM can be limited to VGIC_V3_MAX_CPUS (== 255) or > VGIC_V8_MAX_CPUS (== 8). I guess it won't be affected either. For s390, it's either 64 (no esca) or 248 (esca). > > > (Maybe it is time to decouple VCPU ID used in KVM interfaces from > > architecture dependent CPU ID that the guest uses ... > > Maybe... I did not get that far. It seems that the various architectures are more different than I thought... wasn't aware of the complicated situation on power, for example. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 21 Apr 2016 14:26:19 +0200 Cornelia Huck <cornelia.huck@de.ibm.com> wrote: > On Thu, 21 Apr 2016 13:29:58 +0200 > Greg Kurz <gkurz@linux.vnet.ibm.com> wrote: > > > On Wed, 20 Apr 2016 20:29:09 +0200 > > Radim Kr?má? <rkrcmar@redhat.com> wrote: > > > > > 2016-04-20 17:44+0200, Greg Kurz: > > > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > > > > introduced a check to prevent potential kernel memory corruption in case > > > > the vcpu id is too great. > > > > > > > > Unfortunately this check assumes vcpu ids grow in sequence with a common > > > > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > > > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > > > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > > > > 1024, guests may be limited down to 128 vcpus on POWER8. > > > > > > > > This means the check does not belong here and should be moved to some arch > > > > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > > > > > > > ARM and s390 already have such a check. > > > > > > > > I could not spot any path in the PowerPC or common KVM code where a vcpu > > > > id is used as described in the above commit: I believe PowerPC can live > > > > without this check. > > > > > > The only problematic path I see is kvm_get_vcpu_by_id(), which returns > > > NULL for any id above KVM_MAX_VCPUS. > > > > Oops my bad, I started to work on a 4.4 tree and I missed this check brought > > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). > > > > But again, I believe the check is wrong there also: the changelog just mentions > > this is a fastpath for the usual case where "VCPU ids match the array index"... > > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? > > Probably because noone considered power :) > No surprise but the return path is a bit overkill anyway :) > > > > > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for > > > duplicate ids, so PowerPC could end up with many VCPUs of the same id. > > > I'm not sure what could fail, but code doesn't expect this situation. > > > Patching kvm_get_vcpu_by_id() is easy, though. > > > > > > > Something like this ? > > > > if (id < 0) > > return NULL; > > if (id < KVM_MAX_VCPUS) > > vcpu = kvm_get_vcpu(kvm, id); > > > > In the same patch ? > > > > > Second issue is that Documentation/virtual/kvm/api.txt says > > > 4.7 KVM_CREATE_VCPU > > > [...] > > > This API adds a vcpu to a virtual machine. The vcpu id is a small > > > integer in the range [0, max_vcpus). > > > > > > > Yeah and I find the meaning of max_vcpus is unclear. > > > > Here it is considered as a limit for vcpu id, but if you look at the code, > > KVM_MAX_VCPUS is also used as a limit for the number of vcpus: > > > > virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { > > > > > so we'd remove those two lines and change the API too. The change would > > > be somewhat backward compatible, but doesn't PowerPC use high vcpu_id > > > just because KVM is lacking an API to set DT ID? > > > > This is related to a limitation when running in book3s_hv mode with cpus > > that support SMT (multiple HW threads): all HW threads within a core > > cannot be running in different guests at the same time. > > > > We solve this by using a vcpu numbering scheme as follows: > > > > vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest > > > > where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus > > that can be scheduled to run on the same real core. > > > > So, in the "worst" case where we want to run a guest with 1 thread/core and the host > > has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS. > > > > > x86 (APIC ID) is affected by this and ARM (MP ID) probably too. > > > > > > > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also > > patch kvm_get_vcpu_by_id() like suggested above. > > > > Depending on the platform, ARM can be limited to VGIC_V3_MAX_CPUS (== 255) or > > VGIC_V8_MAX_CPUS (== 8). I guess it won't be affected either. > > For s390, it's either 64 (no esca) or 248 (esca). > And it is CONFIG_NR_CPUS for powerpc, i.e. 2048 per default on powernv. But the problem here is more: can we compare the number of vcpus with vcpu ids ? > > > > > (Maybe it is time to decouple VCPU ID used in KVM interfaces from > > > architecture dependent CPU ID that the guest uses ... > > > > Maybe... I did not get that far. > > It seems that the various architectures are more different than I > thought... wasn't aware of the complicated situation on power, for > example. Yeah, and I think moving these vcpu id checks to the archs allows to solve the problem and confine the complexity to the powerpc code. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, 20 Apr 2016 20:29:09 +0200 > Radim Kr?má? <rkrcmar@redhat.com> wrote: > > > 2016-04-20 17:44+0200, Greg Kurz: > > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > > > introduced a check to prevent potential kernel memory corruption in case > > > the vcpu id is too great. > > > > > > Unfortunately this check assumes vcpu ids grow in sequence with a common > > > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > > > 1024, guests may be limited down to 128 vcpus on POWER8. > > > > > > This means the check does not belong here and should be moved to some arch > > > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > > > > > ARM and s390 already have such a check. > > > > > > I could not spot any path in the PowerPC or common KVM code where a vcpu > > > id is used as described in the above commit: I believe PowerPC can live > > > without this check. > > > > The only problematic path I see is kvm_get_vcpu_by_id(), which returns > > NULL for any id above KVM_MAX_VCPUS. > > Oops my bad, I started to work on a 4.4 tree and I missed this check brought > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). > > But again, I believe the check is wrong there also: the changelog just mentions > this is a fastpath for the usual case where "VCPU ids match the array index"... > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? > > > kvm_vm_ioctl_create_vcpu() uses kvm_get_vcpu_by_id() to check for > > duplicate ids, so PowerPC could end up with many VCPUs of the same id. > > I'm not sure what could fail, but code doesn't expect this situation. > > Patching kvm_get_vcpu_by_id() is easy, though. > > > > Something like this ? > > if (id < 0) > return NULL; > if (id < KVM_MAX_VCPUS) > vcpu = kvm_get_vcpu(kvm, id); > > In the same patch ? So the heuristic would only trigger if id < KVM_MAX_VCPUS. By initializing vcpu to NULL this would work. David -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, 20 Apr 2016 17:44:54 +0200 > Greg Kurz <gkurz@linux.vnet.ibm.com> wrote: > > > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > > introduced a check to prevent potential kernel memory corruption in case > > the vcpu id is too great. > > > > Unfortunately this check assumes vcpu ids grow in sequence with a common > > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > > 1024, guests may be limited down to 128 vcpus on POWER8. > > > > This means the check does not belong here and should be moved to some arch > > specific function: kvm_arch_vcpu_create() looks like a good candidate. > > > > ARM and s390 already have such a check. > > > > I could not spot any path in the PowerPC or common KVM code where a vcpu > > id is used as described in the above commit: I believe PowerPC can live > > without this check. > > > > In the end, this patch simply moves the check to MIPS and x86. > > > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > > --- > > v3: use ERR_PTR() > > > > arch/mips/kvm/mips.c | 7 ++++++- > > arch/x86/kvm/x86.c | 3 +++ > > virt/kvm/kvm_main.c | 3 --- > > 3 files changed, 9 insertions(+), 4 deletions(-) > > > > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> > The existing s390 check looks sane to me, too! David -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-21 13:29+0200, Greg Kurz: > On Wed, 20 Apr 2016 20:29:09 +0200 > Radim Kr?má? <rkrcmar@redhat.com> wrote: >> 2016-04-20 17:44+0200, Greg Kurz: >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") >> > introduced a check to prevent potential kernel memory corruption in case >> > the vcpu id is too great. >> > >> > Unfortunately this check assumes vcpu ids grow in sequence with a common >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit. >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is >> > 1024, guests may be limited down to 128 vcpus on POWER8. >> > >> > This means the check does not belong here and should be moved to some arch >> > specific function: kvm_arch_vcpu_create() looks like a good candidate. >> > >> > ARM and s390 already have such a check. >> > >> > I could not spot any path in the PowerPC or common KVM code where a vcpu >> > id is used as described in the above commit: I believe PowerPC can live >> > without this check. >> >> The only problematic path I see is kvm_get_vcpu_by_id(), which returns >> NULL for any id above KVM_MAX_VCPUS. > > Oops my bad, I started to work on a 4.4 tree and I missed this check brought > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). > > But again, I believe the check is wrong there also: the changelog just mentions > this is a fastpath for the usual case where "VCPU ids match the array index"... > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? (The patch had to check id >= KVM_MAX_VCPUS for sanity and there could not be a VCPU with that index according to the spec, so it made a shortcut to the correct NULL result ...) >> Second issue is that Documentation/virtual/kvm/api.txt says >> 4.7 KVM_CREATE_VCPU >> [...] >> This API adds a vcpu to a virtual machine. The vcpu id is a small >> integer in the range [0, max_vcpus). >> > > Yeah and I find the meaning of max_vcpus is unclear. > > Here it is considered as a limit for vcpu id, but if you look at the code, > KVM_MAX_VCPUS is also used as a limit for the number of vcpus: > > virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { I agree. Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make you think that online_vcpus limit interpretation is the correct one, but the code is conflicted. >> so we'd remove those two lines and change the API too. The change would >> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id >> just because KVM is lacking an API to set DT ID? > > This is related to a limitation when running in book3s_hv mode with cpus > that support SMT (multiple HW threads): all HW threads within a core > cannot be running in different guests at the same time. > > We solve this by using a vcpu numbering scheme as follows: > > vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest > > where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus > that can be scheduled to run on the same real core. > > So, in the "worst" case where we want to run a guest with 1 thread/core and the host > has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS. I see, thanks. Accommodating existing users seems like an acceptable excuse to change the API. >> x86 (APIC ID) is affected by this and ARM (MP ID) probably too. >> > > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also > patch kvm_get_vcpu_by_id() like suggested above. x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by reserving blocks of bits for socket/core/thread, so if core or thread count isn't a power of two, then the set of valid APIC IDs is sparse, but max id is still limited by 255, so the effective maximum VCPU count is lower. x86 doesn't support APIC ID over 255 yet, though, so this change wouldn't change a thing in practice. :) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 21 Apr 2016 17:29:16 +0200 Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2016-04-21 13:29+0200, Greg Kurz: > > On Wed, 20 Apr 2016 20:29:09 +0200 > > Radim Kr?má? <rkrcmar@redhat.com> wrote: > >> 2016-04-20 17:44+0200, Greg Kurz: > >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") > >> > introduced a check to prevent potential kernel memory corruption in case > >> > the vcpu id is too great. > >> > > >> > Unfortunately this check assumes vcpu ids grow in sequence with a common > >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit. > >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv > >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is > >> > 1024, guests may be limited down to 128 vcpus on POWER8. > >> > > >> > This means the check does not belong here and should be moved to some arch > >> > specific function: kvm_arch_vcpu_create() looks like a good candidate. > >> > > >> > ARM and s390 already have such a check. > >> > > >> > I could not spot any path in the PowerPC or common KVM code where a vcpu > >> > id is used as described in the above commit: I believe PowerPC can live > >> > without this check. > >> > >> The only problematic path I see is kvm_get_vcpu_by_id(), which returns > >> NULL for any id above KVM_MAX_VCPUS. > > > > Oops my bad, I started to work on a 4.4 tree and I missed this check brought > > by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). > > > > But again, I believe the check is wrong there also: the changelog just mentions > > this is a fastpath for the usual case where "VCPU ids match the array index"... > > why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? > > (The patch had to check id >= KVM_MAX_VCPUS for sanity and there could > not be a VCPU with that index according to the spec, so it made a > shortcut to the correct NULL result ...) > With the spec in mind, you're right... the confusion comes from the fact that powerpc decided to use bigger vcpu ids a long time ago but nobody cared to document that. > >> Second issue is that Documentation/virtual/kvm/api.txt says > >> 4.7 KVM_CREATE_VCPU > >> [...] > >> This API adds a vcpu to a virtual machine. The vcpu id is a small > >> integer in the range [0, max_vcpus). > >> > > > > Yeah and I find the meaning of max_vcpus is unclear. > > > > Here it is considered as a limit for vcpu id, but if you look at the code, > > KVM_MAX_VCPUS is also used as a limit for the number of vcpus: > > > > virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { > > I agree. Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make > you think that online_vcpus limit interpretation is the correct one, but > the code is conflicted. > > >> so we'd remove those two lines and change the API too. The change would > >> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id > >> just because KVM is lacking an API to set DT ID? > > > > This is related to a limitation when running in book3s_hv mode with cpus > > that support SMT (multiple HW threads): all HW threads within a core > > cannot be running in different guests at the same time. > > > > We solve this by using a vcpu numbering scheme as follows: > > > > vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest > > > > where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus > > that can be scheduled to run on the same real core. > > > > So, in the "worst" case where we want to run a guest with 1 thread/core and the host > > has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS. > > I see, thanks. Accommodating existing users seems like an acceptable > excuse to change the API. > > >> x86 (APIC ID) is affected by this and ARM (MP ID) probably too. > >> > > > > x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also > > patch kvm_get_vcpu_by_id() like suggested above. > > x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by > reserving blocks of bits for socket/core/thread, so if core or thread > count isn't a power of two, then the set of valid APIC IDs is sparse, > but max id is still limited by 255, so the effective maximum VCPU count > is lower. > > x86 doesn't support APIC ID over 255 yet, though, so this change > wouldn't change a thing in practice. :) > Thanks for the details ! So we're good ? Whose tree can carry these patches ? Cheers. -- Greg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-21 17:49+0200, Greg Kurz: > So we're good ? I support the change, just had a nit about API design for v2. > Whose tree can carry these patches ? (PowerPC is the only immediately affected arch, so I'd it there.) What do you think is best? My experience in this regard is pretty low. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 21 Apr 2016 18:08:41 +0200 Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2016-04-21 17:49+0200, Greg Kurz: > > So we're good ? > > I support the change, just had a nit about API design for v2. > As I said in my other mail, I'm not sure we should do more... if that's okay for you and you still support the change, maybe you can give an Acked-by ? > > Whose tree can carry these patches ? > > (PowerPC is the only immediately affected arch, so I'd it there.) > > What do you think is best? My experience in this regard is pretty low. > Maybe Paolo's tree but I guess we'd need some more acks from x86, ARM and PowerPC :) KVM maintainers... Thanks anyway for your valuable help ! Cheers. -- Greg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-21 19:18+0200, Greg Kurz: > On Thu, 21 Apr 2016 18:08:41 +0200 > Radim Kr?má? <rkrcmar@redhat.com> wrote: >> 2016-04-21 17:49+0200, Greg Kurz: >> > So we're good ? >> >> I support the change, just had a nit about API design for v2. >> > > As I said in my other mail, I'm not sure we should do more... if > that's okay for you and you still support the change, maybe you > can give an Acked-by ? I'm evil when it comes to APIs, so bear it a bit longer. :) >> > Whose tree can carry these patches ? >> >> (PowerPC is the only immediately affected arch, so I'd it there.) >> >> What do you think is best? My experience in this regard is pretty low. >> > > Maybe Paolo's tree but I guess we'd need some more acks from x86, ARM and > PowerPC :) KVM maintainers... Ok. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 21 Apr 2016 19:39:31 +0200 Radim Kr?má? <rkrcmar@redhat.com> wrote: > 2016-04-21 19:18+0200, Greg Kurz: > > On Thu, 21 Apr 2016 18:08:41 +0200 > > Radim Kr?má? <rkrcmar@redhat.com> wrote: > >> 2016-04-21 17:49+0200, Greg Kurz: > >> > So we're good ? > >> > >> I support the change, just had a nit about API design for v2. > >> > > > > As I said in my other mail, I'm not sure we should do more... if > > that's okay for you and you still support the change, maybe you > > can give an Acked-by ? > > I'm evil when it comes to APIs, so bear it a bit longer. :) > Fair enough :) > >> > Whose tree can carry these patches ? > >> > >> (PowerPC is the only immediately affected arch, so I'd it there.) > >> > >> What do you think is best? My experience in this regard is pretty low. > >> > > > > Maybe Paolo's tree but I guess we'd need some more acks from x86, ARM and > > PowerPC :) KVM maintainers... > > Ok. > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-21 23:29 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>: > 2016-04-21 13:29+0200, Greg Kurz: >> On Wed, 20 Apr 2016 20:29:09 +0200 >> Radim Kr?má? <rkrcmar@redhat.com> wrote: >>> 2016-04-20 17:44+0200, Greg Kurz: >>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") >>> > introduced a check to prevent potential kernel memory corruption in case >>> > the vcpu id is too great. >>> > >>> > Unfortunately this check assumes vcpu ids grow in sequence with a common >>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit. >>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv >>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is >>> > 1024, guests may be limited down to 128 vcpus on POWER8. >>> > >>> > This means the check does not belong here and should be moved to some arch >>> > specific function: kvm_arch_vcpu_create() looks like a good candidate. >>> > >>> > ARM and s390 already have such a check. >>> > >>> > I could not spot any path in the PowerPC or common KVM code where a vcpu >>> > id is used as described in the above commit: I believe PowerPC can live >>> > without this check. >>> >>> The only problematic path I see is kvm_get_vcpu_by_id(), which returns >>> NULL for any id above KVM_MAX_VCPUS. >> >> Oops my bad, I started to work on a 4.4 tree and I missed this check brought >> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id). >> >> But again, I believe the check is wrong there also: the changelog just mentions >> this is a fastpath for the usual case where "VCPU ids match the array index"... >> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ? > > (The patch had to check id >= KVM_MAX_VCPUS for sanity and there could > not be a VCPU with that index according to the spec, so it made a > shortcut to the correct NULL result ...) > >>> Second issue is that Documentation/virtual/kvm/api.txt says >>> 4.7 KVM_CREATE_VCPU >>> [...] >>> This API adds a vcpu to a virtual machine. The vcpu id is a small >>> integer in the range [0, max_vcpus). >>> >> >> Yeah and I find the meaning of max_vcpus is unclear. >> >> Here it is considered as a limit for vcpu id, but if you look at the code, >> KVM_MAX_VCPUS is also used as a limit for the number of vcpus: >> >> virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) { > > I agree. Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make > you think that online_vcpus limit interpretation is the correct one, but > the code is conflicted. > >>> so we'd remove those two lines and change the API too. The change would >>> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id >>> just because KVM is lacking an API to set DT ID? >> >> This is related to a limitation when running in book3s_hv mode with cpus >> that support SMT (multiple HW threads): all HW threads within a core >> cannot be running in different guests at the same time. >> >> We solve this by using a vcpu numbering scheme as follows: >> >> vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest >> >> where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus >> that can be scheduled to run on the same real core. >> >> So, in the "worst" case where we want to run a guest with 1 thread/core and the host >> has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS. > > I see, thanks. Accommodating existing users seems like an acceptable > excuse to change the API. > >>> x86 (APIC ID) is affected by this and ARM (MP ID) probably too. >>> >> >> x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also >> patch kvm_get_vcpu_by_id() like suggested above. > > x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by > reserving blocks of bits for socket/core/thread, so if core or thread > count isn't a power of two, then the set of valid APIC IDs is sparse, ^^^^^^^^^^^^^^^^^^^ ^^^^^^^ Is this the root reason why recommand max vCPUs per vm is 160 and the KVM_MAX_VCPUS is 255 instead of due to perforamnce concern? Regards, Wanpeng Li > but max id is still limited by 255, so the effective maximum VCPU count > is lower. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-22 09:40+0800, Wanpeng Li: > 2016-04-21 23:29 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>: >> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by >> reserving blocks of bits for socket/core/thread, so if core or thread >> count isn't a power of two, then the set of valid APIC IDs is sparse, > > ^^^^^^^^^^^^^^^^^^^ > ^^^^^^^ > Is this the root reason why recommand max vCPUs per vm is 160 and the > KVM_MAX_VCPUS is 255 instead of due to perforamnce concern? No, the recommended amout of VCPUs is 160 because I didn't bump it after PLE stopped killing big guests. :/ You can get full 255 VCPU guest with a proper configuration, e.g. "-smp 255" or "-smp 255,cores=8" and the only problem is scalability, but I don't know of anything that doesn't scale to that point. (Scaling up to 2^32 is harder, because you don't want O(N) search, nor full allocation on smaller guests. Neither is a big problem now.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-22 21:07 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>: > 2016-04-22 09:40+0800, Wanpeng Li: >> 2016-04-21 23:29 GMT+08:00 Radim Kr?má? <rkrcmar@redhat.com>: >>> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by >>> reserving blocks of bits for socket/core/thread, so if core or thread >>> count isn't a power of two, then the set of valid APIC IDs is sparse, >> >> ^^^^^^^^^^^^^^^^^^^ >> ^^^^^^^ >> Is this the root reason why recommand max vCPUs per vm is 160 and the >> KVM_MAX_VCPUS is 255 instead of due to perforamnce concern? > > No, the recommended amout of VCPUs is 160 because I didn't bump it after > PLE stopped killing big guests. :/ > > You can get full 255 VCPU guest with a proper configuration, e.g. > "-smp 255" or "-smp 255,cores=8" and the only problem is scalability, > but I don't know of anything that doesn't scale to that point. > > (Scaling up to 2^32 is harder, because you don't want O(N) search, nor > full allocation on smaller guests. Neither is a big problem now.) I see, thanks Radim. Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 70ef1a43c114..0278ea146db5 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) int err, size, offset; void *gebase; int i; + struct kvm_vcpu *vcpu; - struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); + if (id >= KVM_MAX_VCPUS) { + err = -EINVAL; + goto out; + } + vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); if (!vcpu) { err = -ENOMEM; goto out; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9b7798c7b210..7738202edcce 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, { struct kvm_vcpu *vcpu; + if (id >= KVM_MAX_VCPUS) + return ERR_PTR(-EINVAL); + if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0) printk_once(KERN_WARNING "kvm: SMP vm created on host with unstable TSC; " diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4fd482fb9260..6b6cca3cb488 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) int r; struct kvm_vcpu *vcpu; - if (id >= KVM_MAX_VCPUS) - return -EINVAL; - vcpu = kvm_arch_vcpu_create(kvm, id); if (IS_ERR(vcpu)) return PTR_ERR(vcpu);
Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)") introduced a check to prevent potential kernel memory corruption in case the vcpu id is too great. Unfortunately this check assumes vcpu ids grow in sequence with a common difference of 1, which is wrong: archs are free to use vcpu id as they fit. For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is 1024, guests may be limited down to 128 vcpus on POWER8. This means the check does not belong here and should be moved to some arch specific function: kvm_arch_vcpu_create() looks like a good candidate. ARM and s390 already have such a check. I could not spot any path in the PowerPC or common KVM code where a vcpu id is used as described in the above commit: I believe PowerPC can live without this check. In the end, this patch simply moves the check to MIPS and x86. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> --- v3: use ERR_PTR() arch/mips/kvm/mips.c | 7 ++++++- arch/x86/kvm/x86.c | 3 +++ virt/kvm/kvm_main.c | 3 --- 3 files changed, 9 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html