KVM/x86: Increase max vcpu number to 352

State: New, archived
From: Lan Tianyu <tianyu.lan@intel.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>, pbonzini@redhat.com, rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] KVM/x86: Increase max vcpu number to 352
Date: Thu, 10 Aug 2017 18:00:59 +0800
Message-Id: <1502359259-24966-1-git-send-email-tianyu.lan@intel.com>

Lan Tianyu Aug. 10, 2017, 10 a.m. UTC

Intel Xeon Phi chips will support 352 logical threads. For the HPC use
case, a huge VM will be created with as many vcpus as there are host CPUs. This
patch increases the max vcpu number to 352.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Lan Tianyu Aug. 11, 2017, 7:49 a.m. UTC | #1

Hi Konrad:
	Thanks for your review.

On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
>> case, it will create a huge VM with vcpu number as same as host cpus. This
>> patch is to increase max vcpu number to 352.
> 
> Why not 1024 or 4096?

This is driven by demand. We could set a higher number, since KVM already has
x2APIC and vIOMMU interrupt remapping support.

> 
> Are there any issues with increasing the value from 288 to 352 right now?

None found.

> 
> Also perhaps this should be made in an Kconfig entry?

That would be another option, but I find that different platforms define
different MAX_VCPU values. If we introduce a generic Kconfig entry, different
platforms would need different ranges.

Radim & Paolo, could you give some input? In the QEMU thread, we will set
max vcpus to 8192 for x86 VMs. In KVM, the lengths of the vcpu pointer array in
struct kvm and of dest_vcpu_bitmap in kvm_irq_delivery_to_apic() are
specified by KVM_MAX_VCPUS. Should we keep them aligned with QEMU?
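To make the cost concrete, here is a rough sketch of how the two arrays mentioned above scale with the compile-time maximum. The helper and macro names are hypothetical stand-ins, not the kernel's definitions; the sketch only assumes one pointer per possible VCPU and one bit per possible VCPU, rounded up to whole longs the way the kernel's BITS_TO_LONGS() does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical equivalents of BITS_PER_LONG / BITS_TO_LONGS(). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
    (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct vcpu; /* opaque, hypothetical */

/* Bytes for a per-VM array of VCPU pointers sized by the compile-time max. */
size_t vcpu_array_bytes(size_t max_vcpus)
{
    return max_vcpus * sizeof(struct vcpu *);
}

/* Bytes for a destination bitmap with one bit per possible VCPU. */
size_t dest_bitmap_bytes(size_t max_vcpus)
{
    assert(max_vcpus > 0);
    return SKETCH_BITS_TO_LONGS(max_vcpus) * sizeof(unsigned long);
}
```

On an LP64 host (8-byte pointers and longs) the pointer array grows from 2304 bytes at 288 to 2816 bytes at 352, and to 65536 bytes at 8192, for every VM regardless of how many VCPUs it actually creates; the bitmap goes from 40 to 48 bytes, and to 1024 bytes at 8192.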

David Hildenbrand Aug. 11, 2017, 8:11 a.m. UTC | #2

On 11.08.2017 09:49, Lan Tianyu wrote:
> Hi Konrad:
> 	Thanks for your review.
> 
> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
>>> case, it will create a huge VM with vcpu number as same as host cpus. This
>>> patch is to increase max vcpu number to 352.
>>
>> Why not 1024 or 4096?
> 
> This is on demand. We can set a higher number since KVM already has
> x2apic and vIOMMU interrupt remapping support.
> 
>>
>> Are there any issues with increasing the value from 288 to 352 right now?
> 
> No found.
> 
>>
>> Also perhaps this should be made in an Kconfig entry?
> 
> That will be anther option but I find different platforms will define
> different MAX_VCPU. If we introduce a generic Kconfig entry, different
> platforms should have different range.
> 
> Radim & Paolo, Could you give some input? In qemu thread, we will set
> max vcpu to 8192 for x86 VM. In KVM, The length of vcpu pointer array in
> struct kvm and dest_vcpu_bitmap in kvm_irq_delivery_to_apic() are
> specified by KVM_MAX_VCPUS. Should we keep align with Qemu?
> 

commit 682f732ecf7396e9d6fe24d44738966699fae6c0
Author: Radim Krčmář <rkrcmar@redhat.com>
Date:   Tue Jul 12 22:09:29 2016 +0200

    KVM: x86: bump MAX_VCPUS to 288

    288 is in high demand because of Knights Landing CPU.
    We cannot set the limit to 640k, because that would be wasting space.

I think we want to keep it small as long as possible. I remember a patch
series from Radim which would dynamically allocate memory for these
arrays (using a new VM creation ioctl, specifying the max # of vcpus).
I wonder what happened to that (I remember requesting a simple realloc
instead of a new VM creation ioctl :] ).
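A minimal userspace sketch of the alternative described above: size the per-VM VCPU table once, when the VM is created, from a caller-supplied maximum instead of a compile-time constant. Everything here (struct vm, vm_create(), vm_add_vcpu()) is hypothetical and not KVM's API, and it deliberately ignores the locking question raised in the next reply.

```c
#include <assert.h>
#include <stdlib.h>

struct vcpu; /* opaque, hypothetical */

struct vm {
    unsigned int max_vcpus; /* chosen at creation time */
    unsigned int nr_vcpus;  /* VCPUs added so far */
    struct vcpu *vcpus[];   /* flexible array member, max_vcpus entries */
};

/* Allocate a VM whose VCPU table holds exactly max_vcpus pointers. */
struct vm *vm_create(unsigned int max_vcpus)
{
    struct vm *vm;

    if (max_vcpus == 0)
        return NULL;
    vm = calloc(1, sizeof(*vm) + (size_t)max_vcpus * sizeof(vm->vcpus[0]));
    if (!vm)
        return NULL;
    vm->max_vcpus = max_vcpus;
    return vm;
}

/* Register a VCPU; fails once the creation-time limit is reached. */
int vm_add_vcpu(struct vm *vm, struct vcpu *v)
{
    assert(vm != NULL);
    if (vm->nr_vcpus >= vm->max_vcpus)
        return -1;
    vm->vcpus[vm->nr_vcpus++] = v;
    return 0;
}

void vm_destroy(struct vm *vm)
{
    free(vm);
}
```

Because the table never changes size after creation, readers need no extra protection against it moving; the trade-off is that userspace has to know the maximum up front.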

Radim Krčmář Aug. 11, 2017, 1 p.m. UTC | #3

2017-08-11 10:11+0200, David Hildenbrand:
> On 11.08.2017 09:49, Lan Tianyu wrote:
>> Hi Konrad:
>> 	Thanks for your review.
>> 
>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
>>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
>>>> case, it will create a huge VM with vcpu number as same as host cpus. This
>>>> patch is to increase max vcpu number to 352.
>>>
>>> Why not 1024 or 4096?
>> 
>> This is on demand. We can set a higher number since KVM already has
>> x2apic and vIOMMU interrupt remapping support.
>> 
>>>
>>> Are there any issues with increasing the value from 288 to 352 right now?
>> 
>> No found.

Yeah, the only issue until around 2^20 (when we reach the maximum of
logical x2APIC addressing) should be the size of per-VM arrays when only
a few VCPUs are going to be used.
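The "around 2^20" figure follows from how x2APIC derives the logical ID in cluster mode, as described in the Intel SDM: bits 19:4 of the x2APIC ID become the 16-bit cluster ID, and bits 3:0 select one of 16 bits in the cluster's member mask. A sketch of that derivation (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Logical x2APIC ID (LDR value) for a given x2APIC ID:
 *   LDR[31:16] = x2apic_id[19:4]       (cluster ID, 16 bits)
 *   LDR[15:0]  = 1 << x2apic_id[3:0]   (member bit within the cluster)
 * 2^16 clusters * 16 members = 2^20 logically addressable CPUs.
 */
uint32_t x2apic_logical_id(uint32_t x2apic_id)
{
    assert(x2apic_id < (1u << 20)); /* beyond this, no logical address */
    uint32_t cluster = (x2apic_id >> 4) & 0xffffu;
    uint32_t member = 1u << (x2apic_id & 0xfu);

    return (cluster << 16) | member;
}
```

For example, x2APIC ID 351 (the last VCPU of a 352-VCPU guest with contiguous IDs) lands in cluster 0x15 with member bit 15, i.e. LDR 0x00158000.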

>>> Also perhaps this should be made in an Kconfig entry?
>> 
>> That will be anther option but I find different platforms will define
>> different MAX_VCPU. If we introduce a generic Kconfig entry, different
>> platforms should have different range.
>> 
>> Radim & Paolo, Could you give some input? In qemu thread, we will set
>> max vcpu to 8192 for x86 VM. In KVM, The length of vcpu pointer array in
>> struct kvm and dest_vcpu_bitmap in kvm_irq_delivery_to_apic() are
>> specified by KVM_MAX_VCPUS. Should we keep align with Qemu?

That would be great.

> commit 682f732ecf7396e9d6fe24d44738966699fae6c0
> Author: Radim Krčmář <rkrcmar@redhat.com>
> Date:   Tue Jul 12 22:09:29 2016 +0200
> 
>     KVM: x86: bump MAX_VCPUS to 288
> 
>     288 is in high demand because of Knights Landing CPU.
>     We cannot set the limit to 640k, because that would be wasting space.
> 
> I think we want to keep it small as long as possible. I remember a patch
> series from Radim which would dynamically allocate memory for these
> arrays (using a new VM creation ioctl, specifying the max # of vcpus).
> Wonder what happened to that (I remember requesting a simply remalloc
> instead of a new VM creation ioctl :] ).

Eh, I forgot about them ... I didn't like the dynamic allocation, as we
would need to protect the memory, which would result in a much bigger
changeset, or fragile macros.

I can't recall the disgust now, so I'll send an RFC with the dynamic
version to see how it turned out.
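The need to "protect the memory" can be illustrated with a userspace analogue: once the VCPU array may be reallocated while other threads read it, every reader must be excluded from the resize. This sketch uses a POSIX rwlock purely as an assumption for illustration; in-kernel code would more likely use RCU/SRCU, and none of these names come from KVM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct vcpu; /* opaque, hypothetical */

/* Hypothetical growable VCPU table shared by reader and writer threads. */
struct vcpu_table {
    pthread_rwlock_t lock; /* readers: lookups; writer: resize */
    unsigned int cap;
    struct vcpu **vcpus;
};

int table_init(struct vcpu_table *t, unsigned int cap)
{
    assert(cap > 0);
    t->vcpus = calloc(cap, sizeof(*t->vcpus));
    if (!t->vcpus)
        return -1;
    t->cap = cap;
    if (pthread_rwlock_init(&t->lock, NULL) != 0) {
        free(t->vcpus);
        return -1;
    }
    return 0;
}

/* Readers hold the lock for the whole access, so realloc() in
 * table_grow() cannot free the array while it is being read. */
struct vcpu *table_get(struct vcpu_table *t, unsigned int idx)
{
    struct vcpu *v = NULL;

    pthread_rwlock_rdlock(&t->lock);
    if (idx < t->cap)
        v = t->vcpus[idx];
    pthread_rwlock_unlock(&t->lock);
    return v;
}

/* Grow the table to new_cap slots (new slots are NULL); 0 on success. */
int table_grow(struct vcpu_table *t, unsigned int new_cap)
{
    int ret = -1;

    pthread_rwlock_wrlock(&t->lock);
    if (new_cap > t->cap) {
        struct vcpu **p = realloc(t->vcpus, (size_t)new_cap * sizeof(*p));
        if (p) {
            for (unsigned int i = t->cap; i < new_cap; i++)
                p[i] = NULL;
            t->vcpus = p;
            t->cap = new_cap;
            ret = 0;
        }
    }
    pthread_rwlock_unlock(&t->lock);
    return ret;
}

void table_destroy(struct vcpu_table *t)
{
    pthread_rwlock_destroy(&t->lock);
    free(t->vcpus);
}
```

Every lookup path has to take the lock (or an equivalent read-side section), which is the kind of churn across the code base that a fixed-size array avoids.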

Thanks.

Konrad Rzeszutek Wilk Aug. 11, 2017, 7:35 p.m. UTC | #4

On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
> 2017-08-11 10:11+0200, David Hildenbrand:
> > On 11.08.2017 09:49, Lan Tianyu wrote:
> >> Hi Konrad:
> >> 	Thanks for your review.
> >> 
> >> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
> >>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
> >>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
> >>>> case, it will create a huge VM with vcpu number as same as host cpus. This
> >>>> patch is to increase max vcpu number to 352.
> >>>
> >>> Why not 1024 or 4096?
> >> 
> >> This is on demand. We can set a higher number since KVM already has
> >> x2apic and vIOMMU interrupt remapping support.
> >> 
> >>>
> >>> Are there any issues with increasing the value from 288 to 352 right now?
> >> 
> >> No found.
> 
> Yeah, the only issue until around 2^20 (when we reach the maximum of
> logical x2APIC addressing) should be the size of per-VM arrays when only
> few VCPUs are going to be used.

Migration with 352 CPUs all being busy dirtying memory and also poking
at various I/O ports (say all of them dirtying the VGA) is no problem?


> 
> >>> Also perhaps this should be made in an Kconfig entry?
> >> 
> >> That will be anther option but I find different platforms will define
> >> different MAX_VCPU. If we introduce a generic Kconfig entry, different
> >> platforms should have different range.


By different platforms you mean q35 vs the older one, and such?
Not whether the underlying accelerator is TCG, Xen, KVM, or bhyve?

What I was trying to understand is whether it even makes sense for
the platforms to have such limits in the first place, or whether the
accelerators should be the ones setting it instead?


> >> 
> >> Radim & Paolo, Could you give some input? In qemu thread, we will set
> >> max vcpu to 8192 for x86 VM. In KVM, The length of vcpu pointer array in
> >> struct kvm and dest_vcpu_bitmap in kvm_irq_delivery_to_apic() are
> >> specified by KVM_MAX_VCPUS. Should we keep align with Qemu?
> 
> That would be great.
> 
> > commit 682f732ecf7396e9d6fe24d44738966699fae6c0
> > Author: Radim Krčmář <rkrcmar@redhat.com>
> > Date:   Tue Jul 12 22:09:29 2016 +0200
> > 
> >     KVM: x86: bump MAX_VCPUS to 288
> > 
> >     288 is in high demand because of Knights Landing CPU.
> >     We cannot set the limit to 640k, because that would be wasting space.
> > 
> > I think we want to keep it small as long as possible. I remember a patch
> > series from Radim which would dynamically allocate memory for these
> > arrays (using a new VM creation ioctl, specifying the max # of vcpus).
> > Wonder what happened to that (I remember requesting a simply remalloc
> > instead of a new VM creation ioctl :] ).
> 
> Eh, I forgot about them ...  I didn't like the dynamic allocation as we
> would need to protect the memory, which would result in a much bigger
> changeset, or fragile macros.
> 
> I can't recall the disgust now, so I'll send a RFC with the dynamic
> version to see how it turned out.
> 
> Thanks.

Denys Vlasenko Aug. 11, 2017, 10:47 p.m. UTC | #5

On Thu, Aug 10, 2017 at 12:00 PM, Lan Tianyu <tianyu.lan@intel.com> wrote:
> Intel Xeon phi chip will support 352 logical threads. For HPC usage
> case, it will create a huge VM with vcpu number as same as host cpus. This
> patch is to increase max vcpu number to 352.

This number was bumped in the past to 288 to accommodate Knights Landing,
but KNL's maximum designed thread count is actually 304: the on-die
interconnect mesh is 6*7, and with four cells taken for interconnect
and memory controllers, there are 38 CPU cells.

Each CPU cell has two cores with a shared L2.
Each core is SMT4, so 38*8 = 304.

Intel fuses off two cells (or more), so 288 is the largest number of threads
on a KNL you can buy, but 304-thread KNLs most probably also exist
(although they may be rather rare, since they require a completely
defect-free die).

I think it would be better if Linux supported those too.

What is the design maximum for these new "nominally 352 thread" Xeon Phis?
360? (If the mesh is 7*7 and the same 4 cells are taken for non-CPU needs.)
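The thread-count arithmetic above can be written down as a tiny helper. The parameters mirror the description (mesh dimensions, cells reserved for interconnect and memory controllers, fused-off cells, 2 cores per cell, SMT4); the 7*7 mesh is a guess from the message, not a published layout, and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Logical threads on a mesh-based Xeon Phi die under the assumptions
 * above: each remaining CPU cell holds 2 cores, each core runs 4 threads.
 */
unsigned int phi_threads(unsigned int rows, unsigned int cols,
                         unsigned int non_cpu_cells, unsigned int fused_cells)
{
    const unsigned int cores_per_cell = 2;
    const unsigned int threads_per_core = 4;
    unsigned int cells = rows * cols;

    assert(non_cpu_cells + fused_cells <= cells);
    return (cells - non_cpu_cells - fused_cells) * cores_per_cell *
           threads_per_core;
}
```

phi_threads(6, 7, 4, 0) gives 304 and phi_threads(6, 7, 4, 2) gives 288; under the guessed 7*7 layout, phi_threads(7, 7, 4, 0) gives 360, and 352 would correspond to exactly one fused-off cell.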

Lan Tianyu Aug. 15, 2017, 3 a.m. UTC | #6

On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
>> 2017-08-11 10:11+0200, David Hildenbrand:
>>> On 11.08.2017 09:49, Lan Tianyu wrote:
>>>> Hi Konrad:
>>>> 	Thanks for your review.
>>>>
>>>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
>>>>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
>>>>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
>>>>>> case, it will create a huge VM with vcpu number as same as host cpus. This
>>>>>> patch is to increase max vcpu number to 352.
>>>>>
>>>>> Why not 1024 or 4096?
>>>>
>>>> This is on demand. We can set a higher number since KVM already has
>>>> x2apic and vIOMMU interrupt remapping support.
>>>>
>>>>>
>>>>> Are there any issues with increasing the value from 288 to 352 right now?
>>>>
>>>> No found.
>>
>> Yeah, the only issue until around 2^20 (when we reach the maximum of
>> logical x2APIC addressing) should be the size of per-VM arrays when only
>> few VCPUs are going to be used.
> 
> Migration with 352 CPUs all being busy dirtying memory and also poking
> at various I/O ports (say all of them dirtying the VGA) is no problem?

This depends on what kind of workload is running during migration. I
think this may affect service downtime, since there may be a lot of dirty
memory data to transfer after stopping the vcpus. This also depends on how
the user sets "migrate_set_downtime" for QEMU. But I think increasing vcpus
will break migration function.

> 
> 
>>
>>>>> Also perhaps this should be made in an Kconfig entry?
>>>>
>>>> That will be anther option but I find different platforms will define
>>>> different MAX_VCPU. If we introduce a generic Kconfig entry, different
>>>> platforms should have different range.
> 
> 
> By different platforms you mean q35 vs the older one, and such?

I meant that x86, ARM, SPARC and other vendors' code define different max
vcpu numbers.

> Not whether the underlaying accelerator is tcg, Xen, KVM, or bHyve?
> 
> What I was trying to understand whether it makes even sense for
> the platforms to have such limits in the first place - and instead the
> accelerators should be the ones setting it?
> 
> 
>>>>
>>>> Radim & Paolo, Could you give some input? In qemu thread, we will set
>>>> max vcpu to 8192 for x86 VM. In KVM, The length of vcpu pointer array in
>>>> struct kvm and dest_vcpu_bitmap in kvm_irq_delivery_to_apic() are
>>>> specified by KVM_MAX_VCPUS. Should we keep align with Qemu?
>>
>> That would be great.
>>
>>> commit 682f732ecf7396e9d6fe24d44738966699fae6c0
>>> Author: Radim Krčmář <rkrcmar@redhat.com>
>>> Date:   Tue Jul 12 22:09:29 2016 +0200
>>>
>>>     KVM: x86: bump MAX_VCPUS to 288
>>>
>>>     288 is in high demand because of Knights Landing CPU.
>>>     We cannot set the limit to 640k, because that would be wasting space.
>>>
>>> I think we want to keep it small as long as possible. I remember a patch
>>> series from Radim which would dynamically allocate memory for these
>>> arrays (using a new VM creation ioctl, specifying the max # of vcpus).
>>> Wonder what happened to that (I remember requesting a simply remalloc
>>> instead of a new VM creation ioctl :] ).
>>
>> Eh, I forgot about them ...  I didn't like the dynamic allocation as we
>> would need to protect the memory, which would result in a much bigger
>> changeset, or fragile macros.
>>
>> I can't recall the disgust now, so I'll send a RFC with the dynamic
>> version to see how it turned out.
>>
>> Thanks.

Konrad Rzeszutek Wilk Aug. 15, 2017, 2:10 p.m. UTC | #7

On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> > On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
> >> 2017-08-11 10:11+0200, David Hildenbrand:
> >>> On 11.08.2017 09:49, Lan Tianyu wrote:
> >>>> Hi Konrad:
> >>>> 	Thanks for your review.
> >>>>
> >>>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
> >>>>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
> >>>>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
> >>>>>> case, it will create a huge VM with vcpu number as same as host cpus. This
> >>>>>> patch is to increase max vcpu number to 352.
> >>>>>
> >>>>> Why not 1024 or 4096?
> >>>>
> >>>> This is on demand. We can set a higher number since KVM already has
> >>>> x2apic and vIOMMU interrupt remapping support.
> >>>>
> >>>>>
> >>>>> Are there any issues with increasing the value from 288 to 352 right now?
> >>>>
> >>>> No found.
> >>
> >> Yeah, the only issue until around 2^20 (when we reach the maximum of
> >> logical x2APIC addressing) should be the size of per-VM arrays when only
> >> few VCPUs are going to be used.
> > 
> > Migration with 352 CPUs all being busy dirtying memory and also poking
> > at various I/O ports (say all of them dirtying the VGA) is no problem?
> 
> This depends on what kind of workload is running during migration. I
> think this may affect service down time since there maybe a lot of dirty
> memory data to transfer after stopping vcpus. This also depends on how
> user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> will break migration function.

OK, so let me take a step back.

I see this nice 'supported' CPU count that is exposed in the kvm module.

Then there is QEMU throwing out a warning if you crank up the CPU count
above that number.

Red Hat's web pages talk about CPU count as well.

And I am assuming all of those are based on what has been tested and
what has been shown to work. And one of those test cases surely must
be migration.

Ergo, if the vCPU count increase will break migration, then it is
a regression.

Or a fix/work needs to be done to support a higher CPU count for
migrating?


Is my understanding incorrect?

> 
> > 
> > 
> >>
> >>>>> Also perhaps this should be made in an Kconfig entry?
> >>>>
> >>>> That will be anther option but I find different platforms will define
> >>>> different MAX_VCPU. If we introduce a generic Kconfig entry, different
> >>>> platforms should have different range.
> > 
> > 
> > By different platforms you mean q35 vs the older one, and such?
> 
> I meant x86, arm, sparc and other vendors' code define different max
> vcpu number.

Right, and?
> 
> > Not whether the underlaying accelerator is tcg, Xen, KVM, or bHyve?
> > 
> > What I was trying to understand whether it makes even sense for
> > the platforms to have such limits in the first place - and instead the
> > accelerators should be the ones setting it?
> > 
> > 
> >>>>
> >>>> Radim & Paolo, Could you give some input? In qemu thread, we will set
> >>>> max vcpu to 8192 for x86 VM. In KVM, The length of vcpu pointer array in
> >>>> struct kvm and dest_vcpu_bitmap in kvm_irq_delivery_to_apic() are
> >>>> specified by KVM_MAX_VCPUS. Should we keep align with Qemu?
> >>
> >> That would be great.
> >>
> >>> commit 682f732ecf7396e9d6fe24d44738966699fae6c0
> >>> Author: Radim Krčmář <rkrcmar@redhat.com>
> >>> Date:   Tue Jul 12 22:09:29 2016 +0200
> >>>
> >>>     KVM: x86: bump MAX_VCPUS to 288
> >>>
> >>>     288 is in high demand because of Knights Landing CPU.
> >>>     We cannot set the limit to 640k, because that would be wasting space.
> >>>
> >>> I think we want to keep it small as long as possible. I remember a patch
> >>> series from Radim which would dynamically allocate memory for these
> >>> arrays (using a new VM creation ioctl, specifying the max # of vcpus).
> >>> Wonder what happened to that (I remember requesting a simply remalloc
> >>> instead of a new VM creation ioctl :] ).
> >>
> >> Eh, I forgot about them ...  I didn't like the dynamic allocation as we
> >> would need to protect the memory, which would result in a much bigger
> >> changeset, or fragile macros.
> >>
> >> I can't recall the disgust now, so I'll send a RFC with the dynamic
> >> version to see how it turned out.
> >>
> >> Thanks.

Radim Krčmář Aug. 15, 2017, 2:20 p.m. UTC | #8

2017-08-15 11:00+0800, Lan Tianyu:
> On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
>> On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
>>> 2017-08-11 10:11+0200, David Hildenbrand:
>>>> On 11.08.2017 09:49, Lan Tianyu wrote:
>>>>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
>>>>>> Are there any issues with increasing the value from 288 to 352 right now?
>>>>>
>>>>> No found.
>>>
>>> Yeah, the only issue until around 2^20 (when we reach the maximum of
>>> logical x2APIC addressing) should be the size of per-VM arrays when only
>>> few VCPUs are going to be used.

(I was talking only about the KVM side.)

>> Migration with 352 CPUs all being busy dirtying memory and also poking
>> at various I/O ports (say all of them dirtying the VGA) is no problem?
> 
> This depends on what kind of workload is running during migration. I
> think this may affect service down time since there maybe a lot of dirty
> memory data to transfer after stopping vcpus. This also depends on how
> user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> will break migration function.

Utilizing post-copy in the last migration phase should make migration of
busy big guests possible. (I agree that pre-copy is not going to be
feasible.)
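A deliberately simplified model shows why pre-copy may not converge for a busy large guest: each pass resends what the VCPUs dirtied while the previous pass was being transferred, and the guest can only be paused once the remainder fits in the downtime budget. The function and its parameters are hypothetical; QEMU's actual migration logic is more involved and is not modeled here.

```c
#include <assert.h>

/*
 * Simplified pre-copy model (hypothetical, not QEMU's code).
 *   mem_bytes      - guest RAM sent in the first pass
 *   dirty_rate     - bytes/s dirtied by all VCPUs combined
 *   bandwidth      - bytes/s the migration link can carry
 *   max_downtime_s - allowed stop-and-copy time, in seconds
 * Returns how many pre-copy passes were sent before the remaining dirty
 * data fits into max_downtime_s, or -1 if that does not happen within
 * max_passes.
 */
int precopy_passes(double mem_bytes, double dirty_rate, double bandwidth,
                   double max_downtime_s, int max_passes)
{
    assert(bandwidth > 0.0);
    double remaining = mem_bytes;

    for (int passes = 0; passes < max_passes; passes++) {
        double send_time = remaining / bandwidth;

        if (send_time <= max_downtime_s)
            return passes; /* pause the guest and send the rest */
        /* Data dirtied by the VCPUs while this pass was on the wire. */
        remaining = dirty_rate * send_time;
    }
    return -1;
}
```

With 8 GB of RAM, a 1 GB/s link and a 300 ms budget, a guest dirtying 100 MB/s converges after two passes, while one dirtying 1 GB/s (which many busy VCPUs could plausibly reach) never does; that is the case where switching to post-copy is expected to help.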

Radim Krčmář Aug. 15, 2017, 4:13 p.m. UTC | #9

(Missed this mail before my last reply.)

2017-08-15 10:10-0400, Konrad Rzeszutek Wilk:
> On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> > On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> > > Migration with 352 CPUs all being busy dirtying memory and also poking
> > > at various I/O ports (say all of them dirtying the VGA) is no problem?
> > 
> > This depends on what kind of workload is running during migration. I
> > think this may affect service down time since there maybe a lot of dirty
> > memory data to transfer after stopping vcpus. This also depends on how
> > user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> > will break migration function.
> 
> OK, so let me take a step back.
> 
> I see this nice 'supported' CPU count that is exposed in kvm module.
> 
> Then there is QEMU throwing out a warning if you crank up the CPU count
> above that number.

I find the range between "recommended max" and "hard max" VCPU count
confusing at best ... IIUC, it was there because KVM internals had
problems with scaling and we will hit more in the future because some
loops still are linear on VCPU count.

The exposed value doesn't say whether migration will work, because that
is a userspace thing and we're not aware of bottlenecks on the KVM side.
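The point about loops that are linear in the VCPU count can be illustrated with destination lookup. This sketch contrasts a scan over all VCPUs with a table indexed by APIC ID; the struct and functions are hypothetical, not KVM's code, and only show why the per-operation cost of the first approach grows with VM size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu {
    uint32_t apic_id;
};

/* O(n): walk every VCPU, the pattern whose cost grows with the VM size. */
struct vcpu *find_vcpu_linear(struct vcpu *vcpus, size_t n, uint32_t apic_id)
{
    for (size_t i = 0; i < n; i++)
        if (vcpus[i].apic_id == apic_id)
            return &vcpus[i];
    return NULL;
}

/* Build a lookup table indexed by APIC ID (redone when IDs change). */
void build_apic_map(struct vcpu **map, size_t map_len,
                    struct vcpu *vcpus, size_t n)
{
    for (size_t i = 0; i < map_len; i++)
        map[i] = NULL;
    for (size_t i = 0; i < n; i++) {
        assert(vcpus[i].apic_id < map_len);
        map[vcpus[i].apic_id] = &vcpus[i];
    }
}

/* O(1): a single indexed load, independent of the number of VCPUs. */
struct vcpu *find_vcpu_mapped(struct vcpu **map, size_t map_len,
                              uint32_t apic_id)
{
    return apic_id < map_len ? map[apic_id] : NULL;
}
```

The table trades memory (one slot per possible APIC ID) and a rebuild step for constant-time lookups, which is the same size-versus-scaling tension discussed in this thread.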

> Red Hat's web-pages talk about CPU count as well.
> 
> And I am assuming all of those are around what has been tested and
> what has shown to work. And one of those test-cases surely must
> be migration.

Right, Red Hat will only allow/support what it has tested, even if
upstream has a practically unlimited count.  I think the upstream number
used to be raised by Red Hat, which is why upstream isn't at the hard
implementation limit ...

> Ergo, if the vCPU count increase will break migration, then it is
> a regression.

Raising the limit would not break existing guests, but I would rather
avoid adding higher VCPU count as a feature that disables migration.

> Or a fix/work needs to be done to support a higher CPU count for
> migrating?

Post-copy migration should handle higher CPU count and it is the default
fallback on QEMU.  Asking the question on a userspace list would yield
better answers, though.

Thanks.

Lan Tianyu Aug. 16, 2017, 3:07 a.m. UTC | #10

On 2017-08-15 22:10, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
>> On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
>>>> 2017-08-11 10:11+0200, David Hildenbrand:
>>>>> On 11.08.2017 09:49, Lan Tianyu wrote:
>>>>>> Hi Konrad:
>>>>>> 	Thanks for your review.
>>>>>>
>>>>>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
>>>>>>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
>>>>>>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
>>>>>>>> case, it will create a huge VM with vcpu number as same as host cpus. This
>>>>>>>> patch is to increase max vcpu number to 352.
>>>>>>>
>>>>>>> Why not 1024 or 4096?
>>>>>>
>>>>>> This is on demand. We can set a higher number since KVM already has
>>>>>> x2apic and vIOMMU interrupt remapping support.
>>>>>>
>>>>>>>
>>>>>>> Are there any issues with increasing the value from 288 to 352 right now?
>>>>>>
>>>>>> No found.
>>>>
>>>> Yeah, the only issue until around 2^20 (when we reach the maximum of
>>>> logical x2APIC addressing) should be the size of per-VM arrays when only
>>>> few VCPUs are going to be used.
>>>
>>> Migration with 352 CPUs all being busy dirtying memory and also poking
>>> at various I/O ports (say all of them dirtying the VGA) is no problem?
>>
>> This depends on what kind of workload is running during migration. I
>> think this may affect service down time since there maybe a lot of dirty
>> memory data to transfer after stopping vcpus. This also depends on how
>> user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
>> will break migration function.
> 
> OK, so let me take a step back.
> 
> I see this nice 'supported' CPU count that is exposed in kvm module.
> 
> Then there is QEMU throwing out a warning if you crank up the CPU count
> above that number.
> 
> Red Hat's web-pages talk about CPU count as well.
> 
> And I am assuming all of those are around what has been tested and
> what has shown to work. And one of those test-cases surely must
> be migration.
> 

Sorry, that was a typo. I meant that increasing vcpus shouldn't
break migration and should only affect service downtime. If there were
such an issue, we should fix it.


> Ergo, if the vCPU count increase will break migration, then it is
> a regression.
> 
> Or a fix/work needs to be done to support a higher CPU count for
> migrating?
> 
> 
> Is my understanding incorrect?

You are right.

> 
>>
>>>
>>>
>>>>
>>>>>>> Also perhaps this should be made in an Kconfig entry?
>>>>>>
>>>>>> That will be anther option but I find different platforms will define
>>>>>> different MAX_VCPU. If we introduce a generic Kconfig entry, different
>>>>>> platforms should have different range.
>>>
>>>
>>> By different platforms you mean q35 vs the older one, and such?
>>
>> I meant x86, arm, sparc and other vendors' code define different max
>> vcpu number.
> 
> Right, and?

If we introduce a general Kconfig entry for max vcpus for all vendors, it
should allow a different max vcpu range for each vendor.
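On the C side, "one generic option with a per-architecture range" could look like a compile-time range check. CONFIG_KVM_MAX_VCPUS and ARCH_KVM_VCPU_LIMIT below are hypothetical symbols (this thread does not define them), and the 4096 ceiling is an arbitrary placeholder; the 288 default is the current x86 value mentioned in the thread.

```c
#include <assert.h>

/* Hypothetical per-architecture ceiling, e.g. what the interrupt
 * controller can address; each architecture would define its own. */
#ifndef ARCH_KVM_VCPU_LIMIT
#define ARCH_KVM_VCPU_LIMIT 4096
#endif

/* Hypothetical generic config symbol; defaults to the current x86 value. */
#ifndef CONFIG_KVM_MAX_VCPUS
#define CONFIG_KVM_MAX_VCPUS 288
#endif

#define KVM_MAX_VCPUS CONFIG_KVM_MAX_VCPUS

/* Reject out-of-range configurations at build time. */
_Static_assert(KVM_MAX_VCPUS >= 1 && KVM_MAX_VCPUS <= ARCH_KVM_VCPU_LIMIT,
               "KVM_MAX_VCPUS outside the architecture's supported range");

unsigned int kvm_max_vcpus(void)
{
    return KVM_MAX_VCPUS;
}
```

In Kconfig itself the same idea would be a per-architecture `range` on the option; the C check only guards against a value that slipped past it.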

Konrad Rzeszutek Wilk Aug. 18, 2017, 1:57 p.m. UTC | #11

On Tue, Aug 15, 2017 at 06:13:29PM +0200, Radim Krčmář wrote:
> (Missed this mail before my last reply.)
> 
> 2017-08-15 10:10-0400, Konrad Rzeszutek Wilk:
> > On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> > > On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> > > > Migration with 352 CPUs all being busy dirtying memory and also poking
> > > > at various I/O ports (say all of them dirtying the VGA) is no problem?
> > > 
> > > This depends on what kind of workload is running during migration. I
> > > think this may affect service down time since there maybe a lot of dirty
> > > memory data to transfer after stopping vcpus. This also depends on how
> > > user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> > > will break migration function.
> > 
> > OK, so let me take a step back.
> > 
> > I see this nice 'supported' CPU count that is exposed in kvm module.
> > 
> > Then there is QEMU throwing out a warning if you crank up the CPU count
> > above that number.
> 
> I find the range between "recommended max" and "hard max" VCPU count
> confusing at best ... IIUC, it was there because KVM internals had
> problems with scaling and we will hit more in the future because some
> loops still are linear on VCPU count.

Is that documented somewhere? Some folks would be interested
in looking at that if it were known what exactly to look for.

> 
> The exposed value doesn't say whether migration will work, because that
> is a userspace thing and we're not aware of bottlenecks on the KVM side.
> 
> > Red Hat's web-pages talk about CPU count as well.
> > 
> > And I am assuming all of those are around what has been tested and
> > what has shown to work. And one of those test-cases surely must
> > be migration.
> 
> Right, Red Hat will only allow/support what it has tested, even if
> upstream has a practically unlimited count.  I think the upstream number
> used to be raised by Red Hat, which is why upstream isn't at the hard
> implementation limit ...

Aim for the sky! Perhaps then let's crank it up to 4096 upstream and let
each vendor/distro/cloud decide the right number based on their
testing.

And also have more folks report issues as they try running
these huge vCPU guests?

> 
> > Ergo, if the vCPU count increase will break migration, then it is
> > a regression.
> 
> Raising the limit would not break existing guests, but I would rather
> avoid adding higher VCPU count as a feature that disables migration.
> 
> > Or a fix/work needs to be done to support a higher CPU count for
> > migrating?
> 
> Post-copy migration should handle higher CPU count and it is the default
> fallback on QEMU.  Asking the question on a userspace list would yield
> better answers, though.
> 
> Thanks.

Konrad Rzeszutek Wilk Aug. 18, 2017, 2:20 p.m. UTC | #12

On Wed, Aug 16, 2017 at 11:07:55AM +0800, Lan Tianyu wrote:
> On 2017-08-15 22:10, Konrad Rzeszutek Wilk wrote:
> > On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> >> On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> >>> On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
> >>>> 2017-08-11 10:11+0200, David Hildenbrand:
> >>>>> On 11.08.2017 09:49, Lan Tianyu wrote:
> >>>>>> Hi Konrad:
> >>>>>> 	Thanks for your review.
> >>>>>>
> >>>>>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
> >>>>>>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
> >>>>>>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
> >>>>>>>> case, it will create a huge VM with vcpu number as same as host cpus. This
> >>>>>>>> patch is to increase max vcpu number to 352.
> >>>>>>>
> >>>>>>> Why not 1024 or 4096?
> >>>>>>
> >>>>>> This is on demand. We can set a higher number since KVM already has
> >>>>>> x2apic and vIOMMU interrupt remapping support.
> >>>>>>
> >>>>>>>
> >>>>>>> Are there any issues with increasing the value from 288 to 352 right now?
> >>>>>>
> >>>>>> No found.
> >>>>
> >>>> Yeah, the only issue until around 2^20 (when we reach the maximum of
> >>>> logical x2APIC addressing) should be the size of per-VM arrays when only
> >>>> few VCPUs are going to be used.
> >>>
> >>> Migration with 352 CPUs all being busy dirtying memory and also poking
> >>> at various I/O ports (say all of them dirtying the VGA) is no problem?
> >>
> >> This depends on what kind of workload is running during migration. I
> >> think this may affect service down time since there maybe a lot of dirty
> >> memory data to transfer after stopping vcpus. This also depends on how
> >> user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> >> will break migration function.
> > 
> > OK, so let me take a step back.
> > 
> > I see this nice 'supported' CPU count that is exposed in kvm module.
> > 
> > Then there is QEMU throwing out a warning if you crank up the CPU count
> > above that number.
> > 
> > Red Hat's web-pages talk about CPU count as well.
> > 
> > And I am assuming all of those are around what has been tested and
> > what has shown to work. And one of those test-cases surely must
> > be migration.
> > 
> 
> Sorry. This is a typo. I originally meant increasing vcpu shouldn't
> break migration function and just affect service downtime. If there was
> such issue, we should fix it.
> 
> 
> > Ergo, if the vCPU count increase will break migration, then it is
> > a regression.
> > 
> > Or a fix/work needs to be done to support a higher CPU count for
> > migrating?
> > 
> > 
> > Is my understanding incorrect?
> 
> You are right.
> 
> > 
> >>
> >>>
> >>>
> >>>>
> >>>>>>> Also perhaps this should be made in an Kconfig entry?
> >>>>>>
> >>>>>> That will be anther option but I find different platforms will define
> >>>>>> different MAX_VCPU. If we introduce a generic Kconfig entry, different
> >>>>>> platforms should have different range.
> >>>
> >>>
> >>> By different platforms you mean q35 vs the older one, and such?
> >>
> >> I meant x86, arm, sparc and other vendors' code define different max
> >> vcpu number.
> > 
> > Right, and?
> 
> If we introduce a general kconfig of max vcpus for all vendors, it
> should have different max vcpu range for different vendor.

Sounds sensible as well. But based on this thread it seems that
what is 'supported' and what is in the code are completely
at odds with each other.

Meaning you may as well go forth and put in a huge number and it
would be OK with the maintainers?

Radim Krčmář Aug. 21, 2017, 3:44 p.m. UTC | #13

2017-08-18 09:57-0400, Konrad Rzeszutek Wilk:
> On Tue, Aug 15, 2017 at 06:13:29PM +0200, Radim Krčmář wrote:
> > (Missed this mail before my last reply.)
> > 
> > 2017-08-15 10:10-0400, Konrad Rzeszutek Wilk:
> > > On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> > > > On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> > > > > Migration with 352 CPUs all being busy dirtying memory and also poking
> > > > > at various I/O ports (say all of them dirtying the VGA) is no problem?
> > > > 
> > > > This depends on what kind of workload is running during migration. I
> > > > think this may affect service down time since there maybe a lot of dirty
> > > > memory data to transfer after stopping vcpus. This also depends on how
> > > > user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> > > > will break migration function.
> > > 
> > > OK, so let me take a step back.
> > > 
> > > I see this nice 'supported' CPU count that is exposed in kvm module.
> > > 
> > > Then there is QEMU throwing out a warning if you crank up the CPU count
> > > above that number.
> > 
> > I find the range between "recommended max" and "hard max" VCPU count
> > confusing at best ... IIUC, it was there because KVM internals had
> > problems with scaling and we will hit more in the future because some
> > loops still are linear on VCPU count.
> 
> Is that documented somewhere? There are some folks would be interested
> in looking at that if it was known what exactly to look for..

Not really, Documentation/virtual/kvm/api.txt says:

  The recommended max_vcpus value can be retrieved using the
  KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

And "recommended" is not explained any further.  We can only state that
the value has no connection with userspace functionality, because it is
provided by KVM.

Red Hat was raising the KVM_CAP_NR_VCPUS after testing on a machine that
had enough physical cores.  (PLE had to be slightly optimized when going
to 240.)
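Userspace can read both values through KVM_CHECK_EXTENSION on /dev/kvm. This sketch queries KVM_CAP_NR_VCPUS (the "recommended" value) and KVM_CAP_MAX_VCPUS (the hard limit) and applies the fallbacks that api.txt describes for kernels lacking either capability (worth checking against your kernel's copy): assume 4 when KVM_CAP_NR_VCPUS is missing, and assume the recommended value when KVM_CAP_MAX_VCPUS is missing. The struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical container for the two limits. */
struct vcpu_limits {
    int recommended; /* from KVM_CAP_NR_VCPUS */
    int maximum;     /* from KVM_CAP_MAX_VCPUS */
};

/*
 * Apply the api.txt fallbacks. KVM_CHECK_EXTENSION returns 0 for an
 * unsupported capability, so any value <= 0 is treated as missing here.
 */
struct vcpu_limits vcpu_limits_from_caps(int nr_vcpus, int max_vcpus)
{
    struct vcpu_limits l;

    l.recommended = nr_vcpus > 0 ? nr_vcpus : 4;
    l.maximum = max_vcpus > 0 ? max_vcpus : l.recommended;
    return l;
}

/* Query /dev/kvm. Returns 0 on success, -1 if KVM is unavailable. */
int query_vcpu_limits(struct vcpu_limits *out)
{
    assert(out != NULL);
    int fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int nr = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
    int max = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
    close(fd);

    *out = vcpu_limits_from_caps(nr, max);
    return 0;
}
```

Nothing in either value says anything about migration or other userspace behaviour; it only reports what this KVM build recommends and permits.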

> > The exposed value doesn't say whether migration will work, because that
> > is a userspace thing and we're not aware of bottlenecks on the KVM side.
> > 
> > > Red Hat's web-pages talk about CPU count as well.
> > > 
> > > And I am assuming all of those are around what has been tested and
> > > what has shown to work. And one of those test-cases surely must
> > > be migration.
> > 
> > Right, Red Hat will only allow/support what it has tested, even if
> > upstream has a practically unlimited count.  I think the upstream number
> > used to be raised by Red Hat, which is why upstream isn't at the hard
> > implementation limit ...
> 
> Aim for the sky! Perhaps then lets crank it up to 4096 upstream and let
> each vendor/distro/cloud decide the right number based on their
> testing.

And hit the ceiling. :)
NR_CPUS seems like a good number upstream.
