diff mbox

[v7,01/13] machine: Don't allow CPU toplogies with partially filled cores

Message ID 20160129175215.5b65f98c@nial.brq.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Igor Mammedov Jan. 29, 2016, 4:52 p.m. UTC
On Fri, 29 Jan 2016 13:36:05 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Jan 29, 2016 at 04:10:47PM +0100, Igor Mammedov wrote:
> > On Fri, 29 Jan 2016 12:24:18 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Fri, Jan 29, 2016 at 02:52:30PM +1100, David Gibson wrote:  
> > > > On Thu, Jan 28, 2016 at 11:19:43AM +0530, Bharata B Rao wrote:    
> > > > > Prevent guests from booting with CPU topologies that have partially
> > > > > filled CPU cores or can result in partially filled CPU cores after
> > > > > CPU hotplug like
> > > > > 
> > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=16 or
> > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=17.
> > > > > 
> > > > > This is enforced by introducing MachineClass::validate_smp_config()
> > > > > that gets called from generic SMP parsing code. Machine type versions
> > > > > that want to enforce this can define this to the generic version
> > > > > provided.
> > > > > 
> > > > > Only sPAPR and PC machine types starting from version 2.6 enforce this in
> > > > > this patch.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>    
> > > > 
> > > > I've been kind of lost in the back and forth about
> > > > threads/cores/sockets.
> > > > 
> > > > What, in the end, is the rationale for allowing partially filled
> > > > sockets, but not partially filled cores?    
> > > 
> > > I don't think there's a good reason for that (at least for PC).
> > > 
> > > It's easier to relax the requirements later if necessary, than
> > > dealing with compatibility issues again when making the code more
> > > strict. So I suggest we make validate_smp_config_generic() also
> > > check if smp_cpus % (smp_threads * smp_cores) == 0.  
> > 
> > that would break exiting setups.  
> 
> Not if we do that only on newer machine classes.
> validate_smp_config_generic() will be used only on *-2.6 and
> newer.
> 
> 
> > 
> > Also in case of cpu hotplug this patch will break migration
> > as target QEMU might refuse starting with hotplugged CPU thread.  
> 
> This won't change older machine-types.
> 
> But I think you are right: it can break migration on pc-2.6, too.
> But: isn't migration already broken when creating other sets of
> CPUs that can't represented using -smp?
> 
> How exactly would you migrate a machine today, if you run:
> 
>   $ qemu-system-x86_64 -smp 16,sockets=2,cores=2,threads=2,maxcpus=32
>   (QMP) cpu-add id=31
that's invalid topology and should exit with error at start-up,
however it shouldn't be smp_cpus vs sockets,cores,threads check
but rather max_cpus vs sockets,cores,threads,maxcpus check.
something like this:


> 
> 
> > 
> > Perhaps this check should be enforced per target/machine if
> > arch requires it.  
> 
> It is. Please see the patch. It introduces a validate_smp_config
> method.
> 
> But we need your input to clarify if
> validate_smp_config_generic() is safe for pc-2.6 too.
it breaks migration as it could prevent target from starting if
there is hotplugged CPUs on source side.

Comments

Eduardo Habkost Jan. 29, 2016, 5:24 p.m. UTC | #1
On Fri, Jan 29, 2016 at 05:52:15PM +0100, Igor Mammedov wrote:
> On Fri, 29 Jan 2016 13:36:05 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Fri, Jan 29, 2016 at 04:10:47PM +0100, Igor Mammedov wrote:
> > > On Fri, 29 Jan 2016 12:24:18 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Fri, Jan 29, 2016 at 02:52:30PM +1100, David Gibson wrote:  
> > > > > On Thu, Jan 28, 2016 at 11:19:43AM +0530, Bharata B Rao wrote:    
> > > > > > Prevent guests from booting with CPU topologies that have partially
> > > > > > filled CPU cores or can result in partially filled CPU cores after
> > > > > > CPU hotplug like
> > > > > > 
> > > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=16 or
> > > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=17.
> > > > > > 
> > > > > > This is enforced by introducing MachineClass::validate_smp_config()
> > > > > > that gets called from generic SMP parsing code. Machine type versions
> > > > > > that want to enforce this can define this to the generic version
> > > > > > provided.
> > > > > > 
> > > > > > Only sPAPR and PC machine types starting from version 2.6 enforce this in
> > > > > > this patch.
> > > > > > 
> > > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>    
> > > > > 
> > > > > I've been kind of lost in the back and forth about
> > > > > threads/cores/sockets.
> > > > > 
> > > > > What, in the end, is the rationale for allowing partially filled
> > > > > sockets, but not partially filled cores?    
> > > > 
> > > > I don't think there's a good reason for that (at least for PC).
> > > > 
> > > > It's easier to relax the requirements later if necessary, than
> > > > dealing with compatibility issues again when making the code more
> > > > strict. So I suggest we make validate_smp_config_generic() also
> > > > check if smp_cpus % (smp_threads * smp_cores) == 0.  
> > > 
> > > that would break exiting setups.  
> > 
> > Not if we do that only on newer machine classes.
> > validate_smp_config_generic() will be used only on *-2.6 and
> > newer.
> > 
> > 
> > > 
> > > Also in case of cpu hotplug this patch will break migration
> > > as target QEMU might refuse starting with hotplugged CPU thread.  
> > 
> > This won't change older machine-types.
> > 
> > But I think you are right: it can break migration on pc-2.6, too.
> > But: isn't migration already broken when creating other sets of
> > CPUs that can't represented using -smp?
> > 
> > How exactly would you migrate a machine today, if you run:
> > 
> >   $ qemu-system-x86_64 -smp 16,sockets=2,cores=2,threads=2,maxcpus=32
> >   (QMP) cpu-add id=31
> that's invalid topology and should exit with error at start-up,

Oops, my mistake. Now the same question with the right numbers:

How exactly would you migrate a machine today, if you do the
following?

  $ qemu-system-x86_64 -smp 8,sockets=2,cores=2,threads=2,maxcpus=16
  (QMP) cpu-add id=15


> however it shouldn't be smp_cpus vs sockets,cores,threads check
> but rather max_cpus vs sockets,cores,threads,maxcpus check.
> something like this:
> 
> diff --git a/vl.c b/vl.c
> index f043009..3afa0b6 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1239,9 +1239,9 @@ static void smp_parse(QemuOpts *opts)
>          }
>  
>          max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
> -        if (sockets * cores * threads > max_cpus) {
> -            error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) > "
> +        if (sockets * cores * threads == max_cpus) {
> +            error_report("invalid cpu topology: "
> +                         "sockets (%u) * cores (%u) * threads (%u) not equal "
>                           "maxcpus (%u)",
>                           sockets, cores, threads, max_cpus);
>              exit(1);
> 
> > 
> > 
> > > 
> > > Perhaps this check should be enforced per target/machine if
> > > arch requires it.  
> > 
> > It is. Please see the patch. It introduces a validate_smp_config
> > method.
> > 
> > But we need your input to clarify if
> > validate_smp_config_generic() is safe for pc-2.6 too.
> it breaks migration as it could prevent target from starting if
> there is hotplugged CPUs on source side.

It looks like this is a problem only if the machine allows
hotplug of individual threads. What if we just add this to the
beginning of validate_smp_config_generic():

    if (mc->hot_add_cpu && max_cpus > smp_cpus) {
        /* hot_add_cpu() allows hotplug of individual threads,
         * allowing incomplete cores/sockets, so we can't prevent
         * it from running.
         */
         return 0;
    }
Igor Mammedov Feb. 1, 2016, 9:41 a.m. UTC | #2
On Fri, 29 Jan 2016 15:24:12 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Jan 29, 2016 at 05:52:15PM +0100, Igor Mammedov wrote:
> > On Fri, 29 Jan 2016 13:36:05 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Fri, Jan 29, 2016 at 04:10:47PM +0100, Igor Mammedov wrote:  
> > > > On Fri, 29 Jan 2016 12:24:18 -0200
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >     
> > > > > On Fri, Jan 29, 2016 at 02:52:30PM +1100, David Gibson wrote:    
> > > > > > On Thu, Jan 28, 2016 at 11:19:43AM +0530, Bharata B Rao wrote:      
> > > > > > > Prevent guests from booting with CPU topologies that have partially
> > > > > > > filled CPU cores or can result in partially filled CPU cores after
> > > > > > > CPU hotplug like
> > > > > > > 
> > > > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=16 or
> > > > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=17.
> > > > > > > 
> > > > > > > This is enforced by introducing MachineClass::validate_smp_config()
> > > > > > > that gets called from generic SMP parsing code. Machine type versions
> > > > > > > that want to enforce this can define this to the generic version
> > > > > > > provided.
> > > > > > > 
> > > > > > > Only sPAPR and PC machine types starting from version 2.6 enforce this in
> > > > > > > this patch.
> > > > > > > 
> > > > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>      
> > > > > > 
> > > > > > I've been kind of lost in the back and forth about
> > > > > > threads/cores/sockets.
> > > > > > 
> > > > > > What, in the end, is the rationale for allowing partially filled
> > > > > > sockets, but not partially filled cores?      
> > > > > 
> > > > > I don't think there's a good reason for that (at least for PC).
> > > > > 
> > > > > It's easier to relax the requirements later if necessary, than
> > > > > dealing with compatibility issues again when making the code more
> > > > > strict. So I suggest we make validate_smp_config_generic() also
> > > > > check if smp_cpus % (smp_threads * smp_cores) == 0.    
> > > > 
> > > > that would break exiting setups.    
> > > 
> > > Not if we do that only on newer machine classes.
> > > validate_smp_config_generic() will be used only on *-2.6 and
> > > newer.
> > > 
> > >   
> > > > 
> > > > Also in case of cpu hotplug this patch will break migration
> > > > as target QEMU might refuse starting with hotplugged CPU thread.    
> > > 
> > > This won't change older machine-types.
> > > 
> > > But I think you are right: it can break migration on pc-2.6, too.
> > > But: isn't migration already broken when creating other sets of
> > > CPUs that can't represented using -smp?
> > > 
> > > How exactly would you migrate a machine today, if you run:
> > > 
> > >   $ qemu-system-x86_64 -smp 16,sockets=2,cores=2,threads=2,maxcpus=32
> > >   (QMP) cpu-add id=31  
> > that's invalid topology and should exit with error at start-up,  
> 
> Oops, my mistake. Now the same question with the right numbers:
> 
> How exactly would you migrate a machine today, if you do the
> following?
> 
>   $ qemu-system-x86_64 -smp 8,sockets=2,cores=2,threads=2,maxcpus=16
>   (QMP) cpu-add id=15
isn't it the same broken topology? 
  sockets*cores*threads != maxcpus
But if you ask if it's possible to migrate machine with non-sequentially
hotplugged CPUs than answer is no it's not possible with cpu-add.

> > however it shouldn't be smp_cpus vs sockets,cores,threads check
> > but rather max_cpus vs sockets,cores,threads,maxcpus check.
> > something like this:
> > 
> > diff --git a/vl.c b/vl.c
> > index f043009..3afa0b6 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -1239,9 +1239,9 @@ static void smp_parse(QemuOpts *opts)
> >          }
> >  
> >          max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
> > -        if (sockets * cores * threads > max_cpus) {
> > -            error_report("cpu topology: "
> > -                         "sockets (%u) * cores (%u) * threads (%u) > "
> > +        if (sockets * cores * threads == max_cpus) {
> > +            error_report("invalid cpu topology: "
> > +                         "sockets (%u) * cores (%u) * threads (%u) not equal "
> >                           "maxcpus (%u)",
> >                           sockets, cores, threads, max_cpus);
> >              exit(1);
> >   
> > > 
> > >   
> > > > 
> > > > Perhaps this check should be enforced per target/machine if
> > > > arch requires it.    
> > > 
> > > It is. Please see the patch. It introduces a validate_smp_config
> > > method.
> > > 
> > > But we need your input to clarify if
> > > validate_smp_config_generic() is safe for pc-2.6 too.  
> > it breaks migration as it could prevent target from starting if
> > there is hotplugged CPUs on source side.  
> 
> It looks like this is a problem only if the machine allows
> hotplug of individual threads. What if we just add this to the
> beginning of validate_smp_config_generic():
> 
>     if (mc->hot_add_cpu && max_cpus > smp_cpus) {
It would break migration after hotplugging CPUs upto max_cpus
and then trying to migrate.

Also when we move x86 to device_add that will regress since
hot_add_cpu() won't be used in that case.

I think there is not much point enforcing restrictions
in this patch as generic. We should leave hook empty and allow
target to override it. Then SPAPR could enforce it's own limit
on partial cores.

>         /* hot_add_cpu() allows hotplug of individual threads,
>          * allowing incomplete cores/sockets, so we can't prevent
>          * it from running.
>          */
>          return 0;
>     }
Eduardo Habkost Feb. 3, 2016, 5:38 p.m. UTC | #3
On Mon, Feb 01, 2016 at 10:41:58AM +0100, Igor Mammedov wrote:
[...]
> > 
> > How exactly would you migrate a machine today, if you do the
> > following?
> > 
> >   $ qemu-system-x86_64 -smp 8,sockets=2,cores=2,threads=2,maxcpus=16
> >   (QMP) cpu-add id=15
> isn't it the same broken topology? 
>   sockets*cores*threads != maxcpus
> But if you ask if it's possible to migrate machine with non-sequentially
> hotplugged CPUs than answer is no it's not possible with cpu-add.

I was asking about migration with non-sequential hotplugged CPUs
(e.g. with -smp 8,sockets=4,cores=2,threads=2,maxcpus=16).


[...]
> > > > > 
> > > > > Perhaps this check should be enforced per target/machine if
> > > > > arch requires it.    
> > > > 
> > > > It is. Please see the patch. It introduces a validate_smp_config
> > > > method.
> > > > 
> > > > But we need your input to clarify if
> > > > validate_smp_config_generic() is safe for pc-2.6 too.  
> > > it breaks migration as it could prevent target from starting if
> > > there is hotplugged CPUs on source side.  
> > 
> > It looks like this is a problem only if the machine allows
> > hotplug of individual threads. What if we just add this to the
> > beginning of validate_smp_config_generic():
> > 
> >     if (mc->hot_add_cpu && max_cpus > smp_cpus) {
> It would break migration after hotplugging CPUs upto max_cpus
> and then trying to migrate.
> 
> Also when we move x86 to device_add that will regress since
> hot_add_cpu() won't be used in that case.
> 
> I think there is not much point enforcing restrictions
> in this patch as generic. We should leave hook empty and allow
> target to override it. Then SPAPR could enforce it's own limit
> on partial cores.

Agreed. It looks like we won't get rid of thread-based CPU
hotplug in x86 in the foreseable future.
Igor Mammedov Feb. 4, 2016, 9:38 a.m. UTC | #4
On Wed, 3 Feb 2016 15:38:09 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Feb 01, 2016 at 10:41:58AM +0100, Igor Mammedov wrote:
> [...]
> > > 
> > > How exactly would you migrate a machine today, if you do the
> > > following?
> > > 
> > >   $ qemu-system-x86_64 -smp 8,sockets=2,cores=2,threads=2,maxcpus=16
> > >   (QMP) cpu-add id=15  
> > isn't it the same broken topology? 
> >   sockets*cores*threads != maxcpus
> > But if you ask if it's possible to migrate machine with non-sequentially
> > hotplugged CPUs than answer is no it's not possible with cpu-add.  
> 
> I was asking about migration with non-sequential hotplugged CPUs
> (e.g. with -smp 8,sockets=4,cores=2,threads=2,maxcpus=16).
machine with non-sequential hotplugged CPUs can't be migrated currently
since there is not any way to say which CPUs should be created on target.

That should be though possible to do so with -device/device_add
interface once we enable it for x86 CPU threads.
Stumble block there is where to get APIC ID for a CPU (i.e. address
thing which tells where to plug CPU at).
Adding socket,core,thread properties to CPU would help to solve it,
and they look generic enough to put them in CPUClass.

> 
> 
> [...]
> > > > > > 
> > > > > > Perhaps this check should be enforced per target/machine if
> > > > > > arch requires it.      
> > > > > 
> > > > > It is. Please see the patch. It introduces a validate_smp_config
> > > > > method.
> > > > > 
> > > > > But we need your input to clarify if
> > > > > validate_smp_config_generic() is safe for pc-2.6 too.    
> > > > it breaks migration as it could prevent target from starting if
> > > > there is hotplugged CPUs on source side.    
> > > 
> > > It looks like this is a problem only if the machine allows
> > > hotplug of individual threads. What if we just add this to the
> > > beginning of validate_smp_config_generic():
> > > 
> > >     if (mc->hot_add_cpu && max_cpus > smp_cpus) {  
> > It would break migration after hotplugging CPUs upto max_cpus
> > and then trying to migrate.
> > 
> > Also when we move x86 to device_add that will regress since
> > hot_add_cpu() won't be used in that case.
> > 
> > I think there is not much point enforcing restrictions
> > in this patch as generic. We should leave hook empty and allow
> > target to override it. Then SPAPR could enforce it's own limit
> > on partial cores.  
> 
> Agreed. It looks like we won't get rid of thread-based CPU
> hotplug in x86 in the foreseable future.
>
diff mbox

Patch

diff --git a/vl.c b/vl.c
index f043009..3afa0b6 100644
--- a/vl.c
+++ b/vl.c
@@ -1239,9 +1239,9 @@  static void smp_parse(QemuOpts *opts)
         }
 
         max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
-        if (sockets * cores * threads > max_cpus) {
-            error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) > "
+        if (sockets * cores * threads == max_cpus) {
+            error_report("invalid cpu topology: "
+                         "sockets (%u) * cores (%u) * threads (%u) not equal "
                          "maxcpus (%u)",
                          sockets, cores, threads, max_cpus);
             exit(1);