Message ID | 20211117095711.26596-1-luca.fancellu@arm.com |
---|---|
Series | Boot time cpupools |
Hi Luca,

On 17/11/2021 09:57, Luca Fancellu wrote:
> Currently Xen creates a default cpupool0 that contains all the CPUs
> brought up during boot, and it assumes that the platform has only one
> kind of CPU. This assumption does not hold on big.LITTLE platforms,
> but putting different types of CPUs in the same cpupool can result in
> instability and security issues for the domains running on the pool.

I agree that you can't move a LITTLE vCPU to a big pCPU. However...

>
> For this reason this series introduces an architecture-specific way to
> create different cpupools at boot time. This is particularly useful on
> ARM big.LITTLE platforms, where there might be the need to have
> different cpupools for each type of core, but also on systems using
> NUMA, which can have a different cpupool for each node.

... from my understanding, all the vCPUs of a domain have to be in the
same cpupool. So with this approach it is not possible:
 1) to have a mix of LITTLE and big vCPUs in the domain
 2) to create a domain spanning across two NUMA nodes

So I think we need to make sure that any solutions we go through will
not prevent us from implementing those setups.

I can see two options here:
 1) Allowing a domain's vCPUs to be in different cpupools
 2) Introducing CPU classes (see [1])

I can't remember why Dario suggested 2) rather than 1) in the past.
@Dario, do you remember it?

Cheers,

[1] https://lore.kernel.org/xen-devel/1481135379.3445.142.camel@citrix.com/

> To test this series, the hmp-unsafe Xen boot argument is passed to
> allow Xen to start CPUs different from the boot core.
>
> Luca Fancellu (2):
>   xen/cpupool: Create different cpupools at boot time
>   tools/cpupools: Give a name to unnamed cpupools
>
>  tools/libs/light/libxl_utils.c | 13 ++++--
>  xen/arch/arm/Kconfig           | 15 ++++++
>  xen/arch/arm/Makefile          |  1 +
>  xen/arch/arm/cpupool.c         | 84 ++++++++++++++++++++++++++++++++++
>  xen/common/sched/cpupool.c     |  5 +-
>  xen/include/xen/cpupool.h      | 30 ++++++++++++
>  6 files changed, 142 insertions(+), 6 deletions(-)
>  create mode 100644 xen/arch/arm/cpupool.c
>  create mode 100644 xen/include/xen/cpupool.h
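To make the approach under discussion concrete, here is a minimal sketch
of MIDR-based pool creation at boot. The helpers cpupool_create_pool()
and cpupool_assign_cpu(), as well as the cpu_data[] MIDR field, are
illustrative assumptions, not necessarily the series' actual interface:

/* Sketch only: create one cpupool per distinct MIDR value at boot.
 * cpupool_create_pool() and cpupool_assign_cpu() are hypothetical
 * helpers; pool 0 is assumed to exist and hold the boot CPU's type. */
static int __init create_cpupools_by_midr(void)
{
    uint32_t pool_midr[NR_CPUS];
    unsigned int cpu, pools = 1;

    pool_midr[0] = cpu_data[0].midr.bits;

    for_each_online_cpu ( cpu )
    {
        unsigned int i;

        /* Look for a pool already holding this CPU type. */
        for ( i = 0; i < pools; i++ )
            if ( pool_midr[i] == cpu_data[cpu].midr.bits )
                break;

        /* Unseen CPU type: create a new pool for it. */
        if ( i == pools )
        {
            pool_midr[pools] = cpu_data[cpu].midr.bits;
            if ( cpupool_create_pool(pools++) != 0 )
                return -ENOMEM;
        }

        if ( i != 0 )    /* CPUs start out in pool 0 */
            cpupool_assign_cpu(i, cpu);
    }

    return 0;
}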
On 17.11.21 11:26, Julien Grall wrote:
> Hi Luca,
>
> On 17/11/2021 09:57, Luca Fancellu wrote:
>> [...]
>
> ... from my understanding, all the vCPUs of a domain have to be in the
> same cpupool. So with this approach it is not possible:
>  1) to have a mix of LITTLE and big vCPUs in the domain
>  2) to create a domain spanning across two NUMA nodes
>
> So I think we need to make sure that any solutions we go through will
> not prevent us from implementing those setups.
>
> I can see two options here:
>  1) Allowing a domain's vCPUs to be in different cpupools
>  2) Introducing CPU classes (see [1])
>
> I can't remember why Dario suggested 2) rather than 1) in the past.
> @Dario, do you remember it?

Having the vCPUs of a domain in different cpupools would probably
require a major scheduling rework, as several pieces of accounting
information are per-cpupool today, with some of that data additionally
being per-domain. Not to mention the per-domain scheduling parameters,
which would suddenly need to cope with different schedulers (imagine one
cpupool using credit2 and the other rt).

I'd rather use implicit pinning, e.g. via a CPU class.

Juergen
Hi Julien,

> On 17 Nov 2021, at 10:26, Julien Grall <julien@xen.org> wrote:
>
> On 17/11/2021 09:57, Luca Fancellu wrote:
>> [...]
>
> ... from my understanding, all the vCPUs of a domain have to be in the
> same cpupool. So with this approach it is not possible:
> 1) to have a mix of LITTLE and big vCPUs in the domain
> 2) to create a domain spanning across two NUMA nodes
>
> So I think we need to make sure that any solutions we go through will
> not prevent us from implementing those setups.

The point of this patch is to make all cores available without breaking
the current behaviour of existing systems.

Someone not using cpupools will keep running on the same cores as
before. Someone wanting to use the other cores could assign a guest to
the other cpupool(s) (big.LITTLE is just an example with 2 types, but
there are now SoCs with 3 types of cores). Someone wanting to build
something different can now create new cpupools in Dom0 and assign the
cores they want to them, to build a guest having access to different
types of cores.

The point here is just to make the "other" cores accessible and park
them in cpupools so that the current behaviour is not changed.

> I can see two options here:
> 1) Allowing a domain's vCPUs to be in different cpupools
> 2) Introducing CPU classes (see [1])
>
> I can't remember why Dario suggested 2) rather than 1) in the past.
> @Dario, do you remember it?

I think 1) is definitely interesting and something that could be looked
at in the future. This series just aims at making all cores available
without breaking backward compatibility, which is a good improvement
and does not contradict the two options you are suggesting.

Cheers
Bertrand
On 17.11.21 12:16, Bertrand Marquis wrote:
> Hi Julien,
>
>> On 17 Nov 2021, at 10:26, Julien Grall <julien@xen.org> wrote:
>> [...]
>>
>> So I think we need to make sure that any solutions we go through will
>> not prevent us from implementing those setups.
>
> The point of this patch is to make all cores available without
> breaking the current behaviour of existing systems.

May I suggest adding a boot parameter, so this behavior can be
controlled by other means than compile-time configuration?

> Someone not using cpupools will keep running on the same cores as
> before.
> [...]
> The point here is just to make the "other" cores accessible and park
> them in cpupools so that the current behaviour is not changed.
>
>> I can see two options here:
>> 1) Allowing a domain's vCPUs to be in different cpupools
>> 2) Introducing CPU classes (see [1])
>>
>> I can't remember why Dario suggested 2) rather than 1) in the past.
>> @Dario, do you remember it?
>
> I think 1) is definitely interesting and something that could be
> looked at in the future.

From the scheduler's point of view this is IMO a nightmare.

Juergen
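For what it's worth, such a knob would only be a couple of lines in
Xen's boot-parameter machinery; the option name "boot-cpupools" below is
made up purely for illustration:

/* Sketch: a boolean boot parameter in the usual Xen style (see
 * xen/param.h); the name "boot-cpupools" is hypothetical. */
static bool __initdata opt_boot_cpupools;
boolean_param("boot-cpupools", opt_boot_cpupools);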
On 17/11/2021 11:16, Bertrand Marquis wrote:
> Hi Julien,

Hi,

>> On 17 Nov 2021, at 10:26, Julien Grall <julien@xen.org> wrote:
>> [...]
>>
>> So I think we need to make sure that any solutions we go through will
>> not prevent us from implementing those setups.
>
> The point of this patch is to make all cores available without
> breaking the current behaviour of existing systems.

I might be missing some context here. By breaking current behavior, do
you mean users that may want to add "hmp-unsafe" on the command line?

Cheers,
Hi Julien,

> On 17 Nov 2021, at 11:48, Julien Grall <julien@xen.org> wrote:
>
> On 17/11/2021 11:16, Bertrand Marquis wrote:
>> [...]
>> The point of this patch is to make all cores available without
>> breaking the current behaviour of existing systems.
>
> I might be missing some context here. By breaking current behavior, do
> you mean users that may want to add "hmp-unsafe" on the command line?

Right, with hmp-unsafe the behaviour is now the same as without: only
the extra cores are parked in other cpupools.

So you have a point: the behaviour is in fact changed for someone who
was using hmp-unsafe before, if this is activated. The command line
argument suggested by Juergen definitely makes sense here.

We could instead do the following:
- when this is activated in the configuration, boot all cores and park
  them in different pools (depending on a command line argument). The
  current behaviour does not change if the other pools are not used
  (but more cores will be on in Xen);
- when hmp-unsafe is on, this feature will be turned off (even if
  activated in the configuration) and all cores will be added to the
  same pool.

What do you think?

Cheers
Bertrand
(Pruning some CC to just leave Arm and sched folks)

On 17/11/2021 12:07, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 17 Nov 2021, at 11:48, Julien Grall <julien@xen.org> wrote:
>> [...]
>>
>> I might be missing some context here. By breaking current behavior,
>> do you mean users that may want to add "hmp-unsafe" on the command
>> line?
>
> Right, with hmp-unsafe the behaviour is now the same as without: only
> the extra cores are parked in other cpupools.
>
> So you have a point: the behaviour is in fact changed for someone who
> was using hmp-unsafe before, if this is activated. The command line
> argument suggested by Juergen definitely makes sense here.
>
> We could instead do the following:
> - when this is activated in the configuration, boot all cores and park
>   them in different pools (depending on a command line argument). The
>   current behaviour does not change if the other pools are not used
>   (but more cores will be on in Xen);

From my understanding, it is possible to move a pCPU in/out of a pool
afterwards. So the security concern with big.LITTLE is still present,
even though it would be difficult to hit.

I am also concerned that it would be more difficult to detect any
misconfiguration. So I think this option would still need to be turned
on only if hmp-unsafe and the new command line option are both on.

If we want to enable it without hmp-unsafe on, we would need to at
least lock the pools.

> - when hmp-unsafe is on, this feature will be turned off (even if
>   activated in the configuration) and all cores will be added to the
>   same pool.
>
> What do you think?

I am split. On one hand, this makes it easier for someone to try
big.LITTLE, as you don't have to manually pin vCPUs. On the other hand,
this handles a single use-case, and you would need to use hmp-unsafe
and pinning if you want to get a more exotic setup (e.g. a domain with
big.LITTLE).

Another possible solution is to pin dom0 vCPUs (AFAIK they are just
sticky by default) and then create the pools from dom0 userspace. My
assumption is that for dom0less we would want to use pinning instead.

That said, I would like to hear from Xilinx and EPAM as, IIRC, they are
already using hmp-unsafe in production.

Cheers,
On Wed, 17 Nov 2021, Julien Grall wrote:
> > [...]
> > We could instead do the following:
> > - when this is activated in the configuration, boot all cores and
> >   park them in different pools (depending on a command line
> >   argument). The current behaviour does not change if the other
> >   pools are not used (but more cores will be on in Xen);
>
> From my understanding, it is possible to move a pCPU in/out of a pool
> afterwards. So the security concern with big.LITTLE is still present,
> even though it would be difficult to hit.

As far as I know, moving a pCPU in/out of a pool is something that
cannot happen automatically: it requires manual intervention by the
user, and it is uncommon. We could print a warning or simply return an
error to prevent the action from happening. Or something like:

"This action might result in memory corruptions and invalid behavior.
Do you want to continue? [Y/N]"

> I am also concerned that it would be more difficult to detect any
> misconfiguration. So I think this option would still need to be turned
> on only if hmp-unsafe and the new command line option are both on.
>
> If we want to enable it without hmp-unsafe on, we would need to at
> least lock the pools.

Locking the pools is a good idea.

My preference is not to tie this feature to the hmp-unsafe command
line, more on this below.

> > - when hmp-unsafe is on, this feature will be turned off (even if
> >   activated in the configuration) and all cores will be added to
> >   the same pool.
> >
> > What do you think?
>
> I am split. On one hand, this makes it easier for someone to try
> big.LITTLE, as you don't have to manually pin vCPUs. On the other
> hand, this handles a single use-case, and you would need to use
> hmp-unsafe and pinning if you want to get a more exotic setup (e.g. a
> domain with big.LITTLE).
>
> Another possible solution is to pin dom0 vCPUs (AFAIK they are just
> sticky by default) and then create the pools from dom0 userspace. My
> assumption is that for dom0less we would want to use pinning instead.
>
> That said, I would like to hear from Xilinx and EPAM as, IIRC, they
> are already using hmp-unsafe in production.

This discussion has been very interesting; it is cool to hear new ideas
like this one. I have a couple of thoughts to share.

First, I think that the ability to create cpupools at boot time is
super important. It goes way beyond big.LITTLE. It would be incredibly
useful to separate real-time (sched=null) and non-real-time
(sched=credit2) workloads. I think it will only become more important
going forward, so I'd love to see an option to configure cpupools that
works for dom0less. It could be based on device tree properties rather
than Kconfig options.

It is true that if we had the devicetree-based cpupool configuration I
mentioned, then somebody could use it to create cpupools matching
big.LITTLE. So "in theory" it solves the problem. However, I think that
for big.LITTLE it would be suboptimal. For big.LITTLE it would be best
if Xen configured the cpupools automatically rather than based on the
device tree configuration. That way, it is going to work automatically,
without extra steps, even in the simplest Xen setups.

So I think that it is a good idea to have a command line option (better
than a Kconfig option) to trigger the MIDR-based cpupool creation at
boot time. The option could be called midr-cpupools=on/off or
hw-cpupools=on/off, for example.

In terms of whether it should be the default or not, I don't feel
strongly about it. Unfortunately we (Xilinx) rely on a number of
non-default options already, so we are already in the situation where
we have to be extra careful with the options passed. I don't think that
adding one more would make a significant difference either way.

But my preference is *not* to tie the new command line option to
hmp-unsafe, because if you use midr-cpupools and don't mess with the
pools then it is actually safe. We could even lock the cpupools like
Julien suggested, or warn/return an error on changing the cpupools. In
this scenario, it would be detrimental to also pass hmp-unsafe: it
would allow actually unsafe configurations that the user wanted to
avoid by using midr-cpupools. It would end up disabling checks we could
put in place to make midr-cpupools safer.

So in short I think it should be:

- midr-cpupools alone:
  cpupools created at boot, warning/errors on changing cpupools

- midr-cpupools + hmp-unsafe:
  cpupools created at boot, changing cpupools is allowed (we could
  still warn, but no errors)

- hmp-unsafe alone:
  same as today with hmp-unsafe

For the default, as I said, I don't have a strong preference. I think
midr-cpupools could be "on" by default.
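A guard of the kind described above (error out on an unsafe pool change
unless the user opted into hmp-unsafe) could be sketched as follows.
Here arch_cpu_compatible_with_pool() is a hypothetical hook, while
opt_hmp_unsafe models the existing Arm command line flag:

/* Sketch: refuse to add a pCPU to a pool holding a different CPU
 * type, unless the user explicitly accepted the risk (hmp-unsafe).
 * arch_cpu_compatible_with_pool() is a hypothetical arch hook. */
static int check_pool_cpu_compat(const struct cpupool *c, unsigned int cpu)
{
    if ( opt_hmp_unsafe )
        return 0;

    if ( !arch_cpu_compatible_with_pool(c, cpu) )
    {
        printk(XENLOG_WARNING
               "CPU%u has a different type than pool %u: denied\n",
               cpu, c->cpupool_id);
        return -EINVAL;
    }

    return 0;
}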
On 18.11.21 03:19, Stefano Stabellini wrote:
> On Wed, 17 Nov 2021, Julien Grall wrote:
>> [...]
>> From my understanding, it is possible to move a pCPU in/out of a pool
>> afterwards. So the security concern with big.LITTLE is still present,
>> even though it would be difficult to hit.
>
> As far as I know, moving a pCPU in/out of a pool is something that
> cannot happen automatically: it requires manual intervention by the
> user, and it is uncommon. We could print a warning or simply return an
> error to prevent the action from happening. Or something like:
>
> "This action might result in memory corruptions and invalid behavior.
> Do you want to continue? [Y/N]"

This should only be rejected if the source and target pools are not
compatible. So a cpupool could be attributed to allow only specific
cpus (and maybe domains?) in it.

Otherwise it would be impossible to create new cpupools after boot on
such a system and to populate them with cpus.

>> I am also concerned that it would be more difficult to detect any
>> misconfiguration. So I think this option would still need to be
>> turned on only if hmp-unsafe and the new command line option are
>> both on.
>>
>> If we want to enable it without hmp-unsafe on, we would need to at
>> least lock the pools.
>
> Locking the pools is a good idea.

This would be another option, yes.

> My preference is not to tie this feature to the hmp-unsafe command
> line, more on this below.

I agree.

>> [...]
>
> This discussion has been very interesting; it is cool to hear new
> ideas like this one. I have a couple of thoughts to share.
>
> First, I think that the ability to create cpupools at boot time is
> super important. It goes way beyond big.LITTLE. It would be incredibly
> useful to separate real-time (sched=null) and non-real-time
> (sched=credit2) workloads. I think it will only become more important
> going forward, so I'd love to see an option to configure cpupools that
> works for dom0less. It could be based on device tree properties rather
> than Kconfig options.

I think device tree AND a command line option should be possible (think
of x86 here).

> It is true that if we had the devicetree-based cpupool configuration I
> mentioned, then somebody could use it to create cpupools matching
> big.LITTLE. So "in theory" it solves the problem. However, I think
> that for big.LITTLE it would be suboptimal. For big.LITTLE it would be
> best if Xen configured the cpupools automatically rather than based on
> the device tree configuration. That way, it is going to work
> automatically, without extra steps, even in the simplest Xen setups.
>
> So I think that it is a good idea to have a command line option
> (better than a Kconfig option) to trigger the MIDR-based cpupool
> creation at boot time. The option could be called midr-cpupools=on/off
> or hw-cpupools=on/off, for example.

I'd rather go for:

cpupools=<options>

with e.g. <options>:

- "auto-midr": split the system into cpupools based on MIDR
- "auto-numa": split the system into cpupools based on NUMA nodes
- "cpus=<list of cpus>[,sched=<scheduler>]"

This would be rather flexible without adding more and more options
doing similar things. Other sub-options could be added rather easily.

> In terms of whether it should be the default or not, I don't feel
> strongly about it. [...]
>
> But my preference is *not* to tie the new command line option to
> hmp-unsafe, because if you use midr-cpupools and don't mess with the
> pools then it is actually safe. We could even lock the cpupools like
> Julien suggested, or warn/return an error on changing the cpupools.
> In this scenario, it would be detrimental to also pass hmp-unsafe: it
> would allow actually unsafe configurations that the user wanted to
> avoid by using midr-cpupools. It would end up disabling checks we
> could put in place to make midr-cpupools safer.
>
> So in short I think it should be:
>
> - midr-cpupools alone:
>   cpupools created at boot, warning/errors on changing cpupools
>
> - midr-cpupools + hmp-unsafe:
>   cpupools created at boot, changing cpupools is allowed (we could
>   still warn, but no errors)

I'd rather add an explicit ",locked" option to the above cpupools
parameter.

> - hmp-unsafe alone:
>   same as today with hmp-unsafe
>
> For the default, as I said, I don't have a strong preference. I think
> midr-cpupools could be "on" by default.

What about making this a Kconfig option?

Juergen
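A rough sketch of how the proposed syntax could be wired up with Xen's
custom_param(); the sub-option handling and the parse_pool_cpu_list()
helper are assumptions made for illustration:

/* Sketch: parser for the proposed "cpupools=" option. */
static enum {
    CPUPOOLS_OFF,
    CPUPOOLS_AUTO_MIDR,
    CPUPOOLS_AUTO_NUMA,
} __initdata opt_cpupools = CPUPOOLS_OFF;

static int __init parse_cpupools(const char *s)
{
    if ( !strcmp(s, "auto-midr") )
        opt_cpupools = CPUPOOLS_AUTO_MIDR;
    else if ( !strcmp(s, "auto-numa") )
        opt_cpupools = CPUPOOLS_AUTO_NUMA;
    else if ( !strncmp(s, "cpus=", 5) )
        return parse_pool_cpu_list(s + 5);   /* hypothetical helper */
    else
        return -EINVAL;

    return 0;
}
custom_param("cpupools", parse_cpupools);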
On Wed, Nov 17, 2021 at 9:11 PM Julien Grall <julien@xen.org> wrote:

Hi Julien, all

[Sorry for the possible format issues]

> (Pruning some CC to just leave Arm and sched folks)
>
> On 17/11/2021 12:07, Bertrand Marquis wrote:
> > [...]
>
> I am split. On one hand, this makes it easier for someone to try
> big.LITTLE, as you don't have to manually pin vCPUs. On the other
> hand, this handles a single use-case, and you would need to use
> hmp-unsafe and pinning if you want to get a more exotic setup (e.g. a
> domain with big.LITTLE).
>
> Another possible solution is to pin dom0 vCPUs (AFAIK they are just
> sticky by default) and then create the pools from dom0 userspace. My
> assumption is that for dom0less we would want to use pinning instead.
>
> That said, I would like to hear from Xilinx and EPAM as, IIRC, they
> are already using hmp-unsafe in production.

We have been using hmp-unsafe since its introduction. Yes, we are aware
of the possible consequences of enabling that option (as different CPU
types might have different errata, cache lines, PA bits (?), etc.), so
we are trying to deal with it carefully. In our target system we pin
Dom0's vCPUs to the pCPUs of the same type from userspace via the
"xl vcpu-pin" command; for other domains a more exotic configuration
can be used.

I share Stefano's opinion not to tie the new feature (boot time
MIDR-based cpupools) to the existing hmp-unsafe option, and to create a
new command line option to control the new feature. The proposed
algorithm (copied here for convenience) looks fine to me:

"So in short I think it should be:

- midr-cpupools alone:
  cpupools created at boot, warning/errors on changing cpupools

- midr-cpupools + hmp-unsafe:
  cpupools created at boot, changing cpupools is allowed (we could
  still warn, but no errors)

- hmp-unsafe alone:
  same as today with hmp-unsafe"

For the default behaviour I also don't have a strong preference. One
thing is clear: the default behaviour should be safe.

I think the command line option is preferred over the config option, as
the new feature can then be enabled/disabled without a need to rebuild
Xen. Moreover, if we follow the proposed algorithm route, the behaviour
of the new feature at runtime (whether changing cpupools is allowed or
not) is going to depend on the hmp-unsafe state, which is under command
line control currently. But there is no strong preference here either.

> Cheers,
>
> --
> Julien Grall
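For completeness, the dom0 pinning EPAM describes is done with plain xl
commands, e.g. (the CPU numbering is board-specific and purely
illustrative):

# Assuming pCPUs 4-7 form the cluster dom0 should run on:
xl vcpu-pin Domain-0 0 4
xl vcpu-pin Domain-0 1 5
xl vcpu-pin Domain-0 2 6
xl vcpu-pin Domain-0 3 7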
I like all your suggestions below!

On Thu, 18 Nov 2021, Juergen Gross wrote:
> On 18.11.21 03:19, Stefano Stabellini wrote:
> > On Wed, 17 Nov 2021, Julien Grall wrote:
> > > [...]
> >
> > As far as I know, moving a pCPU in/out of a pool is something that
> > cannot happen automatically: it requires manual intervention by the
> > user, and it is uncommon. We could print a warning or simply return
> > an error to prevent the action from happening. Or something like:
> >
> > "This action might result in memory corruptions and invalid
> > behavior. Do you want to continue? [Y/N]"
>
> This should only be rejected if the source and target pools are not
> compatible. So a cpupool could be attributed to allow only specific
> cpus (and maybe domains?) in it.

Yes, you are right.

> Otherwise it would be impossible to create new cpupools after boot on
> such a system and to populate them with cpus.
>
> > [...]
> > First, I think that the ability to create cpupools at boot time is
> > super important. [...] It could be based on device tree properties
> > rather than Kconfig options.
>
> I think device tree AND a command line option should be possible
> (think of x86 here).

Sure

> > So I think that it is a good idea to have a command line option
> > (better than a Kconfig option) to trigger the MIDR-based cpupool
> > creation at boot time. The option could be called
> > midr-cpupools=on/off or hw-cpupools=on/off, for example.
>
> I'd rather go for:
>
> cpupools=<options>
>
> with e.g. <options>:
>
> - "auto-midr": split the system into cpupools based on MIDR
> - "auto-numa": split the system into cpupools based on NUMA nodes
> - "cpus=<list of cpus>[,sched=<scheduler>]"
>
> This would be rather flexible without adding more and more options
> doing similar things. Other sub-options could be added rather easily.

I like this

> > [...]
> > - midr-cpupools + hmp-unsafe:
> >   cpupools created at boot, changing cpupools is allowed (we could
> >   still warn, but no errors)
>
> I'd rather add an explicit ",locked" option to the above cpupools
> parameter.

yeah that's better

> > - hmp-unsafe alone:
> >   same as today with hmp-unsafe
> >
> > For the default, as I said, I don't have a strong preference. I
> > think midr-cpupools could be "on" by default.
>
> What about making this a Kconfig option?

Could also be a good idea
Hi Stefano,

On 18/11/2021 02:19, Stefano Stabellini wrote:
> On Wed, 17 Nov 2021, Julien Grall wrote:
>> [...]
>> From my understanding, it is possible to move a pCPU in/out of a pool
>> afterwards. So the security concern with big.LITTLE is still present,
>> even though it would be difficult to hit.
>
> As far as I know, moving a pCPU in/out of a pool is something that
> cannot happen automatically: it requires manual intervention by the
> user, and it is uncommon. We could print a warning or simply return an
> error to prevent the action from happening. Or something like:
>
> "This action might result in memory corruptions and invalid behavior.
> Do you want to continue? [Y/N]"
>
>> I am also concerned that it would be more difficult to detect any
>> misconfiguration. So I think this option would still need to be
>> turned on only if hmp-unsafe and the new command line option are
>> both on.
>>
>> If we want to enable it without hmp-unsafe on, we would need to at
>> least lock the pools.
>
> Locking the pools is a good idea.
>
> My preference is not to tie this feature to the hmp-unsafe command
> line, more on this below.

The only reason I suggested tying it to hmp-unsafe is if you (or anyone
else) really wanted to use a version where the pools are unlocked.

If we are going to implement the lock, then I think hmp-unsafe would
not be necessary for such a configuration.

> [...]
>
> It is true that if we had the devicetree-based cpupool configuration I
> mentioned, then somebody could use it to create cpupools matching
> big.LITTLE. So "in theory" it solves the problem. However, I think
> that for big.LITTLE it would be suboptimal. For big.LITTLE it would be
> best if Xen configured the cpupools automatically rather than based on
> the device tree configuration.

This brings one question: how does Linux detect and use big.LITTLE? Is
there a Device-Tree binding?

[...]

> So I think that it is a good idea to have a command line option
> (better than a Kconfig option) to trigger the MIDR-based cpupool
> creation at boot time. The option could be called midr-cpupools=on/off
> or hw-cpupools=on/off, for example.
> In terms of whether it should be the default or not, I don't feel
> strongly about it.

On by default means this will be security supported, and we need to be
reasonably confident it cannot be broken.

In this case, I am not only referring to moving a pCPU between pools,
but also to Xen doing the right thing (e.g. finding the minimum cache
line size).

[...]

> - midr-cpupools alone:
>   cpupools created at boot, warning/errors on changing cpupools
>
> - midr-cpupools + hmp-unsafe:
>   cpupools created at boot, changing cpupools is allowed (we could
>   still warn, but no errors)
>
> - hmp-unsafe alone:
>   same as today with hmp-unsafe

I like Juergen's version better. But before agreeing on the command
line, I would like to understand how Linux is dealing with big.LITTLE
today (see my question above).

> For the default, as I said, I don't have a strong preference. I think
> midr-cpupools could be "on" by default.

I think this should be off at the beginning, until the feature is
mature enough. We are soon opening the tree again for the next
development cycle, so I think we have enough time to make sure
everything works properly and to have it turned on by default before
the next release.

Cheers,
On Fri, 19 Nov 2021, Julien Grall wrote: > Hi Stefano, > > On 18/11/2021 02:19, Stefano Stabellini wrote: > > On Wed, 17 Nov 2021, Julien Grall wrote: > > > > > > > On 17 Nov 2021, at 10:26, Julien Grall <julien@xen.org> wrote: > > > > > > > > > > > > > > Hi Luca, > > > > > > > > > > > > > > On 17/11/2021 09:57, Luca Fancellu wrote: > > > > > > > > Currently Xen creates a default cpupool0 that contains all the > > > > > > > > cpu > > > > > > > > brought up > > > > > > > > during boot and it assumes that the platform has only one kind > > > > > > > > of > > > > > > > > CPU. > > > > > > > > This assumption does not hold on big.LITTLE platform, but > > > > > > > > putting > > > > > > > > different > > > > > > > > type of CPU in the same cpupool can result in instability and > > > > > > > > security issues > > > > > > > > for the domains running on the pool. > > > > > > > > > > > > > > I agree that you can't move a LITTLE vCPU to a big pCPU. > > > > > > > However... > > > > > > > > > > > > > > > For this reason this serie introduces an architecture specific > > > > > > > > way > > > > > > > > to create > > > > > > > > different cpupool at boot time, this is particularly useful on > > > > > > > > ARM > > > > > > > > big.LITTLE > > > > > > > > platform where there might be the need to have different > > > > > > > > cpupools > > > > > > > > for each type > > > > > > > > of core, but also systems using NUMA can have different cpu pool > > > > > > > > for > > > > > > > > each node. > > > > > > > > > > > > > > ... from my understanding, all the vCPUs of a domain have to be in > > > > > > > the > > > > > > > same cpupool. So with this approach it is not possible: > > > > > > > 1) to have a mix of LITTLE and big vCPUs in the domain > > > > > > > 2) to create a domain spanning across two NUMA nodes > > > > > > > > > > > > > > So I think we need to make sure that any solutions we go through > > > > > > > will > > > > > > > not prevent us to implement those setups. > > > > > > The point of this patch is to make all cores available without > > > > > > breaking > > > > > > the current behaviour of existing system. > > > > > > > > > > I might be missing some context here. By breaking current behavior, do > > > > > you > > > > > mean user that may want to add "hmp-unsafe" on the command line? > > > > > > > > Right, with hmp-unsafe the behaviour is now the same as without, only > > > > extra > > > > cores are parked in other cpupools. > > > > > > > > So you have a point in fact that behaviour is changed for someone who > > > > was > > > > using hmp-unsafe before if this is activated. > > > > The command line argument suggested by Jurgen definitely makes sense > > > > here. > > > > > > > > We could instead do the following: > > > > - when this is activated in the configuration, boot all cores and park > > > > them > > > > in different pools (depending on command line argument). Current > > > > behaviour > > > > not change if other pools are not used (but more cores will be on in > > > > xen) > > > > > > From my understanding, it is possible to move a pCPU in/out a pool > > > afterwards. > > > So the security concern with big.LITTLE is still present, even though it > > > would > > > be difficult to hit it. > > > > As far as I know moving a pCPU in/out of a pool is something that cannot > > happen automatically: it requires manual intervention to the user and it > > is uncommon. We could print a warning or simply return error to prevent > > the action from happening. 
> > Or something like:
> >
> > "This action might result in memory corruptions and invalid behavior. Do
> > you want to continue? [Y/N]"
> >
> > > I am also concerned that it would be more difficult to detect any
> > > misconfiguration. So I think this option would still need to be turned on
> > > only if hmp-unsafe and the new command line one are both on.
> > >
> > > If we want to enable it without hmp-unsafe on, we would need to at least
> > > lock the pools.
> >
> > Locking the pools is a good idea.
> >
> > My preference is not to tie this feature to the hmp-unsafe command line,
> > more on this below.
>
> The only reason I suggested to tie with hmp-unsafe is if you (or anyone else)
> really wanted to use a version where the pools are unlocked.
>
> If we are going to implement the lock, then I think hmp-unsafe would not
> be necessary for such a configuration.
>
> > > > - when hmp-unsafe is on, this feature will be turned off (if activated
> > > > in the configuration) and all cores would be added in the same pool.
> > > >
> > > > What do you think ?
> > >
> > > I am split. On one hand, this is making it easier for someone to try
> > > big.LITTLE as you don't have to manually pin vCPUs. On the other hand,
> > > this is handling a single use-case and you would need to use hmp-unsafe
> > > and pinning if you want to get a more exotic setup (e.g. a domain with
> > > big.LITTLE).
> > >
> > > Another possible solution is to pin dom0 vCPUs (AFAIK they are just sticky
> > > by default) and then create the pools from dom0 userspace. My assumption
> > > is for dom0less we would want to use pinning instead.
> > >
> > > That said I would like to hear from Xilinx and EPAM as, IIRC, they are
> > > already using hmp-unsafe in production.
> >
> > This discussion has been very interesting, it is cool to hear new ideas
> > like this one. I have a couple of thoughts to share.
> >
> > First I think that the ability of creating cpupools at boot time is
> > super important. It goes way beyond big.LITTLE. It would be incredibly
> > useful to separate real-time (sched=null) and non-real-time
> > (sched=credit2) workloads. I think it will only become more important
> > going forward so I'd love to see an option to configure cpupools that
> > works for dom0less. It could be based on device tree properties rather
> > than kconfig options.
> >
> > It is true that if we had the devicetree-based cpupool configuration I
> > mentioned, then somebody could use it to create cpupools matching
> > big.LITTLE. So "in theory" it solves the problem. However, I think that
> > for big.LITTLE it would be suboptimal. For big.LITTLE it would be best
> > if Xen configured the cpupools automatically rather than based on the
> > device tree configuration.
>
> This brings one question. How does Linux detect and use big.LITTLE? Is there
> a Device-Tree binding?
>
> [...]
>
> > So I think that it is a good idea to have a command line option (better
> > than a kconfig option) to trigger the MIDR-based cpupool creation at
> > boot time. The option could be called midr-cpupools=on/off or
> > hw-cpupools=on/off for example.
> >
> > In terms of whether it should be the default or not, I don't feel
> > strongly about it.
>
> On by default means this will be security supported and we need to be
> reasonably confident this cannot be broken.
>
> In this case, I am not only referring to moving a pCPU between pools but
> also Xen doing the right thing (e.g. finding the minimum cache line size).
>
> [...]
> > - midr-cpupools alone
> >   cpupools created at boot, warning/errors on changing cpupools
> >
> > - midr-cpupools + hmp-unsafe
> >   cpupools created at boot, changing cpupools is allowed (we could still
> >   warn but no errors)
> >
> > - hmp-unsafe alone
> >   same as today with hmp-unsafe
>
> I like better Juergen's version. But before agreeing on the command line, I
> would like to understand how Linux is dealing with big.LITTLE today (see my
> question above).

I also like Juergen's version better :-)

The device tree binding that covers big.LITTLE is the same one that covers
cpus: Documentation/devicetree/bindings/arm/cpus.yaml

So basically, you can tell it is a big.LITTLE system because the
compatible strings differ between certain cpus, e.g.:

cpu@0 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0x0>;
};

cpu@1 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0x1>;
};

cpu@100 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x100>;
};

cpu@101 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x101>;
};

> > For the default as I said I don't have a strong preference. I think
> > midr-cpupools could be "on" by default.
>
> I think this should be off at the beginning until the feature is matured
> enough. We are soon opening the tree again for the next development cycle.
> So I think we have enough time to make sure everything works properly to
> have it turned on by default before the next release.

That's fine.
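[For context on what wiring up such a switch involves: Xen declares boot options with *_param() helpers, the same pattern it already uses for "hmp-unsafe" in xen/arch/arm/smpboot.c. Below is a minimal sketch only; the option name, variable, and default are assumptions taken from this discussion, not code from the series.]

    /*
     * Hedged sketch: declaring the "midr-cpupools" boot option discussed
     * above, following Xen's existing boolean_param() pattern. Not code
     * from the actual series.
     */
    #include <xen/init.h>
    #include <xen/param.h>
    #include <xen/types.h>

    /* Off by default, matching the suggestion to let the feature mature first. */
    static bool __initdata opt_midr_cpupools;
    boolean_param("midr-cpupools", opt_midr_cpupools);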
Hi Stefano,

On 19/11/2021 18:55, Stefano Stabellini wrote:
> On Fri, 19 Nov 2021, Julien Grall wrote:
>> I like better Juergen's version. But before agreeing on the command line, I
>> would like to understand how Linux is dealing with big.LITTLE today (see my
>> question above).
>
> I also like Juergen's version better :-)
>
> The device tree binding that covers big.LITTLE is the same one that covers
> cpus: Documentation/devicetree/bindings/arm/cpus.yaml

Are you sure? I found Documentation/devicetree/bindings/arm/cpu-capacity.txt.

> So basically, you can tell it is a big.LITTLE system because the
> compatible strings differ between certain cpus, e.g.:
>
> cpu@0 {
>         device_type = "cpu";
>         compatible = "arm,cortex-a15";
>         reg = <0x0>;
> };
>
> cpu@1 {
>         device_type = "cpu";
>         compatible = "arm,cortex-a15";
>         reg = <0x1>;
> };
>
> cpu@100 {
>         device_type = "cpu";
>         compatible = "arm,cortex-a7";
>         reg = <0x100>;
> };
>
> cpu@101 {
>         device_type = "cpu";
>         compatible = "arm,cortex-a7";
>         reg = <0x101>;
> };

I have not found any code in Linux using the compatible. Instead, they all
seem to use the cpu-map (see drivers/base/arch_topology.c).

Anyway, to me it would be better to parse the Device-Tree over using the
MIDR. The advantage is we can cover a wide range of cases (you may have
processors with the same MIDR but different frequency). For now, we could
create a cpupool per cluster.

Cheers,
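[As a concrete aside on what grouping cores "by MIDR" means: the standalone sketch below reads MIDR_EL1 and splits out its identification fields. It is an illustration only, not code from the series; it runs from userspace on AArch64 Linux only because the kernel emulates EL0 reads of MIDR_EL1. Julien's caveat stands: two cores can share a MIDR value yet still differ in, e.g., maximum frequency, which this register cannot reveal.]

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t midr;

        /* MIDR_EL1 identifies the core type, e.g. Cortex-A53 vs Cortex-A72. */
        __asm__ volatile("mrs %0, midr_el1" : "=r" (midr));

        /* Field layout per the Arm ARM. */
        printf("implementer 0x%02x, part 0x%03x, variant %u, revision %u\n",
               (unsigned int)((midr >> 24) & 0xff),  /* e.g. 0x41 = Arm Ltd. */
               (unsigned int)((midr >> 4) & 0xfff),  /* e.g. 0xd08 = Cortex-A72 */
               (unsigned int)((midr >> 20) & 0xf),
               (unsigned int)(midr & 0xf));

        return 0;
    }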
Hi Julien,

> On 23 Nov 2021, at 13:54, Julien Grall <julien@xen.org> wrote:
>
> Hi Stefano,
>
> On 19/11/2021 18:55, Stefano Stabellini wrote:
>> On Fri, 19 Nov 2021, Julien Grall wrote:
>>> I like better Juergen's version. But before agreeing on the command line, I
>>> would like to understand how Linux is dealing with big.LITTLE today (see my
>>> question above).
>>
>> I also like Juergen's version better :-)
>>
>> The device tree binding that covers big.LITTLE is the same one that covers
>> cpus: Documentation/devicetree/bindings/arm/cpus.yaml
>
> Are you sure? I found Documentation/devicetree/bindings/arm/cpu-capacity.txt.
>
>> So basically, you can tell it is a big.LITTLE system because the
>> compatible strings differ between certain cpus, e.g.:
>>
>> cpu@0 {
>>         device_type = "cpu";
>>         compatible = "arm,cortex-a15";
>>         reg = <0x0>;
>> };
>>
>> cpu@1 {
>>         device_type = "cpu";
>>         compatible = "arm,cortex-a15";
>>         reg = <0x1>;
>> };
>>
>> cpu@100 {
>>         device_type = "cpu";
>>         compatible = "arm,cortex-a7";
>>         reg = <0x100>;
>> };
>>
>> cpu@101 {
>>         device_type = "cpu";
>>         compatible = "arm,cortex-a7";
>>         reg = <0x101>;
>> };
>
> I have not found any code in Linux using the compatible. Instead, they all
> seem to use the cpu-map (see drivers/base/arch_topology.c).
>
> Anyway, to me it would be better to parse the Device-Tree over using the
> MIDR. The advantage is we can cover a wide range of cases (you may have
> processors with the same MIDR but different frequency). For now, we could
> create a cpupool per cluster.

This is a really good idea as this could also work for NUMA systems.

So reg & ~0xff would give the cluster number?

We will probably end up splitting cores into different cpupools even if they
are all the same, as there are several CPUs having several clusters but the
same cores (I recall some NXP boards being like that).

Some device trees also have a cpu-map definition:

cpu-map {
        cluster0 {
                core0 {
                        cpu = <&A57_0>;
                };
                core1 {
                        cpu = <&A57_1>;
                };
        };

        cluster1 {
                core0 {
                        cpu = <&A53_0>;
                };
                core1 {
                        cpu = <&A53_1>;
                };
                core2 {
                        cpu = <&A53_2>;
                };
                core3 {
                        cpu = <&A53_3>;
                };
        };
};

@stefano: is the cpu-map something standard? Lots of device trees do have it
in Linux now but I do not recall seeing it on older device trees.
Maybe using cpu-map could be a solution; we could say that device trees
without a cpu-map are not supported.

Cheers
Bertrand

> Cheers,
>
> --
> Julien Grall
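[To make the reg arithmetic above concrete: a standalone sketch, assuming, as in the examples in this thread, that a cpu node's reg carries the MPIDR affinity fields with Aff0 in bits [7:0]. Under that assumption, reg & ~0xff does act as a cluster identifier; note it breaks on multi-threaded cores, where the affinity levels shift by one, which is one more argument for relying on cpu-map instead.]

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Assumption: reg holds the MPIDR affinity fields (Aff0 = core within
     * cluster, Aff1 = cluster), as in the cpu@0/cpu@100 examples above.
     * Masking off Aff0 then yields a per-cluster identifier.
     */
    static uint64_t cluster_id(uint64_t reg)
    {
        return reg & ~(uint64_t)0xff;
    }

    int main(void)
    {
        /* The big.LITTLE layout from the thread: two A57s, four A53s. */
        const uint64_t regs[] = { 0x0, 0x1, 0x100, 0x101, 0x102, 0x103 };

        for (unsigned int i = 0; i < sizeof(regs) / sizeof(regs[0]); i++)
            printf("reg 0x%03llx -> cluster 0x%03llx\n",
                   (unsigned long long)regs[i],
                   (unsigned long long)cluster_id(regs[i]));

        return 0;
    }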
On Tue, 23 Nov 2021, Bertrand Marquis wrote:
> Hi Julien,
>
> > On 23 Nov 2021, at 13:54, Julien Grall <julien@xen.org> wrote:
> >
> > Hi Stefano,
> >
> > On 19/11/2021 18:55, Stefano Stabellini wrote:
> >> On Fri, 19 Nov 2021, Julien Grall wrote:
> >>> I like better Juergen's version. But before agreeing on the command line, I
> >>> would like to understand how Linux is dealing with big.LITTLE today (see my
> >>> question above).
> >>
> >> I also like Juergen's version better :-)
> >>
> >> The device tree binding that covers big.LITTLE is the same one that covers
> >> cpus: Documentation/devicetree/bindings/arm/cpus.yaml
> >
> > Are you sure? I found Documentation/devicetree/bindings/arm/cpu-capacity.txt.
> >
> >> So basically, you can tell it is a big.LITTLE system because the
> >> compatible strings differ between certain cpus, e.g.:
> >>
> >> cpu@0 {
> >>         device_type = "cpu";
> >>         compatible = "arm,cortex-a15";
> >>         reg = <0x0>;
> >> };
> >>
> >> cpu@1 {
> >>         device_type = "cpu";
> >>         compatible = "arm,cortex-a15";
> >>         reg = <0x1>;
> >> };
> >>
> >> cpu@100 {
> >>         device_type = "cpu";
> >>         compatible = "arm,cortex-a7";
> >>         reg = <0x100>;
> >> };
> >>
> >> cpu@101 {
> >>         device_type = "cpu";
> >>         compatible = "arm,cortex-a7";
> >>         reg = <0x101>;
> >> };
> >
> > I have not found any code in Linux using the compatible. Instead, they all
> > seem to use the cpu-map (see drivers/base/arch_topology.c).
> >
> > Anyway, to me it would be better to parse the Device-Tree over using the
> > MIDR. The advantage is we can cover a wide range of cases (you may have
> > processors with the same MIDR but different frequency). For now, we could
> > create a cpupool per cluster.
>
> This is a really good idea as this could also work for NUMA systems.
>
> So reg & ~0xff would give the cluster number?
>
> We will probably end up splitting cores into different cpupools even if
> they are all the same, as there are several CPUs having several clusters
> but the same cores (I recall some NXP boards being like that).
>
> Some device trees also have a cpu-map definition:
>
> cpu-map {
>         cluster0 {
>                 core0 {
>                         cpu = <&A57_0>;
>                 };
>                 core1 {
>                         cpu = <&A57_1>;
>                 };
>         };
>
>         cluster1 {
>                 core0 {
>                         cpu = <&A53_0>;
>                 };
>                 core1 {
>                         cpu = <&A53_1>;
>                 };
>                 core2 {
>                         cpu = <&A53_2>;
>                 };
>                 core3 {
>                         cpu = <&A53_3>;
>                 };
>         };
> };
>
> @stefano: is the cpu-map something standard? Lots of device trees do have
> it in Linux now but I do not recall seeing it on older device trees.
> Maybe using cpu-map could be a solution; we could say that device trees
> without a cpu-map are not supported.

cpu-map is newer than big.LITTLE support in Linux. See for instance commit
4ab328f06. Before cpu-map was introduced, Linux used mostly the MPIDR or
sometimes the *machine* compatible string.

I did find one example of a board where the cpu compatible string is the
same for both big and LITTLE CPUs: arch/arm64/boot/dts/rockchip/rk3368.dtsi.
That could be the reason why the cpu compatible string is not used for
detecting big.LITTLE. Sorry about that, my earlier suggestion was wrong.

Yes, I think using cpu-map would be fine. It seems to be widely used by
different vendors. (Even if something as generic as cpu-map should really be
under the devicetree.org spec, not under Linux, but anyway.)

Only one note: you mentioned NUMA. As you can see from
Documentation/devicetree/bindings/numa.txt, NUMA doesn't use cpu-map.
Instead, it uses numa-node-id and distance-map.
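[As an illustration of what consuming cpu-map could look like, here is a hedged sketch written against libfdt so it stays self-contained: one candidate cpupool per cluster node, as Julien proposed. It assumes the flat cluster/core layout shown in the example above; the binding also permits nested clusters, which this ignores, and the real series would presumably use Xen's internal device tree helpers instead.]

    #include <stdio.h>
    #include <libfdt.h>

    /*
     * Sketch: list the clusters under /cpus/cpu-map in a flattened device
     * tree blob and count the cores in each -- i.e. one candidate
     * boot-time cpupool per cluster.
     */
    static void clusters_to_cpupools(const void *fdt)
    {
        int map = fdt_path_offset(fdt, "/cpus/cpu-map");
        int cluster;

        if (map < 0) {
            fprintf(stderr, "no /cpus/cpu-map in this device tree\n");
            return;
        }

        fdt_for_each_subnode(cluster, fdt, map) {
            int core, ncores = 0;

            fdt_for_each_subnode(core, fdt, cluster)
                ncores++;

            printf("%s: %d core(s) -> one cpupool\n",
                   fdt_get_name(fdt, cluster, NULL), ncores);
        }
    }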
Hi all,

Thank you for all your feedback, and sorry for the late response.

Given the amount of suggestions, I've been working on a proposal for the
boot time cpupools that I hope could be good for everyone.

The feature will be enabled by CONFIG_BOOT_TIME_CPUPOOLS, so without it
everything behaves as it does now. When the feature is enabled, the code
will check the device tree and use information from there to create the
cpupools:

a72_1: cpu@0 {
        compatible = "arm,cortex-a72";
        reg = <0x0 0x0>;
        device_type = "cpu";
        [...]
};

a72_2: cpu@1 {
        compatible = "arm,cortex-a72";
        reg = <0x0 0x1>;
        device_type = "cpu";
        [...]
};

cpu@2 {
        compatible = "arm,cortex-a72";
        reg = <0x0 0x2>;
        device_type = "cpu";
        [...]
};

a53_1: cpu@100 {
        compatible = "arm,cortex-a53";
        reg = <0x0 0x100>;
        device_type = "cpu";
        [...]
};

a53_2: cpu@101 {
        compatible = "arm,cortex-a53";
        reg = <0x0 0x101>;
        device_type = "cpu";
        [...]
};

chosen {
        cpupool_a {
                compatible = "xen,cpupool";
                xen,cpupool-id = <0>;
                xen,cpupool-cpus = <&a72_1 &a72_2>;
        };
        cpupool_b {
                compatible = "xen,cpupool";
                xen,cpupool-id = <1>;
                xen,cpupool-cpus = <&a53_1 &a53_2>;
                xen,cpupool-sched = "credit2";
        };
        [...]
};

So for every node under chosen with the compatible "xen,cpupool", a cpupool
is created (if it doesn't already exist). Mandatory properties of that node
are:

- "xen,cpupool-id", which identifies the id of the pool
- "xen,cpupool-cpus", which lists the handles of the cpus

An optional property is "xen,cpupool-sched", which is a string that
identifies the scheduler. A cpupool with identifier 0 (zero) can't have that
property; it will get the default scheduler from Xen.

A set of rules is applied:

1) The cpupool with id 0 is always created, whether or not it is listed in
   the DT.
2) The cpupool with id 0 must have at least one cpu; if it doesn't, the
   system will panic.
3) Every cpu that is not assigned to any cpupool will be automatically
   assigned to the cpupool with id 0 (only cpus that are brought up by Xen).
4) When a cpu is assigned to a cpupool in the DT, but the cpu is not up, the
   system will panic.

So, given this explanation, the above example will create a system with two
cpupools:

- cpupool with id 0, containing three A72 cpus (two are explicitly listed,
  one was not assigned to any other cpupool)
- cpupool with id 1, containing two A53 cpus (cpus explicitly listed)

Clearly the above example works only if Xen is started using the
hmp-unsafe=1 parameter, otherwise some cpus won't be started.

Given the above example, we might be able to have an option like
"xen,domain-cpupool-id" to assign dom0less guests to cpupools:

chosen {
        cpupool_a {
                compatible = "xen,cpupool";
                xen,cpupool-id = <0>;
                xen,cpupool-cpus = <&a72_1 &a72_2>;
        };
        cpupool_b {
                compatible = "xen,cpupool";
                xen,cpupool-id = <1>;
                xen,cpupool-cpus = <&a53_1 &a53_2>;
                xen,cpupool-sched = "credit2";
        };

        domU1 {
                #size-cells = <0x1>;
                #address-cells = <0x1>;
                compatible = "xen,domain";
                cpus = <0x1>;
                memory = <0x0 0xc0000>;
                xen,domain-cpupool-id = <1>; /* Optional */
                vpl011;

                module@0 {
                        compatible = "multiboot,kernel", "multiboot,module";
                        [...]
                };
        };
};

Any thoughts on this?

Cheers,
Luca
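[To make the proposed binding concrete, here is a hedged sketch of the /chosen scan, written against libfdt so it stays self-contained; the actual series will use Xen's own device tree and cpupool APIs, which are not shown, and the panic/fallback rules 1-4 above are only hinted at in comments. Resolving the xen,cpupool-cpus phandles to CPU nodes is elided for brevity.]

    #include <stdio.h>
    #include <libfdt.h>

    /*
     * Sketch of the proposed binding: for every subnode of /chosen that is
     * compatible with "xen,cpupool", read the mandatory xen,cpupool-id and
     * the optional xen,cpupool-sched. Per the proposal, pool 0 always
     * exists, keeps Xen's default scheduler, and collects any unassigned
     * cpus that Xen brought up.
     */
    static void scan_boot_cpupools(const void *fdt)
    {
        int chosen = fdt_path_offset(fdt, "/chosen");
        int node;

        if (chosen < 0)
            return; /* no /chosen: only the default pool 0 would exist */

        fdt_for_each_subnode(node, fdt, chosen) {
            const fdt32_t *id;
            const char *sched;
            int len;

            if (fdt_node_check_compatible(fdt, node, "xen,cpupool"))
                continue; /* not a cpupool node (e.g. a "xen,domain" node) */

            id = fdt_getprop(fdt, node, "xen,cpupool-id", &len);
            if (!id || len != sizeof(*id)) {
                /* Mandatory property missing: the proposal implies an error. */
                fprintf(stderr, "%s: missing xen,cpupool-id\n",
                        fdt_get_name(fdt, node, NULL));
                continue;
            }

            sched = fdt_getprop(fdt, node, "xen,cpupool-sched", NULL);

            printf("cpupool %u, scheduler %s\n",
                   fdt32_to_cpu(*id), sched ? sched : "(Xen default)");
        }
    }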