[RFC,0/8] SVE feature for arm guests

Message ID 20230111143826.3224-1-luca.fancellu@arm.com (mailing list archive)

Message

Luca Fancellu Jan. 11, 2023, 2:38 p.m. UTC
This series introduces the possibility for Dom0 and DomU guests to use
SVE/SVE2 instructions.

The SVE feature introduces new instructions and registers to improve the
performance of floating point operations.

The SVE feature is advertised using the SVE field of the ID_AA64PFR0_EL1
register and, when available, the ID_AA64ZFR0_EL1 register provides additional
information about the implemented version and other SVE features.

New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.

Z0-Z31 are scalable vector registers whose size is implementation defined and
ranges from 128 bits to a maximum of 2048 bits; the term vector length will be
used to refer to this quantity.
P0-P15 are predicate registers whose size is the vector length divided by 8;
the FFR (First Fault Register) has the same size.
ZCR_ELx is a register that can control and restrict the maximum vector length
used by the <x> exception level and all the lower exception levels, so for
example EL3 can restrict the vector length usable by EL3, EL2, EL1 and EL0.
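
As a rough illustration of the sizes described above, the following C sketch
derives the predicate/FFR size from a vector length and converts between a
vector length and the ZCR_ELx.LEN field, which encodes the length as
(LEN + 1) * 128 bits. The helper names are hypothetical and are not taken from
this series.

```c
/*
 * Hypothetical helpers, not part of the series.
 * All of them expect vl_bits to be a multiple of 128 in the range 128..2048.
 */

/* Bits in one predicate register (P0-P15) or in FFR: one bit per vector byte. */
static inline unsigned int sve_pred_bits(unsigned int vl_bits)
{
    return vl_bits / 8;
}

/* Encode a vector length in bits as a ZCR_ELx.LEN value. */
static inline unsigned int sve_vl_to_zcr_len(unsigned int vl_bits)
{
    return vl_bits / 128 - 1;
}

/* Decode a ZCR_ELx.LEN value back into a vector length in bits. */
static inline unsigned int sve_zcr_len_to_vl(unsigned int len)
{
    return (len + 1) * 128;
}
```

For example, a 512-bit vector length gives 64-bit predicate registers and a
LEN value of 3.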

The platform has a maximum implemented vector length, so whenever a value
written to the ZCR register is above the implemented length, the lower
(implemented) value is used. The RDVL instruction can be used to check which
vector length the HW is using after setting ZCR.
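
The clamping rule above can be modelled in a few lines of C. This is only a
sketch: it makes the simplifying assumption that the hardware supports every
multiple of 128 bits up to its maximum, and the function name is hypothetical.

```c
/*
 * Hypothetical model of the vector length the HW ends up using when
 * requested_bits is programmed through ZCR and the platform implements at
 * most hw_max_bits. Assumes every multiple of 128 up to hw_max_bits is
 * supported by the implementation.
 */
static inline unsigned int sve_effective_vl(unsigned int requested_bits,
                                            unsigned int hw_max_bits)
{
    return requested_bits < hw_max_bits ? requested_bits : hw_max_bits;
}
```

Because a higher exception level also limits the lower ones, the limits
compose: with a hypervisor limit of 512 on 2048-bit hardware, a guest asking
for 1024 would get sve_effective_vl(1024, sve_effective_vl(512, 2048)), that
is 512.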

For an SVE guest, the V0-V31 registers are part of Z0-Z31, so there is no
need to save them separately: saving Z0-Z31 implicitly saves V0-V31 as well.

SVE usage can be trapped using a flag in CPTR_EL2, hence in this series the
register is added to the domain state, so that only the guests that are not
allowed to use SVE are trapped.

This series introduces a command line parameter to enable Dom0 to use SVE and
to set its maximum vector length. The default is 0, which means the guest is
not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
the selected value as the maximum allowed vector length (the effective length
could be lower if the implemented one is lower).
For DomUs, an XL parameter that is used in the same way is introduced, and a
dom0less DTB binding is created.
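
The parameter semantics above (0 disables SVE, 128 to 2048 selects a maximum
vector length) could be checked along these lines. The names are hypothetical,
and rejecting values that are not a multiple of 128 is an assumption based on
the architecture; the cover letter does not say how the series treats them.

```c
/* Hypothetical classification of a user-supplied SVE vector length value. */
enum sve_setting {
    SVE_SETTING_DISABLED, /* 0: the guest is not allowed to use SVE */
    SVE_SETTING_ENABLED,  /* 128..2048: maximum allowed vector length in bits */
    SVE_SETTING_INVALID   /* anything else */
};

static inline enum sve_setting sve_classify_setting(unsigned int value)
{
    if (value == 0)
        return SVE_SETTING_DISABLED;

    /* Assumption: only multiples of 128 bits are accepted. */
    if (value < 128 || value > 2048 || (value % 128) != 0)
        return SVE_SETTING_INVALID;

    return SVE_SETTING_ENABLED;
}
```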

The context switch is the most critical part because there can be large
registers to save. In this series a simple approach is used: the context is
saved/restored on every context switch for the guests that are allowed to use
SVE.
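
To give a feel for how much state is involved, the sketch below adds up the
bytes occupied by Z0-Z31, P0-P15 and FFR for a given vector length. It is an
illustrative calculation with a hypothetical function name; it ignores control
registers such as ZCR_ELx and any padding a real implementation might add.

```c
/*
 * Hypothetical helper: bytes of SVE register state for one vCPU.
 * vl_bits is expected to be a multiple of 128 in the range 128..2048.
 */
static inline unsigned int sve_ctx_bytes(unsigned int vl_bits)
{
    unsigned int z_bytes   = 32 * (vl_bits / 8);  /* Z0-Z31: VL bits each */
    unsigned int p_bytes   = 16 * (vl_bits / 64); /* P0-P15: VL/8 bits each */
    unsigned int ffr_bytes = vl_bits / 64;        /* FFR: VL/8 bits */

    return z_bytes + p_bytes + ffr_bytes;
}
```

With a 128-bit vector length this is 546 bytes, and with the architectural
maximum of 2048 bits it grows to 8736 bytes, all of which is written and read
back on each switch under the simple approach.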


Luca Fancellu (8):
  xen/arm: enable SVE extension for Xen
  xen/arm: add sve_vl_bits field to domain
  xen/arm: Expose SVE feature to the guest
  xen/arm: add SVE exception class handling
  arm/sve: save/restore SVE context switch
  xen/arm: enable Dom0 to use SVE feature
  xen/tools: add sve parameter in XL configuration
  xen/arm: add sve property for dom0less domUs

 docs/man/xl.cfg.5.pod.in                 |  11 ++
 docs/misc/arm/device-tree/booting.txt    |   7 +
 docs/misc/xen-command-line.pandoc        |  12 ++
 tools/golang/xenlight/helpers.gen.go     |   2 +
 tools/golang/xenlight/types.gen.go       |   1 +
 tools/include/libxl.h                    |   5 +
 tools/libs/light/libxl_arm.c             |   2 +
 tools/libs/light/libxl_types.idl         |   1 +
 tools/xl/xl_parse.c                      |  10 ++
 xen/arch/arm/Kconfig                     |   3 +-
 xen/arch/arm/arm64/Makefile              |   1 +
 xen/arch/arm/arm64/cpufeature.c          |   7 +-
 xen/arch/arm/arm64/sve.c                 | 104 +++++++++++++
 xen/arch/arm/arm64/sve_asm.S             | 189 +++++++++++++++++++++++
 xen/arch/arm/arm64/vfp.c                 |  79 ++++++----
 xen/arch/arm/arm64/vsysreg.c             |  39 ++++-
 xen/arch/arm/cpufeature.c                |   6 +-
 xen/arch/arm/domain.c                    |  61 ++++++++
 xen/arch/arm/domain_build.c              |  11 ++
 xen/arch/arm/include/asm/arm64/sve.h     |  72 +++++++++
 xen/arch/arm/include/asm/arm64/sysregs.h |   4 +
 xen/arch/arm/include/asm/arm64/vfp.h     |  10 ++
 xen/arch/arm/include/asm/cpufeature.h    |  14 ++
 xen/arch/arm/include/asm/domain.h        |   8 +
 xen/arch/arm/include/asm/processor.h     |   3 +
 xen/arch/arm/setup.c                     |   5 +-
 xen/arch/arm/traps.c                     |  46 ++++--
 xen/include/public/arch-arm.h            |   2 +
 xen/include/public/domctl.h              |   2 +-
 29 files changed, 661 insertions(+), 56 deletions(-)
 create mode 100644 xen/arch/arm/arm64/sve.c
 create mode 100644 xen/arch/arm/arm64/sve_asm.S
 create mode 100644 xen/arch/arm/include/asm/arm64/sve.h

Comments

Julien Grall Jan. 11, 2023, 4:59 p.m. UTC | #1
Hi Luca,

On 11/01/2023 14:38, Luca Fancellu wrote:
> This serie is introducing the possibility for Dom0 and DomU guests to use
> sve/sve2 instructions.
> 
> SVE feature introduces new instruction and registers to improve performances on
> floating point operations.
> 
> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
> when available the ID_AA64ZFR0_EL1 register provides additional information
> about the implemented version and other SVE feature.
> 
> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
> 
> Z0-Z31 are scalable vector register whose size is implementation defined and
> goes from 128 bits to maximum 2048, the term vector length will be used to refer
> to this quantity.
> P0-P15 are predicate registers and the size is the vector length divided by 8,
> same size is the FFR (First Fault Register).
> ZCR_ELx is a register that can control and restrict the maximum vector length
> used by the <x> exception level and all the lower exception levels, so for
> example EL3 can restrict the vector length usable by EL3,2,1,0.
> 
> The platform has a maximum implemented vector length, so for every value
> written in ZCR register, if this value is above the implemented length, then the
> lower value will be used. The RDVL instruction can be used to check what vector
> length is the HW using after setting ZCR.
> 
> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
> 
> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
> register is added to the domain state, to be able to trap only the guests that
> are not allowed to use SVE.
> 
> This serie is introducing a command line parameter to enable Dom0 to use SVE and
> to set its maximum vector length that by default is 0 which means the guest is
> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
> the selected value used as maximum allowed vector length (which could be lower
> if the implemented one is lower).
> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
> DTB binding is created.
> 
> The context switch is the most critical part because there can be big registers
> to be saved, in this serie an easy approach is used and the context is
> saved/restored every time for the guests that are allowed to use SVE.

This would be OK for an initial approach. But I would be worried about
officially supporting SVE because of the potentially large impact on other
users.

What's the long term plan?

Cheers,
Luca Fancellu Jan. 12, 2023, 11:58 a.m. UTC | #2
> On 11 Jan 2023, at 16:59, Julien Grall <julien@xen.org> wrote:
> 
> Hi Luca,
> 
> On 11/01/2023 14:38, Luca Fancellu wrote:
>> This serie is introducing the possibility for Dom0 and DomU guests to use
>> sve/sve2 instructions.
>> SVE feature introduces new instruction and registers to improve performances on
>> floating point operations.
>> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
>> when available the ID_AA64ZFR0_EL1 register provides additional information
>> about the implemented version and other SVE feature.
>> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
>> Z0-Z31 are scalable vector register whose size is implementation defined and
>> goes from 128 bits to maximum 2048, the term vector length will be used to refer
>> to this quantity.
>> P0-P15 are predicate registers and the size is the vector length divided by 8,
>> same size is the FFR (First Fault Register).
>> ZCR_ELx is a register that can control and restrict the maximum vector length
>> used by the <x> exception level and all the lower exception levels, so for
>> example EL3 can restrict the vector length usable by EL3,2,1,0.
>> The platform has a maximum implemented vector length, so for every value
>> written in ZCR register, if this value is above the implemented length, then the
>> lower value will be used. The RDVL instruction can be used to check what vector
>> length is the HW using after setting ZCR.
>> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
>> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
>> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
>> register is added to the domain state, to be able to trap only the guests that
>> are not allowed to use SVE.
>> This serie is introducing a command line parameter to enable Dom0 to use SVE and
>> to set its maximum vector length that by default is 0 which means the guest is
>> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
>> the selected value used as maximum allowed vector length (which could be lower
>> if the implemented one is lower).
>> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
>> DTB binding is created.
>> The context switch is the most critical part because there can be big registers
>> to be saved, in this serie an easy approach is used and the context is
>> saved/restored every time for the guests that are allowed to use SVE.
> 
> This would be OK for an initial approach. But I would be worry to officially support SVE because of the potential large impact on other users.
> 
> What's the long term plan?

Hi Julien,

For the future we can plan some work and decide together how to handle the context switch.
We might need some suggestions from you (arm maintainers) to design that part in the best
way from a functional and security perspective.

For now we might flag the feature as unsupported, explaining in the Kconfig help that switching
between SVE and non-SVE guests, or between SVE guests, might add latency compared to
switching between non-SVE guests.

What do you think?

Cheers,
Luca

> 
> Cheers,
> 
> -- 
> Julien Grall
Julien Grall Jan. 13, 2023, 8:44 a.m. UTC | #3
Hi Luca,

On 12/01/2023 11:58, Luca Fancellu wrote:
>> On 11 Jan 2023, at 16:59, Julien Grall <julien@xen.org> wrote:
>> On 11/01/2023 14:38, Luca Fancellu wrote:
>>> This serie is introducing the possibility for Dom0 and DomU guests to use
>>> sve/sve2 instructions.
>>> SVE feature introduces new instruction and registers to improve performances on
>>> floating point operations.
>>> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
>>> when available the ID_AA64ZFR0_EL1 register provides additional information
>>> about the implemented version and other SVE feature.
>>> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
>>> Z0-Z31 are scalable vector register whose size is implementation defined and
>>> goes from 128 bits to maximum 2048, the term vector length will be used to refer
>>> to this quantity.
>>> P0-P15 are predicate registers and the size is the vector length divided by 8,
>>> same size is the FFR (First Fault Register).
>>> ZCR_ELx is a register that can control and restrict the maximum vector length
>>> used by the <x> exception level and all the lower exception levels, so for
>>> example EL3 can restrict the vector length usable by EL3,2,1,0.
>>> The platform has a maximum implemented vector length, so for every value
>>> written in ZCR register, if this value is above the implemented length, then the
>>> lower value will be used. The RDVL instruction can be used to check what vector
>>> length is the HW using after setting ZCR.
>>> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
>>> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
>>> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
>>> register is added to the domain state, to be able to trap only the guests that
>>> are not allowed to use SVE.
>>> This serie is introducing a command line parameter to enable Dom0 to use SVE and
>>> to set its maximum vector length that by default is 0 which means the guest is
>>> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
>>> the selected value used as maximum allowed vector length (which could be lower
>>> if the implemented one is lower).
>>> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
>>> DTB binding is created.
>>> The context switch is the most critical part because there can be big registers
>>> to be saved, in this serie an easy approach is used and the context is
>>> saved/restored every time for the guests that are allowed to use SVE.
>>
>> This would be OK for an initial approach. But I would be worry to officially support SVE because of the potential large impact on other users.
>>
>> What's the long term plan?
> 
> Hi Julien,
> 
> For the future we can plan some work and decide together how to handle the context switch,
> we might need some suggestions from you (arm maintainers) to design that part in the best
> way for functional and security perspective.
I think SVE will need to be lazily saved/restored. So on context switch, 
we would record that the context belongs to a previous domain. The 
first time the current domain then tries to access SVE, we would 
load it.
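
A minimal model of the lazy scheme described above, assuming a single physical
CPU and using plain counters in place of the real register save/restore. The
structures and function names are hypothetical and this is not Xen code; it
only shows when a save or a load would happen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU counters standing in for real save/load work. */
struct vcpu_model {
    unsigned int saves;    /* times the SVE state was saved to memory */
    unsigned int restores; /* times the SVE state was loaded into the HW */
};

/* Hypothetical per-pCPU state. */
struct pcpu_model {
    struct vcpu_model *sve_owner; /* whose SVE state is live in the registers */
    struct vcpu_model *current;   /* vCPU currently running */
    bool sve_trap_enabled;        /* would map to the CPTR_EL2 trap flag */
};

/* Context switch: no SVE state is moved, only the trap is (re)armed. */
static inline void model_context_switch(struct pcpu_model *cpu,
                                        struct vcpu_model *next)
{
    cpu->current = next;
    cpu->sve_trap_enabled = (cpu->sve_owner != next);
}

/* First SVE access after a switch: save the old owner, load the new one. */
static inline void model_sve_trap(struct pcpu_model *cpu)
{
    if (cpu->sve_owner != NULL && cpu->sve_owner != cpu->current)
        cpu->sve_owner->saves++;

    cpu->current->restores++;
    cpu->sve_owner = cpu->current;
    cpu->sve_trap_enabled = false;
}
```

In this model a vCPU that is switched out and back in without anyone else
touching SVE pays nothing, while the cost of the save and load moves from the
context switch to the first SVE access of the next user.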

> 
> For now we might flag the feature as unsupported, explaining in the Kconfig help that switching
> between SVE and non-SVE guests, or between SVE guests, might add latency compared to
> switching between non-SVE guests.

I am OK with that. I actually like the idea of spelling it out because that 
helps us to remember what the gaps in the code are :).

Cheers,
Bertrand Marquis Jan. 25, 2023, 1:21 p.m. UTC | #4
Hi Julien,

> On 13 Jan 2023, at 09:44, Julien Grall <julien@xen.org> wrote:
> 
> Hi Luca,
> 
> On 12/01/2023 11:58, Luca Fancellu wrote:
>>> On 11 Jan 2023, at 16:59, Julien Grall <julien@xen.org> wrote:
>>> On 11/01/2023 14:38, Luca Fancellu wrote:
>>>> This serie is introducing the possibility for Dom0 and DomU guests to use
>>>> sve/sve2 instructions.
>>>> SVE feature introduces new instruction and registers to improve performances on
>>>> floating point operations.
>>>> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
>>>> when available the ID_AA64ZFR0_EL1 register provides additional information
>>>> about the implemented version and other SVE feature.
>>>> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
>>>> Z0-Z31 are scalable vector register whose size is implementation defined and
>>>> goes from 128 bits to maximum 2048, the term vector length will be used to refer
>>>> to this quantity.
>>>> P0-P15 are predicate registers and the size is the vector length divided by 8,
>>>> same size is the FFR (First Fault Register).
>>>> ZCR_ELx is a register that can control and restrict the maximum vector length
>>>> used by the <x> exception level and all the lower exception levels, so for
>>>> example EL3 can restrict the vector length usable by EL3,2,1,0.
>>>> The platform has a maximum implemented vector length, so for every value
>>>> written in ZCR register, if this value is above the implemented length, then the
>>>> lower value will be used. The RDVL instruction can be used to check what vector
>>>> length is the HW using after setting ZCR.
>>>> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
>>>> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
>>>> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
>>>> register is added to the domain state, to be able to trap only the guests that
>>>> are not allowed to use SVE.
>>>> This serie is introducing a command line parameter to enable Dom0 to use SVE and
>>>> to set its maximum vector length that by default is 0 which means the guest is
>>>> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
>>>> the selected value used as maximum allowed vector length (which could be lower
>>>> if the implemented one is lower).
>>>> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
>>>> DTB binding is created.
>>>> The context switch is the most critical part because there can be big registers
>>>> to be saved, in this serie an easy approach is used and the context is
>>>> saved/restored every time for the guests that are allowed to use SVE.
>>> 
>>> This would be OK for an initial approach. But I would be worry to officially support SVE because of the potential large impact on other users.
>>> 
>>> What's the long term plan?
>> Hi Julien,
>> For the future we can plan some work and decide together how to handle the context switch,
>> we might need some suggestions from you (arm maintainers) to design that part in the best
>> way for functional and security perspective.
> I think SVE will need to be lazily saved/restored. So on context switch, we would tell that the context belongs to the a previous domain. The first time after the current domain tries to access SVE, then we would load it.

We should try to avoid that kind of thing because it makes real-time analysis a lot more complex.
The only use case where this would make the system a lot faster is if there is only one guest using SVE (which might be a use case). Other than that case, this will just create delays when someone else is trying to use SVE, instead of having a fixed delay at context switch.

> 
>> For now we might flag the feature as unsupported, explaining in the Kconfig help that switching
>> between SVE and non-SVE guests, or between SVE guests, might add latency compared to
>> switching between non-SVE guests.
> 
> I am OK with that. I actually like the idea to spell it out because that helps us to remember what are the gaps in the code :).

I like this solution too.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall
Julien Grall Jan. 25, 2023, 1:50 p.m. UTC | #5
Hi Bertrand,

On 25/01/2023 13:21, Bertrand Marquis wrote:
>> On 13 Jan 2023, at 09:44, Julien Grall <julien@xen.org> wrote:
>>
>> Hi Luca,
>>
>> On 12/01/2023 11:58, Luca Fancellu wrote:
>>>> On 11 Jan 2023, at 16:59, Julien Grall <julien@xen.org> wrote:
>>>> On 11/01/2023 14:38, Luca Fancellu wrote:
>>>>> This serie is introducing the possibility for Dom0 and DomU guests to use
>>>>> sve/sve2 instructions.
>>>>> SVE feature introduces new instruction and registers to improve performances on
>>>>> floating point operations.
>>>>> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
>>>>> when available the ID_AA64ZFR0_EL1 register provides additional information
>>>>> about the implemented version and other SVE feature.
>>>>> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
>>>>> Z0-Z31 are scalable vector register whose size is implementation defined and
>>>>> goes from 128 bits to maximum 2048, the term vector length will be used to refer
>>>>> to this quantity.
>>>>> P0-P15 are predicate registers and the size is the vector length divided by 8,
>>>>> same size is the FFR (First Fault Register).
>>>>> ZCR_ELx is a register that can control and restrict the maximum vector length
>>>>> used by the <x> exception level and all the lower exception levels, so for
>>>>> example EL3 can restrict the vector length usable by EL3,2,1,0.
>>>>> The platform has a maximum implemented vector length, so for every value
>>>>> written in ZCR register, if this value is above the implemented length, then the
>>>>> lower value will be used. The RDVL instruction can be used to check what vector
>>>>> length is the HW using after setting ZCR.
>>>>> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
>>>>> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
>>>>> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
>>>>> register is added to the domain state, to be able to trap only the guests that
>>>>> are not allowed to use SVE.
>>>>> This serie is introducing a command line parameter to enable Dom0 to use SVE and
>>>>> to set its maximum vector length that by default is 0 which means the guest is
>>>>> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
>>>>> the selected value used as maximum allowed vector length (which could be lower
>>>>> if the implemented one is lower).
>>>>> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
>>>>> DTB binding is created.
>>>>> The context switch is the most critical part because there can be big registers
>>>>> to be saved, in this serie an easy approach is used and the context is
>>>>> saved/restored every time for the guests that are allowed to use SVE.
>>>>
>>>> This would be OK for an initial approach. But I would be worry to officially support SVE because of the potential large impact on other users.
>>>>
>>>> What's the long term plan?
>>> Hi Julien,
>>> For the future we can plan some work and decide together how to handle the context switch,
>>> we might need some suggestions from you (arm maintainers) to design that part in the best
>>> way for functional and security perspective.
>> I think SVE will need to be lazily saved/restored. So on context switch, we would tell that the context belongs to the a previous domain. The first time after the current domain tries to access SVE, then we would load it.
> 
> We should try to prevent those kind of things because it makes the real time analysis a lot more complex.

The choice of SVE (including the vector length) is per-domain. If all 
the VMs are using the same vector length, then the delay would indeed be 
fixed. Otherwise, the delay will vary depending on the scheduling choice.

It is not clear to me how this is better for real time analysis.

> The only use case where this would make the system a lot faster is if there is only one guest using SVE (which might be a use case), other than that case this will just create delays when someone else is trying to use SVE instead of having a fix delay at context switch.
Even in the case you mention, I think it will highly depend on the cost 
of context switching SVE. I have been told this is quite large, and one 
surely doesn't want to spend an extra thousand cycles when receiving an 
interrupt (I don't expect the handler to use SVE).

I think we need to understand the workload (and cost) in order to decide 
whether it should be eager/lazy.

At least, I know that in Linux, only the parts common with VFP are 
guaranteed to be preserved (see [1]). So the expectation seems to be that 
SVE use will be short-lived.

> 
>>
>>> For now we might flag the feature as unsupported, explaining in the Kconfig help that switching
>>> between SVE and non-SVE guests, or between SVE guests, might add latency compared to
>>> switching between non-SVE guests.
>>
>> I am OK with that. I actually like the idea to spell it out because that helps us to remember what are the gaps in the code :).
> 
> I like this solution to.
> 
> Cheers
> Bertrand
> 
>>
>> Cheers,
>>
>> -- 
>> Julien Grall
> 

[1] https://www.kernel.org/doc/Documentation/arm64/sve.txt