diff mbox series

[v1,2/4] KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS

Message ID 20230919175017.538312-3-jingzhangos@google.com (mailing list archive)
State New, archived
Headers show
Series Get writable masks for feature ID from userspace | expand

Commit Message

Jing Zhang Sept. 19, 2023, 5:50 p.m. UTC
Add some basic documentation on how to get feature ID register writable
masks from userspace.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 Documentation/virt/kvm/api.rst | 42 ++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

Comments

Eric Auger Feb. 13, 2024, 1:59 p.m. UTC | #1
Hi,

On 9/19/23 19:50, Jing Zhang wrote:
> Add some basic documentation on how to get feature ID register writable
> masks from userspace.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 42 ++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 21a7578142a1..2defb5e198ce 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6070,6 +6070,48 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
>  interface. No error will be returned, but the resulting offset will not be
>  applied.
>  
> +4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
> +-------------------------------------------
> +
> +:Capability: KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: struct reg_mask_range (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +
> +::
> +
> +        #define ARM64_FEATURE_ID_SPACE_SIZE	(3 * 8 * 8)
> +        #define ARM64_FEATURE_ID_RANGE_IDREGS	BIT(0)
> +
> +        struct reg_mask_range {
> +                __u64 addr;             /* Pointer to mask array */
> +                __u32 range;            /* Requested range */
> +                __u32 reserved[13];
> +        };
> +
> +This ioctl copies the writable masks for Feature ID registers to userspace.
> +The Feature ID space is defined as the AArch64 System register space with
> +op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
when attempting a migration between Ampere Altra and ThunderXv2 the
first hurdle is to handle a failure when writing ICC_CTLR_EL1
(3.0.12.12.4) on dest. This reg is outside of the scope of the above
single range (BIT(0)).

This may be questionable if we want to migrate between those types of
machines but the goal is to exercise different scenarios to have a
gloval view of the problems.

This reg exposes some RO capabilities such as ExtRange, A3V, SEIS,
IDBits, ...
So to get the migration going further I would need to tweek this on the
source - for instance I guess SEIS could be reset despite the host HW
cap - without making too much trouble.

What would you recommend, adding a new range? But I guess we need to
design ranges carefully otherwise we may be quickly limited by the
number of flag bits.

Any suggestion?

Thanks

Eric
> +
> +The mask array pointed to by ``addr`` is indexed by the macro
> +``ARM64_FEATURE_ID_SPACE_IDX(op0, op1, crn, crm, op2)``, allowing userspace
> +to know what bits can be changed for the system register described by ``op0,
> +op1, crn, crm, op2``.
> +
> +The ``range`` field describes the requested range of registers. The valid
> +ranges can be retrieved by checking the return value of
> +KVM_CAP_CHECK_EXTENSION_VM for the KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES
> +capability, which will return a bitmask of the supported ranges. Each bit
> +set in the return value represents a possible value for the ``range``
> +field.  At the time of writing, only bit 0 is returned set by the
> +capability, meaning that only the value ``ARM64_FEATURE_ID_RANGE_IDREGS``
> +is valid for ``range``.
> +
> +The ``reserved[13]`` array is reserved for future use and should be 0, or
> +KVM may return an error.
> +
>  5. The kvm_run structure
>  ========================
>
Marc Zyngier Feb. 13, 2024, 2:53 p.m. UTC | #2
Hey Eric,

On Tue, 13 Feb 2024 13:59:31 +0000,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi,
> 
> On 9/19/23 19:50, Jing Zhang wrote:
> > Add some basic documentation on how to get feature ID register writable
> > masks from userspace.
> > 
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 42 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 21a7578142a1..2defb5e198ce 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6070,6 +6070,48 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
> >  interface. No error will be returned, but the resulting offset will not be
> >  applied.
> >  
> > +4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
> > +-------------------------------------------
> > +
> > +:Capability: KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES
> > +:Architectures: arm64
> > +:Type: vm ioctl
> > +:Parameters: struct reg_mask_range (in/out)
> > +:Returns: 0 on success, < 0 on error
> > +
> > +
> > +::
> > +
> > +        #define ARM64_FEATURE_ID_SPACE_SIZE	(3 * 8 * 8)
> > +        #define ARM64_FEATURE_ID_RANGE_IDREGS	BIT(0)
> > +
> > +        struct reg_mask_range {
> > +                __u64 addr;             /* Pointer to mask array */
> > +                __u32 range;            /* Requested range */
> > +                __u32 reserved[13];
> > +        };
> > +
> > +This ioctl copies the writable masks for Feature ID registers to userspace.
> > +The Feature ID space is defined as the AArch64 System register space with
> > +op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
> when attempting a migration between Ampere Altra and ThunderXv2 the
> first hurdle is to handle a failure when writing ICC_CTLR_EL1
> (3.0.12.12.4) on dest. This reg is outside of the scope of the above
> single range (BIT(0)).

Indeed. But more importantly, this isn't really an ID register. Plenty
of variable bits in there.

> This may be questionable if we want to migrate between those types of
> machines but the goal is to exercise different scenarios to have a
> gloval view of the problems.

I think this is a valuable experiment, and we should definitely
explore this sort of things (as I cannot see the diversity of ARM
system slowing down any time soon).

> 
> This reg exposes some RO capabilities such as ExtRange, A3V, SEIS,
> IDBits, ...
> So to get the migration going further I would need to tweek this on the
> source - for instance I guess SEIS could be reset despite the host HW
> cap - without making too much trouble.

I'm not sure SEIS is such an easy one: if you promised the guest that
it would never get an SError doing the most stupid things (SEIS=0), it
really shouldn't get one after migration. If you advertised it on the
source HW (Altra), a migration to TX2 would be fine.

The other bits are possible to change depending on the requirements of
the VM (aff3, IDBits), and ExtRange should always be set to 0 (because
our GIC implementation doesn't support EPPI/ESPI).

The really ugly part here is that if you want to affect these bits,
you will need to trap and emulate the access. Not a big deal, but in
the absence of FGT, you will need to handle the full Common trap
group, which is going to slow things down (you will have to trap
ICV_PMR_EL1, for example).

> What would you recommend, adding a new range? But I guess we need to
> design ranges carefully otherwise we may be quickly limited by the
> number of flag bits.

I can see a need to adding a range that would cover non-ID registers
that have RO fields. But we also need to consider the case of EL2
registers that take part in this.

For example, ICV_CTLR_EL1 and ICH_VTR_EL2 and deeply linked, and share
some fields. Without NV, no need to expose HCR_VTR_EL2. With NV, this
register actually drives ICV_CTLR_EL1.

So careful planning is required here, and the impact of this measured.

Thanks,

	M.
Eric Auger Feb. 14, 2024, 6:07 p.m. UTC | #3
Hi Marc,

On 2/13/24 15:53, Marc Zyngier wrote:
> Hey Eric,
> 
> On Tue, 13 Feb 2024 13:59:31 +0000,
> Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi,
>>
>> On 9/19/23 19:50, Jing Zhang wrote:
>>> Add some basic documentation on how to get feature ID register writable
>>> masks from userspace.
>>>
>>> Signed-off-by: Jing Zhang <jingzhangos@google.com>
>>> ---
>>>  Documentation/virt/kvm/api.rst | 42 ++++++++++++++++++++++++++++++++++
>>>  1 file changed, 42 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 21a7578142a1..2defb5e198ce 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -6070,6 +6070,48 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
>>>  interface. No error will be returned, but the resulting offset will not be
>>>  applied.
>>>  
>>> +4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
>>> +-------------------------------------------
>>> +
>>> +:Capability: KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES
>>> +:Architectures: arm64
>>> +:Type: vm ioctl
>>> +:Parameters: struct reg_mask_range (in/out)
>>> +:Returns: 0 on success, < 0 on error
>>> +
>>> +
>>> +::
>>> +
>>> +        #define ARM64_FEATURE_ID_SPACE_SIZE	(3 * 8 * 8)
>>> +        #define ARM64_FEATURE_ID_RANGE_IDREGS	BIT(0)
>>> +
>>> +        struct reg_mask_range {
>>> +                __u64 addr;             /* Pointer to mask array */
>>> +                __u32 range;            /* Requested range */
>>> +                __u32 reserved[13];
>>> +        };
>>> +
>>> +This ioctl copies the writable masks for Feature ID registers to userspace.
>>> +The Feature ID space is defined as the AArch64 System register space with
>>> +op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
>> when attempting a migration between Ampere Altra and ThunderXv2 the
>> first hurdle is to handle a failure when writing ICC_CTLR_EL1
>> (3.0.12.12.4) on dest. This reg is outside of the scope of the above
>> single range (BIT(0)).
> 
> Indeed. But more importantly, this isn't really an ID register. Plenty
> of variable bits in there.
> 
>> This may be questionable if we want to migrate between those types of
>> machines but the goal is to exercise different scenarios to have a
>> gloval view of the problems.
> 
> I think this is a valuable experiment, and we should definitely
> explore this sort of things (as I cannot see the diversity of ARM
> system slowing down any time soon).
> 
>>
>> This reg exposes some RO capabilities such as ExtRange, A3V, SEIS,hyp/nvhe/hyp-main.c
>> IDBits, ...
>> So to get the migration going further I would need to tweek this on the
>> source - for instance I guess SEIS could be reset despite the host HW
>> cap - without making too much trouble.
> 
> I'm not sure SEIS is such an easy one: if you promised the guest that
> it would never get an SError doing the most stupid things (SEIS=0), it
> really shouldn't get one after migration. If you advertised it on the
> source HW (Altra), a migration to TX2 would be fine.
I see. Indeed for sure we must make sure KVM cannot inject SEIs in the
guest if SEIS is not advertised on guest side.

In this case SEIS is 0 on src Altra. On dst TX2 ich_vtr_el2 advertises
it and host seis =1 causing set_gic_ctlr to fail (vgic-sys-reg-v3.c).

> 
> The other bits are possible to change depending on the requirements of
> the VM (aff3, IDBits), and ExtRange should always be set to 0 (because
> our GIC implementation doesn't support EPPI/ESPI).
> 
> The really ugly part here is that if you want to affect these bits,
> you will need to trap and emulate the access. Not a big deal, but in
> the absence of FGT, you will need to handle the full Common trap
> group, which is going to slow things down (you will have to trap
> ICV_PMR_EL1, for example).
Would it be sensible to simplify things and only support the new range
if FGT is supported?
> 
>> What would you recommend, adding a new range? But I guess we need to
>> design ranges carefully otherwise we may be quickly limited by the
>> number of flag bits.
> 
> I can see a need to adding a range that would cover non-ID registers
> that have RO fields. But we also need to consider the case of EL2
> registers that take part in this.
Yes I need to better understand the consistency with ICH_VTR_EL2
> 
> For example, ICV_CTLR_EL1 and ICH_VTR_EL2 and deeply linked, and share
> some fields. Without NV, no need to expose HCR_VTR_EL2. With NV, this
> register actually drives ICV_CTLR_EL1.
understood.
> 
> So careful planning is required here, and the impact of this measured.

Understood. More time needed to study the code ;-)

Thanks!

Eric
> 
> Thanks,
> 
> 	M.
>
Marc Zyngier Feb. 14, 2024, 8:16 p.m. UTC | #4
Hi Eric,

On Wed, 14 Feb 2024 18:07:58 +0000,
Eric Auger <eauger@redhat.com> wrote:
> 
> > I'm not sure SEIS is such an easy one: if you promised the guest that
> > it would never get an SError doing the most stupid things (SEIS=0), it
> > really shouldn't get one after migration. If you advertised it on the
> > source HW (Altra), a migration to TX2 would be fine.
> I see. Indeed for sure we must make sure KVM cannot inject SEIs in the
> guest if SEIS is not advertised on guest side.

The problem isn't KVM injecting an SError, but the HW doing it (that's
what SEIS indicates). On some HW, it only takes an old enough EFI
build to trigger those.

> In this case SEIS is 0 on src Altra. On dst TX2 ich_vtr_el2 advertises
> it and host seis =1 causing set_gic_ctlr to fail (vgic-sys-reg-v3.c).
> 
> > 
> > The other bits are possible to change depending on the requirements of
> > the VM (aff3, IDBits), and ExtRange should always be set to 0 (because
> > our GIC implementation doesn't support EPPI/ESPI).
> > 
> > The really ugly part here is that if you want to affect these bits,
> > you will need to trap and emulate the access. Not a big deal, but in
> > the absence of FGT, you will need to handle the full Common trap
> > group, which is going to slow things down (you will have to trap
> > ICV_PMR_EL1, for example).
> Would it be sensible to simplify things and only support the new range
> if FGT is supported?

I'm not sure that helps. The only FGT that covers the GIC is for
ICC_IGRPENn_EL1, and we don't care much about that guy. So
ICH_VMCR_EL2.TC it is, and that sucks a bit. But if that's what you
want to show, you don't have a choice.

Thanks,

	M.
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 21a7578142a1..2defb5e198ce 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6070,6 +6070,48 @@  writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
 interface. No error will be returned, but the resulting offset will not be
 applied.
 
+4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
+-------------------------------------------
+
+:Capability: KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct reg_mask_range (in/out)
+:Returns: 0 on success, < 0 on error
+
+
+::
+
+        #define ARM64_FEATURE_ID_SPACE_SIZE	(3 * 8 * 8)
+        #define ARM64_FEATURE_ID_RANGE_IDREGS	BIT(0)
+
+        struct reg_mask_range {
+                __u64 addr;             /* Pointer to mask array */
+                __u32 range;            /* Requested range */
+                __u32 reserved[13];
+        };
+
+This ioctl copies the writable masks for Feature ID registers to userspace.
+The Feature ID space is defined as the AArch64 System register space with
+op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
+
+The mask array pointed to by ``addr`` is indexed by the macro
+``ARM64_FEATURE_ID_SPACE_IDX(op0, op1, crn, crm, op2)``, allowing userspace
+to know what bits can be changed for the system register described by ``op0,
+op1, crn, crm, op2``.
+
+The ``range`` field describes the requested range of registers. The valid
+ranges can be retrieved by checking the return value of
+KVM_CAP_CHECK_EXTENSION_VM for the KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES
+capability, which will return a bitmask of the supported ranges. Each bit
+set in the return value represents a possible value for the ``range``
+field.  At the time of writing, only bit 0 is returned set by the
+capability, meaning that only the value ``ARM64_FEATURE_ID_RANGE_IDREGS``
+is valid for ``range``.
+
+The ``reserved[13]`` array is reserved for future use and should be 0, or
+KVM may return an error.
+
 5. The kvm_run structure
 ========================