diff mbox series

[RFC,v2,4/4] arm64: Export id_aar64fpr0 via sysfs

Message ID 20201021104611.2744565-5-qais.yousef@arm.com (mailing list archive)
State New, archived
Headers show
Series Add support for Asymmetric AArch32 systems | expand

Commit Message

Qais Yousef Oct. 21, 2020, 10:46 a.m. UTC
So that userspace can detect if the cpu has aarch32 support at EL0.

CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
what it does. And fixed to accept both u64 and u32 without causing the
printf to print out a warning about mismatched type. This was caught
while testing to check the new CPUREGS_USER_ATTR_RO().

The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
on a @cond to user space. The exported fields match the definition in
arm64_ftr_reg so that the content of a register exported via MRS and
sysfs are kept cohesive.

The @cond in our case is that the system is asymmetric aarch32 and the
controlling sysctl.enable_asym_32bit is enabled.

Update Documentation/arm64/cpu-feature-registers.rst to reflect the
newly visible EL0 field in ID_AA64FPR0_EL1.

Note that the MRS interface will still return the sanitized content
_only_.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---

Example output. I was surprised that the 2nd field (bits[7:4]) is printed out
although it's set as FTR_HIDDEN.

# cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
0x0000000000000011
0x0000000000000011
0x0000000000000011
0x0000000000000011
0x0000000000000011
0x0000000000000011

# echo 1 > /proc/sys/kernel/enable_asym_32bit

# cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
0x0000000000000011
0x0000000000000011
0x0000000000000012
0x0000000000000012
0x0000000000000011
0x0000000000000011

 Documentation/arm64/cpu-feature-registers.rst |  2 +-
 arch/arm64/kernel/cpufeature.c                |  2 +-
 arch/arm64/kernel/cpuinfo.c                   | 58 +++++++++++++++++--
 3 files changed, 54 insertions(+), 8 deletions(-)

Comments

Marc Zyngier Oct. 21, 2020, 11:09 a.m. UTC | #1
On 2020-10-21 11:46, Qais Yousef wrote:
> So that userspace can detect if the cpu has aarch32 support at EL0.
> 
> CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better 
> reflect
> what it does. And fixed to accept both u64 and u32 without causing the
> printf to print out a warning about mismatched type. This was caught
> while testing to check the new CPUREGS_USER_ATTR_RO().
> 
> The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
> on a @cond to user space. The exported fields match the definition in
> arm64_ftr_reg so that the content of a register exported via MRS and
> sysfs are kept cohesive.
> 
> The @cond in our case is that the system is asymmetric aarch32 and the
> controlling sysctl.enable_asym_32bit is enabled.
> 
> Update Documentation/arm64/cpu-feature-registers.rst to reflect the
> newly visible EL0 field in ID_AA64FPR0_EL1.
> 
> Note that the MRS interface will still return the sanitized content
> _only_.
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
> 
> Example output. I was surprised that the 2nd field (bits[7:4]) is 
> printed out
> although it's set as FTR_HIDDEN.
> 
> # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 
> # echo 1 > /proc/sys/kernel/enable_asym_32bit
> 
> # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000012
> 0x0000000000000012
> 0x0000000000000011
> 0x0000000000000011

This looks like a terrible userspace interface. It exposes unrelated 
features,
and doesn't expose the single useful information that the kernel has:
the cpumask describing the CPUs supporting  AArch32 at EL0. Why not 
expose
this synthetic piece of information which requires very little effort 
from
userspace and doesn't spit out unrelated stuff? Not to mention the 
discrepancy
with what userspace gets while reading the same register via the MRS 
emulation.

Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy,
but I don't think this fits either.

         M.
Greg Kroah-Hartman Oct. 21, 2020, 11:25 a.m. UTC | #2
On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> On 2020-10-21 11:46, Qais Yousef wrote:
> > So that userspace can detect if the cpu has aarch32 support at EL0.
> > 
> > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
> > what it does. And fixed to accept both u64 and u32 without causing the
> > printf to print out a warning about mismatched type. This was caught
> > while testing to check the new CPUREGS_USER_ATTR_RO().
> > 
> > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
> > on a @cond to user space. The exported fields match the definition in
> > arm64_ftr_reg so that the content of a register exported via MRS and
> > sysfs are kept cohesive.
> > 
> > The @cond in our case is that the system is asymmetric aarch32 and the
> > controlling sysctl.enable_asym_32bit is enabled.
> > 
> > Update Documentation/arm64/cpu-feature-registers.rst to reflect the
> > newly visible EL0 field in ID_AA64FPR0_EL1.
> > 
> > Note that the MRS interface will still return the sanitized content
> > _only_.
> > 
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> > 
> > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > printed out
> > although it's set as FTR_HIDDEN.
> > 
> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 
> > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > 
> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000012
> > 0x0000000000000012
> > 0x0000000000000011
> > 0x0000000000000011
> 
> This looks like a terrible userspace interface.

It's also not allowed, sorry.  sysfs is "one value per file", which is
NOT what is happening at all.

This would be easy to see if there was a Documentation/ABI/ update,
which is also required, was that here?

thanks,

greg k-h
Greg Kroah-Hartman Oct. 21, 2020, 11:28 a.m. UTC | #3
On Wed, Oct 21, 2020 at 11:46:11AM +0100, Qais Yousef wrote:
> So that userspace can detect if the cpu has aarch32 support at EL0.
> 
> CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
> what it does. And fixed to accept both u64 and u32 without causing the
> printf to print out a warning about mismatched type. This was caught
> while testing to check the new CPUREGS_USER_ATTR_RO().
> 
> The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
> on a @cond to user space. The exported fields match the definition in
> arm64_ftr_reg so that the content of a register exported via MRS and
> sysfs are kept cohesive.
> 
> The @cond in our case is that the system is asymmetric aarch32 and the
> controlling sysctl.enable_asym_32bit is enabled.
> 
> Update Documentation/arm64/cpu-feature-registers.rst to reflect the
> newly visible EL0 field in ID_AA64FPR0_EL1.
> 
> Note that the MRS interface will still return the sanitized content
> _only_.
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
> 
> Example output. I was surprised that the 2nd field (bits[7:4]) is printed out
> although it's set as FTR_HIDDEN.
> 
> # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000011
> 
> # echo 1 > /proc/sys/kernel/enable_asym_32bit
> 
> # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> 0x0000000000000011
> 0x0000000000000011
> 0x0000000000000012
> 0x0000000000000012
> 0x0000000000000011
> 0x0000000000000011
> 
>  Documentation/arm64/cpu-feature-registers.rst |  2 +-
>  arch/arm64/kernel/cpufeature.c                |  2 +-
>  arch/arm64/kernel/cpuinfo.c                   | 58 +++++++++++++++++--
>  3 files changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> index f28853f80089..bfcbda6d6f35 100644
> --- a/Documentation/arm64/cpu-feature-registers.rst
> +++ b/Documentation/arm64/cpu-feature-registers.rst
> @@ -166,7 +166,7 @@ infrastructure:
>       +------------------------------+---------+---------+
>       | EL1                          | [7-4]   |    n    |
>       +------------------------------+---------+---------+
> -     | EL0                          | [3-0]   |    n    |
> +     | EL0                          | [3-0]   |    y    |
>       +------------------------------+---------+---------+
>  
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6f795c8221f4..0f7307c8ad80 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -221,7 +221,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL3_SHIFT, 4, 0),
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL2_SHIFT, 4, 0),
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
> -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
> +	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
>  	ARM64_FTR_END,
>  };
>  
> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> index 93c55986ca7f..632b9d5b5230 100644
> --- a/arch/arm64/kernel/cpuinfo.c
> +++ b/arch/arm64/kernel/cpuinfo.c
> @@ -231,25 +231,71 @@ static struct kobj_type cpuregs_kobj_type = {
>   * future expansion without an ABI break.
>   */
>  #define kobj_to_cpuinfo(kobj)	container_of(kobj, struct cpuinfo_arm64, kobj)
> -#define CPUREGS_ATTR_RO(_name, _field)						\
> +#define CPUREGS_RAW_ATTR_RO(_name, _field)				\
>  	static ssize_t _name##_show(struct kobject *kobj,			\
>  			struct kobj_attribute *attr, char *buf)			\
>  	{									\
>  		struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj);		\
>  										\
> -		if (info->reg_midr)						\
> -			return sprintf(buf, "0x%016x\n", info->reg_##_field);	\
> -		else								\
> +		if (info->reg_midr) {						\
> +			u64 val = info->reg_##_field;				\
> +			return sprintf(buf, "0x%016llx\n", val);		\

Nit, for sysfs, use the new sysfs_emit() call instead of sprintf().

thanks,

greg k-h
Marc Zyngier Oct. 21, 2020, 11:46 a.m. UTC | #4
On 2020-10-21 12:25, Greg Kroah-Hartman wrote:
> On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
>> On 2020-10-21 11:46, Qais Yousef wrote:
>> > So that userspace can detect if the cpu has aarch32 support at EL0.
>> >
>> > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
>> > what it does. And fixed to accept both u64 and u32 without causing the
>> > printf to print out a warning about mismatched type. This was caught
>> > while testing to check the new CPUREGS_USER_ATTR_RO().
>> >
>> > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
>> > on a @cond to user space. The exported fields match the definition in
>> > arm64_ftr_reg so that the content of a register exported via MRS and
>> > sysfs are kept cohesive.
>> >
>> > The @cond in our case is that the system is asymmetric aarch32 and the
>> > controlling sysctl.enable_asym_32bit is enabled.
>> >
>> > Update Documentation/arm64/cpu-feature-registers.rst to reflect the
>> > newly visible EL0 field in ID_AA64FPR0_EL1.
>> >
>> > Note that the MRS interface will still return the sanitized content
>> > _only_.
>> >
>> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
>> > ---
>> >
>> > Example output. I was surprised that the 2nd field (bits[7:4]) is
>> > printed out
>> > although it's set as FTR_HIDDEN.
>> >
>> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
>> > 0x0000000000000011
>> > 0x0000000000000011
>> > 0x0000000000000011
>> > 0x0000000000000011
>> > 0x0000000000000011
>> > 0x0000000000000011
>> >
>> > # echo 1 > /proc/sys/kernel/enable_asym_32bit
>> >
>> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
>> > 0x0000000000000011
>> > 0x0000000000000011
>> > 0x0000000000000012
>> > 0x0000000000000012
>> > 0x0000000000000011
>> > 0x0000000000000011
>> 
>> This looks like a terrible userspace interface.
> 
> It's also not allowed, sorry.  sysfs is "one value per file", which is
> NOT what is happening at all.

I *think* Qais got that part right, though it is hard to tell without
knowing how many CPUs this system has (cpu/cpu* is ambiguous).

         M.
Greg Kroah-Hartman Oct. 21, 2020, 12:11 p.m. UTC | #5
On Wed, Oct 21, 2020 at 12:46:46PM +0100, Marc Zyngier wrote:
> On 2020-10-21 12:25, Greg Kroah-Hartman wrote:
> > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> > > On 2020-10-21 11:46, Qais Yousef wrote:
> > > > So that userspace can detect if the cpu has aarch32 support at EL0.
> > > >
> > > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
> > > > what it does. And fixed to accept both u64 and u32 without causing the
> > > > printf to print out a warning about mismatched type. This was caught
> > > > while testing to check the new CPUREGS_USER_ATTR_RO().
> > > >
> > > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
> > > > on a @cond to user space. The exported fields match the definition in
> > > > arm64_ftr_reg so that the content of a register exported via MRS and
> > > > sysfs are kept cohesive.
> > > >
> > > > The @cond in our case is that the system is asymmetric aarch32 and the
> > > > controlling sysctl.enable_asym_32bit is enabled.
> > > >
> > > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the
> > > > newly visible EL0 field in ID_AA64FPR0_EL1.
> > > >
> > > > Note that the MRS interface will still return the sanitized content
> > > > _only_.
> > > >
> > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > > > ---
> > > >
> > > > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > > > printed out
> > > > although it's set as FTR_HIDDEN.
> > > >
> > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > >
> > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > > >
> > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000012
> > > > 0x0000000000000012
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > 
> > > This looks like a terrible userspace interface.
> > 
> > It's also not allowed, sorry.  sysfs is "one value per file", which is
> > NOT what is happening at all.
> 
> I *think* Qais got that part right, though it is hard to tell without
> knowing how many CPUs this system has (cpu/cpu* is ambiguous).

Ah, missed the '*' in the middle of that path, my fault.

But without documentation, it's impossible to know...

thanks,

greg k-h
Catalin Marinas Oct. 21, 2020, 12:15 p.m. UTC | #6
On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> On 2020-10-21 11:46, Qais Yousef wrote:
> > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > printed out
> > although it's set as FTR_HIDDEN.
> > 
> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 
> > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > 
> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000012
> > 0x0000000000000012
> > 0x0000000000000011
> > 0x0000000000000011
> 
> This looks like a terrible userspace interface. It exposes unrelated
> features,

Not sure why the EL1 field ended up in here, that's not relevant to the
user.

> and doesn't expose the single useful information that the kernel has:
> the cpumask describing the CPUs supporting  AArch32 at EL0. Why not expose
> this synthetic piece of information which requires very little effort from
> userspace and doesn't spit out unrelated stuff?

I thought the whole idea is to try and avoid the "very little effort"
part ;).

> Not to mention the discrepancy with what userspace gets while reading
> the same register via the MRS emulation.
> 
> Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy,
> but I don't think this fits either.

We already expose MIDR and REVIDR via the current sysfs interface. We
can expand it to include _all_ the other ID_* regs currently available
to user via the MRS emulation and we won't have to debate what a new
interface would look like. The MRS emulation and the sysfs info should
probably match, though that means we need to expose the
ID_AA64PFR0_EL1.EL0 field which we currently don't.

I do agree that an AArch32 cpumask is an easier option both from the
kernel implementation perspective and from the application usability
one, though not as easy as automatic task placement by the scheduler (my
first preference, followed by the id_* regs and the aarch32 mask, though
not a strong preference for any).
Qais Yousef Oct. 21, 2020, 1:18 p.m. UTC | #7
On 10/21/20 12:46, Marc Zyngier wrote:
> On 2020-10-21 12:25, Greg Kroah-Hartman wrote:
> > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> > > On 2020-10-21 11:46, Qais Yousef wrote:
> > > > So that userspace can detect if the cpu has aarch32 support at EL0.
> > > >
> > > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
> > > > what it does. And fixed to accept both u64 and u32 without causing the
> > > > printf to print out a warning about mismatched type. This was caught
> > > > while testing to check the new CPUREGS_USER_ATTR_RO().
> > > >
> > > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
> > > > on a @cond to user space. The exported fields match the definition in
> > > > arm64_ftr_reg so that the content of a register exported via MRS and
> > > > sysfs are kept cohesive.
> > > >
> > > > The @cond in our case is that the system is asymmetric aarch32 and the
> > > > controlling sysctl.enable_asym_32bit is enabled.
> > > >
> > > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the
> > > > newly visible EL0 field in ID_AA64FPR0_EL1.
> > > >
> > > > Note that the MRS interface will still return the sanitized content
> > > > _only_.
> > > >
> > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > > > ---
> > > >
> > > > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > > > printed out
> > > > although it's set as FTR_HIDDEN.
> > > >
> > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > >
> > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > > >
> > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000012
> > > > 0x0000000000000012
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > 
> > > This looks like a terrible userspace interface.
> > 
> > It's also not allowed, sorry.  sysfs is "one value per file", which is
> > NOT what is happening at all.
> 
> I *think* Qais got that part right, though it is hard to tell without
> knowing how many CPUs this system has (cpu/cpu* is ambiguous).

Correct. This interface already exists for other registers and reads a single
value for each CPU. I just added the ability to read a new register. Though it
has a slightly different behavior: reads Sanitized vs RAW info.

Thanks

--
Qais Yousef
Qais Yousef Oct. 21, 2020, 1:20 p.m. UTC | #8
On 10/21/20 13:15, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> > On 2020-10-21 11:46, Qais Yousef wrote:
> > > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > > printed out
> > > although it's set as FTR_HIDDEN.

                           ^^^^^^^^^

> > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 
> > > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > > 
> > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000012
> > > 0x0000000000000012
> > > 0x0000000000000011
> > > 0x0000000000000011
> > 
> > This looks like a terrible userspace interface. It exposes unrelated
> > features,
> 
> Not sure why the EL1 field ended up in here, that's not relevant to the
> user.

I was surprised by this too. Not sure a bug at my end or a bug in the way
arm64_ftr_reg stores reg->user_val and reg->user_mask.

FWIW, I did highlight the problem above.

It is something to fix indeed.

Thanks

--
Qais Yousef
Qais Yousef Oct. 21, 2020, 1:22 p.m. UTC | #9
On 10/21/20 13:28, Greg Kroah-Hartman wrote:
> On Wed, Oct 21, 2020 at 11:46:11AM +0100, Qais Yousef wrote:
> > So that userspace can detect if the cpu has aarch32 support at EL0.
> > 
> > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect
> > what it does. And fixed to accept both u64 and u32 without causing the
> > printf to print out a warning about mismatched type. This was caught
> > while testing to check the new CPUREGS_USER_ATTR_RO().
> > 
> > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based
> > on a @cond to user space. The exported fields match the definition in
> > arm64_ftr_reg so that the content of a register exported via MRS and
> > sysfs are kept cohesive.
> > 
> > The @cond in our case is that the system is asymmetric aarch32 and the
> > controlling sysctl.enable_asym_32bit is enabled.
> > 
> > Update Documentation/arm64/cpu-feature-registers.rst to reflect the
> > newly visible EL0 field in ID_AA64FPR0_EL1.
> > 
> > Note that the MRS interface will still return the sanitized content
> > _only_.
> > 
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> > 
> > Example output. I was surprised that the 2nd field (bits[7:4]) is printed out
> > although it's set as FTR_HIDDEN.
> > 
> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000011
> > 
> > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > 
> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > 0x0000000000000011
> > 0x0000000000000011
> > 0x0000000000000012
> > 0x0000000000000012
> > 0x0000000000000011
> > 0x0000000000000011
> > 
> >  Documentation/arm64/cpu-feature-registers.rst |  2 +-
> >  arch/arm64/kernel/cpufeature.c                |  2 +-
> >  arch/arm64/kernel/cpuinfo.c                   | 58 +++++++++++++++++--
> >  3 files changed, 54 insertions(+), 8 deletions(-)
> > 
> > diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> > index f28853f80089..bfcbda6d6f35 100644
> > --- a/Documentation/arm64/cpu-feature-registers.rst
> > +++ b/Documentation/arm64/cpu-feature-registers.rst
> > @@ -166,7 +166,7 @@ infrastructure:
> >       +------------------------------+---------+---------+
> >       | EL1                          | [7-4]   |    n    |
> >       +------------------------------+---------+---------+
> > -     | EL0                          | [3-0]   |    n    |
> > +     | EL0                          | [3-0]   |    y    |
> >       +------------------------------+---------+---------+
> >  
> >  
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 6f795c8221f4..0f7307c8ad80 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -221,7 +221,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
> >  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL3_SHIFT, 4, 0),
> >  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL2_SHIFT, 4, 0),
> >  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
> > -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
> > +	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
> >  	ARM64_FTR_END,
> >  };
> >  
> > diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> > index 93c55986ca7f..632b9d5b5230 100644
> > --- a/arch/arm64/kernel/cpuinfo.c
> > +++ b/arch/arm64/kernel/cpuinfo.c
> > @@ -231,25 +231,71 @@ static struct kobj_type cpuregs_kobj_type = {
> >   * future expansion without an ABI break.
> >   */
> >  #define kobj_to_cpuinfo(kobj)	container_of(kobj, struct cpuinfo_arm64, kobj)
> > -#define CPUREGS_ATTR_RO(_name, _field)						\
> > +#define CPUREGS_RAW_ATTR_RO(_name, _field)				\
> >  	static ssize_t _name##_show(struct kobject *kobj,			\
> >  			struct kobj_attribute *attr, char *buf)			\
> >  	{									\
> >  		struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj);		\
> >  										\
> > -		if (info->reg_midr)						\
> > -			return sprintf(buf, "0x%016x\n", info->reg_##_field);	\
> > -		else								\
> > +		if (info->reg_midr) {						\
> > +			u64 val = info->reg_##_field;				\
> > +			return sprintf(buf, "0x%016llx\n", val);		\
> 
> Nit, for sysfs, use the new sysfs_emit() call instead of sprintf().

Will do.

Thanks!

--
Qais Yousef
Morten Rasmussen Oct. 21, 2020, 1:33 p.m. UTC | #10
On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> one, though not as easy as automatic task placement by the scheduler (my
> first preference, followed by the id_* regs and the aarch32 mask, though
> not a strong preference for any).

Automatic task placement by the scheduler would mean giving up the
requirement that the user-space affinity mask must always be honoured.
Is that on the table?

Killing aarch32 tasks with an empty intersection between the user-space
mask and aarch32_mask is not really "automatic" and would require the
aarch32 capability to be exposed anyway.

Morten
Catalin Marinas Oct. 21, 2020, 2:09 p.m. UTC | #11
On Wed, Oct 21, 2020 at 03:33:29PM +0200, Morten Rasmussen wrote:
> On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > one, though not as easy as automatic task placement by the scheduler (my
> > first preference, followed by the id_* regs and the aarch32 mask, though
> > not a strong preference for any).
> 
> Automatic task placement by the scheduler would mean giving up the
> requirement that the user-space affinity mask must always be honoured.
> Is that on the table?

I think Peter rejected it but I still find it a nicer interface from a
dumb application perspective. It may interact badly with cpusets though
(at least on Android).

> Killing aarch32 tasks with an empty intersection between the
> user-space mask and aarch32_mask is not really "automatic" and would
> require the aarch32 capability to be exposed anyway.

I agree, especially if overriding the user mask is not desirable. But if
one doesn't play around with cpusets, 32-bit apps would run "fine" with
the scheduler transparently placing them on the correct CPU.

Anyway, if the task placement is entirely off the table, the next thing
is asking applications to set their own mask and kill them if they do
the wrong thing. Here I see two possibilities for killing an app:

1. When it ends up scheduled on a non-AArch32-capable CPU

2. If the user cpumask (bar the offline CPUs) is not a subset of the
   aarch32_mask

Option 1 is simpler but 2 would be slightly more consistent.

There's also the question on whether the kernel should allow an ELF32 to
be loaded (and potentially killed subsequently) if the user mask is not
correct on execve().
Qais Yousef Oct. 21, 2020, 2:31 p.m. UTC | #12
On 10/21/20 15:33, Morten Rasmussen wrote:
> On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > one, though not as easy as automatic task placement by the scheduler (my
> > first preference, followed by the id_* regs and the aarch32 mask, though
> > not a strong preference for any).
> 
> Automatic task placement by the scheduler would mean giving up the
> requirement that the user-space affinity mask must always be honoured.
> Is that on the table?
> 
> Killing aarch32 tasks with an empty intersection between the user-space
> mask and aarch32_mask is not really "automatic" and would require the
> aarch32 capability to be exposed anyway.

I just noticed this nasty corner case too.


Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9

"If such a task had been bound to some subset of its cpuset using the
sched_setaffinity() call, the task will be allowed to run on any CPU allowed in
its new cpuset, negating the effect of the prior sched_setaffinity() call."


So user space must put the tasks into a valid cpuset to fix the problem. Or
make the scheduler behave like the affinity is associated with a cpuset.

Can user space put the task into the correct cpuset without a race? Clone3
syscall learnt to specify a cgroup to attach to when forking. Should we do the
same for execve()?


Thanks

--
Qais Yousef
Will Deacon Oct. 21, 2020, 2:41 p.m. UTC | #13
On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> > On 2020-10-21 11:46, Qais Yousef wrote:
> > > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > > printed out
> > > although it's set as FTR_HIDDEN.
> > > 
> > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 
> > > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > > 
> > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > 0x0000000000000011
> > > 0x0000000000000011
> > > 0x0000000000000012
> > > 0x0000000000000012
> > > 0x0000000000000011
> > > 0x0000000000000011
> > 
> > This looks like a terrible userspace interface. It exposes unrelated
> > features,
> 
> Not sure why the EL1 field ended up in here, that's not relevant to the
> user.
> 
> > and doesn't expose the single useful information that the kernel has:
> > the cpumask describing the CPUs supporting  AArch32 at EL0. Why not expose
> > this synthetic piece of information which requires very little effort from
> > userspace and doesn't spit out unrelated stuff?
> 
> I thought the whole idea is to try and avoid the "very little effort"
> part ;).
> 
> > Not to mention the discrepancy with what userspace gets while reading
> > the same register via the MRS emulation.
> > 
> > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy,
> > but I don't think this fits either.
> 
> We already expose MIDR and REVIDR via the current sysfs interface. We
> can expand it to include _all_ the other ID_* regs currently available
> to user via the MRS emulation and we won't have to debate what a new
> interface would look like. The MRS emulation and the sysfs info should
> probably match, though that means we need to expose the
> ID_AA64PFR0_EL1.EL0 field which we currently don't.
> 
> I do agree that an AArch32 cpumask is an easier option both from the
> kernel implementation perspective and from the application usability
> one, though not as easy as automatic task placement by the scheduler (my
> first preference, followed by the id_* regs and the aarch32 mask, though
> not a strong preference for any).

If a cpumask is easier to implement and easier to use, then I think that's
what we should do. It's also then dead easy to disable if necessary by
just returning 0. The only alternative I would prefer is not having to
expose this information altogether, but I'm not sure that figuring this
out from MIDR/REVIDR alone is reliable.

Will
Morten Rasmussen Oct. 21, 2020, 2:41 p.m. UTC | #14
On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 03:33:29PM +0200, Morten Rasmussen wrote:
> > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > one, though not as easy as automatic task placement by the scheduler (my
> > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > not a strong preference for any).
> > 
> > Automatic task placement by the scheduler would mean giving up the
> > requirement that the user-space affinity mask must always be honoured.
> > Is that on the table?
> 
> I think Peter rejected it but I still find it a nicer interface from a
> dumb application perspective. It may interact badly with cpusets though
> (at least on Android).

Agree that it would be nice for supporting legacy applications. Due to
the cpuset interaction I think there is fair chance that user-space
would want to know the aarch32 cpumask anyway though.

> 
> > Killing aarch32 tasks with an empty intersection between the
> > user-space mask and aarch32_mask is not really "automatic" and would
> > require the aarch32 capability to be exposed anyway.
> 
> I agree, especially if overriding the user mask is not desirable. But if
> one doesn't play around with cpusets, 32-bit apps would run "fine" with
> the scheduler transparently placing them on the correct CPU.

Ruling out user-space setting affinity is another way of solving the
problem ;-)

> Anyway, if the task placement is entirely off the table, the next thing
> is asking applications to set their own mask and kill them if they do
> the wrong thing. Here I see two possibilities for killing an app:
> 
> 1. When it ends up scheduled on a non-AArch32-capable CPU
> 
> 2. If the user cpumask (bar the offline CPUs) is not a subset of the
>    aarch32_mask
> 
> Option 1 is simpler but 2 would be slightly more consistent.

I don't have strong preference. More consistent killing is probably nice
for debugging purposes. If we go with 2, we would go round and kill all
tasks in cpuset (both running and sleeping) if the cpuset mask was
changed to not include aarch32 CPUs?

> There's also the question on whether the kernel should allow an ELF32 to
> be loaded (and potentially killed subsequently) if the user mask is not
> correct on execve().

I wonder how many apps that would handle execve() failing? If we allow
killing, the simplest solution if there is any doubt seems to be just
to kill the task :-)

Morten
Will Deacon Oct. 21, 2020, 2:45 p.m. UTC | #15
On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 03:33:29PM +0200, Morten Rasmussen wrote:
> > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > one, though not as easy as automatic task placement by the scheduler (my
> > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > not a strong preference for any).
> > 
> > Automatic task placement by the scheduler would mean giving up the
> > requirement that the user-space affinity mask must always be honoured.
> > Is that on the table?
> 
> I think Peter rejected it but I still find it a nicer interface from a
> dumb application perspective. It may interact badly with cpusets though
> (at least on Android).
> 
> > Killing aarch32 tasks with an empty intersection between the
> > user-space mask and aarch32_mask is not really "automatic" and would
> > require the aarch32 capability to be exposed anyway.
> 
> I agree, especially if overriding the user mask is not desirable. But if
> one doesn't play around with cpusets, 32-bit apps would run "fine" with
> the scheduler transparently placing them on the correct CPU.
> 
> Anyway, if the task placement is entirely off the table, the next thing
> is asking applications to set their own mask and kill them if they do
> the wrong thing. Here I see two possibilities for killing an app:
> 
> 1. When it ends up scheduled on a non-AArch32-capable CPU

That sounds fine to me. If we could do the exception return and take a
SIGILL, that's what we'd do, but we can't so we have to catch it before.

> 2. If the user cpumask (bar the offline CPUs) is not a subset of the
>    aarch32_mask
> 
> Option 1 is simpler but 2 would be slightly more consistent.

I disagree -- if we did this for something like fpsimd, then the consistent
behaviour would be to SIGILL on the cores without the instructions.

> There's also the question on whether the kernel should allow an ELF32 to
> be loaded (and potentially killed subsequently) if the user mask is not
> correct on execve().

I don't see the point in distinguishing between "you did execve() on a core
without 32-bit" and "you did execve() on a core with 32-bit and then
migrated to a core without 32-bit".

Will
Qais Yousef Oct. 21, 2020, 3:03 p.m. UTC | #16
On 10/21/20 15:41, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> > > On 2020-10-21 11:46, Qais Yousef wrote:
> > > > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > > > printed out
> > > > although it's set as FTR_HIDDEN.
> > > > 
> > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 
> > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > > > 
> > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > > 0x0000000000000012
> > > > 0x0000000000000012
> > > > 0x0000000000000011
> > > > 0x0000000000000011
> > > 
> > > This looks like a terrible userspace interface. It exposes unrelated
> > > features,
> > 
> > Not sure why the EL1 field ended up in here, that's not relevant to the
> > user.
> > 
> > > and doesn't expose the single useful information that the kernel has:
> > > the cpumask describing the CPUs supporting  AArch32 at EL0. Why not expose
> > > this synthetic piece of information which requires very little effort from
> > > userspace and doesn't spit out unrelated stuff?
> > 
> > I thought the whole idea is to try and avoid the "very little effort"
> > part ;).
> > 
> > > Not to mention the discrepancy with what userspace gets while reading
> > > the same register via the MRS emulation.
> > > 
> > > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy,
> > > but I don't think this fits either.
> > 
> > We already expose MIDR and REVIDR via the current sysfs interface. We
> > can expand it to include _all_ the other ID_* regs currently available
> > to user via the MRS emulation and we won't have to debate what a new
> > interface would look like. The MRS emulation and the sysfs info should
> > probably match, though that means we need to expose the
> > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > 
> > I do agree that an AArch32 cpumask is an easier option both from the
> > kernel implementation perspective and from the application usability
> > one, though not as easy as automatic task placement by the scheduler (my
> > first preference, followed by the id_* regs and the aarch32 mask, though
> > not a strong preference for any).
> 
> If a cpumask is easier to implement and easier to use, then I think that's
> what we should do. It's also then dead easy to disable if necessary by
> just returning 0. The only alternative I would prefer is not having to
> expose this information altogether, but I'm not sure that figuring this
> out from MIDR/REVIDR alone is reliable.

I did suggest this before, but I'll try gain. If we want to assume a custom
bootloader and custom user space, we can make them provide the mask.

For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
a bool as it currently is. Or we can make it a cmdline parameter too.
In both cases some admin (bootloader or init process) has to ensure to fill it
correctly for the target platform. The bootloader should be able to read the
registers to figure out the mask. So more weight to make it a cmdline param.

Then the rest of user space can easily parse the cpumask from
/proc/sys/kernel/enable_asym_32bit or from /proc/cmdline.


Thanks

--
Qais Yousef
Catalin Marinas Oct. 21, 2020, 3:10 p.m. UTC | #17
On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> > Anyway, if the task placement is entirely off the table, the next thing
> > is asking applications to set their own mask and kill them if they do
> > the wrong thing. Here I see two possibilities for killing an app:
> > 
> > 1. When it ends up scheduled on a non-AArch32-capable CPU
> 
> That sounds fine to me. If we could do the exception return and take a
> SIGILL, that's what we'd do, but we can't so we have to catch it before.

Indeed, the illegal ERET doesn't work for this scenario.

> > 2. If the user cpumask (bar the offline CPUs) is not a subset of the
> >    aarch32_mask
> > 
> > Option 1 is simpler but 2 would be slightly more consistent.
> 
> I disagree -- if we did this for something like fpsimd, then the consistent
> behaviour would be to SIGILL on the cores without the instructions.

For fpsimd it makes sense since the main ISA is still available and the
application may be able to do something with the signal. But here we
can't do much since the entire AArch32 mode is not supported. That's why
we went for SIGKILL instead of SIGILL but thinking of it, after execve()
the signals are reset to SIG_DFL so SIGILL cannot be ignored.

I think it depends on whether you look at this fault as a part of ISA
not being available or as the overall application not compatible with
the system it is running on. If the latter, option 2 above makes more
sense.

> > There's also the question on whether the kernel should allow an ELF32 to
> > be loaded (and potentially killed subsequently) if the user mask is not
> > correct on execve().
> 
> I don't see the point in distinguishing between "you did execve() on a core
> without 32-bit" and "you did execve() on a core with 32-bit and then
> migrated to a core without 32-bit".

In the context of option 2 above, its more about whether execve()
returns -ENOEXEC or the process gets a SIGKILL immediately.
Will Deacon Oct. 21, 2020, 3:23 p.m. UTC | #18
On Wed, Oct 21, 2020 at 04:03:13PM +0100, Qais Yousef wrote:
> On 10/21/20 15:41, Will Deacon wrote:
> > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote:
> > > > On 2020-10-21 11:46, Qais Yousef wrote:
> > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is
> > > > > printed out
> > > > > although it's set as FTR_HIDDEN.
> > > > > 
> > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > > 
> > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit
> > > > > 
> > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > > 0x0000000000000012
> > > > > 0x0000000000000012
> > > > > 0x0000000000000011
> > > > > 0x0000000000000011
> > > > 
> > > > This looks like a terrible userspace interface. It exposes unrelated
> > > > features,
> > > 
> > > Not sure why the EL1 field ended up in here, that's not relevant to the
> > > user.
> > > 
> > > > and doesn't expose the single useful information that the kernel has:
> > > > the cpumask describing the CPUs supporting  AArch32 at EL0. Why not expose
> > > > this synthetic piece of information which requires very little effort from
> > > > userspace and doesn't spit out unrelated stuff?
> > > 
> > > I thought the whole idea is to try and avoid the "very little effort"
> > > part ;).
> > > 
> > > > Not to mention the discrepancy with what userspace gets while reading
> > > > the same register via the MRS emulation.
> > > > 
> > > > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy,
> > > > but I don't think this fits either.
> > > 
> > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > can expand it to include _all_ the other ID_* regs currently available
> > > to user via the MRS emulation and we won't have to debate what a new
> > > interface would look like. The MRS emulation and the sysfs info should
> > > probably match, though that means we need to expose the
> > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > 
> > > I do agree that an AArch32 cpumask is an easier option both from the
> > > kernel implementation perspective and from the application usability
> > > one, though not as easy as automatic task placement by the scheduler (my
> > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > not a strong preference for any).
> > 
> > If a cpumask is easier to implement and easier to use, then I think that's
> > what we should do. It's also then dead easy to disable if necessary by
> > just returning 0. The only alternative I would prefer is not having to
> > expose this information altogether, but I'm not sure that figuring this
> > out from MIDR/REVIDR alone is reliable.
> 
> I did suggest this before, but I'll try gain. If we want to assume a custom
> bootloader and custom user space, we can make them provide the mask.

Who mentioned a custom bootloader? In the context of Android, we're
talking about a user-space that already manages scheduling affinity.

> For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> a bool as it currently is. Or we can make it a cmdline parameter too.
> In both cases some admin (bootloader or init process) has to ensure to fill it
> correctly for the target platform. The bootloader should be able to read the
> registers to figure out the mask. So more weight to make it a cmdline param.

I think this is adding complexity for the sake of it. I'm much more in
favour of keeping the implementation and ABI as simple as possible: expose
the fact that the system is heterogenous, have an opt-in for userspace to
say it can handle that and let it handle it.

Will
Will Deacon Oct. 21, 2020, 3:37 p.m. UTC | #19
On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote:
> > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> > > Anyway, if the task placement is entirely off the table, the next thing
> > > is asking applications to set their own mask and kill them if they do
> > > the wrong thing. Here I see two possibilities for killing an app:
> > > 
> > > 1. When it ends up scheduled on a non-AArch32-capable CPU
> > 
> > That sounds fine to me. If we could do the exception return and take a
> > SIGILL, that's what we'd do, but we can't so we have to catch it before.
> 
> Indeed, the illegal ERET doesn't work for this scenario.
> 
> > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the
> > >    aarch32_mask
> > > 
> > > Option 1 is simpler but 2 would be slightly more consistent.
> > 
> > I disagree -- if we did this for something like fpsimd, then the consistent
> > behaviour would be to SIGILL on the cores without the instructions.
> 
> For fpsimd it makes sense since the main ISA is still available and the
> application may be able to do something with the signal. But here we
> can't do much since the entire AArch32 mode is not supported. That's why
> we went for SIGKILL instead of SIGILL but thinking of it, after execve()
> the signals are reset to SIG_DFL so SIGILL cannot be ignored.
> 
> I think it depends on whether you look at this fault as a part of ISA
> not being available or as the overall application not compatible with
> the system it is running on. If the latter, option 2 above makes more
> sense.

Hmm, I'm not sure I see the distinction in practice: you still have a binary
application that cannot run on all CPUs in the system. Who cares if some of
the instructions work?

> > > There's also the question on whether the kernel should allow an ELF32 to
> > > be loaded (and potentially killed subsequently) if the user mask is not
> > > correct on execve().
> > 
> > I don't see the point in distinguishing between "you did execve() on a core
> > without 32-bit" and "you did execve() on a core with 32-bit and then
> > migrated to a core without 32-bit".
> 
> In the context of option 2 above, its more about whether execve()
> returns -ENOEXEC or the process gets a SIGKILL immediately.

I just don't see what we gain by returning -ENOEXEC except for extra code
and behaviour in the ABI (and if you wanted consistentcy you'd also need
to fail attempts to widen the affinity mask to include 64-bit-only cores
from a 32-bit task).

In other words, I don't think the kernel needs to hold userspace's hand
for an opt-in feature that requires userspace to handle scheduling for
optimal power/performance _anyway_. Allowing the affinity to be set
arbitrarily and then killing the task if it ends up trying to run on the
wrong CPU is both simple and sufficient.

Will
Qais Yousef Oct. 21, 2020, 4:07 p.m. UTC | #20
On 10/21/20 16:23, Will Deacon wrote:
> > > If a cpumask is easier to implement and easier to use, then I think that's
> > > what we should do. It's also then dead easy to disable if necessary by
> > > just returning 0. The only alternative I would prefer is not having to
> > > expose this information altogether, but I'm not sure that figuring this
> > > out from MIDR/REVIDR alone is reliable.
> > 
> > I did suggest this before, but I'll try gain. If we want to assume a custom
> > bootloader and custom user space, we can make them provide the mask.
> 
> Who mentioned a custom bootloader? In the context of Android, we're

Custom bootloader as in a bootloader that needs to opt-in to enable the
feature (pass the right cmdline param). Catalin suggested to make this a sysctl
to allow also for runtime toggling. But the initial intention was to have this
to enable it at cmdline.

> talking about a user-space that already manages scheduling affinity.
> 
> > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> > a bool as it currently is. Or we can make it a cmdline parameter too.
> > In both cases some admin (bootloader or init process) has to ensure to fill it
> > correctly for the target platform. The bootloader should be able to read the
> > registers to figure out the mask. So more weight to make it a cmdline param.
> 
> I think this is adding complexity for the sake of it. I'm much more in

I actually think it reduces complexity. No special ABI to generate the mask
from the kernel. The same opt-in flag is the cpumask too.

> favour of keeping the implementation and ABI as simple as possible: expose
> the fact that the system is heterogenous, have an opt-in for userspace to
> say it can handle that and let it handle it.

It still has to do that. It's just the origin of the cpumask will change.

So it really depends how you view the opt-in. Ie: it needs to be discovered vs
the user space knows it exists and it just wants to enable it. So far we've
been going in the latter direction AFAICT. My current implementation is
terrible for discovery.

I don't feel strongly about it anyway. Just an idea. I can understand the lack
of appeal. Not sure if there's a none ugly solution yet :-)

Thanks

--
Qais Yousef
Catalin Marinas Oct. 21, 2020, 4:18 p.m. UTC | #21
On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote:
> > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> > > > Anyway, if the task placement is entirely off the table, the next thing
> > > > is asking applications to set their own mask and kill them if they do
> > > > the wrong thing. Here I see two possibilities for killing an app:
> > > > 
> > > > 1. When it ends up scheduled on a non-AArch32-capable CPU
> > > 
> > > That sounds fine to me. If we could do the exception return and take a
> > > SIGILL, that's what we'd do, but we can't so we have to catch it before.
> > 
> > Indeed, the illegal ERET doesn't work for this scenario.
> > 
> > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the
> > > >    aarch32_mask
> > > > 
> > > > Option 1 is simpler but 2 would be slightly more consistent.
> > > 
> > > I disagree -- if we did this for something like fpsimd, then the consistent
> > > behaviour would be to SIGILL on the cores without the instructions.
> > 
> > For fpsimd it makes sense since the main ISA is still available and the
> > application may be able to do something with the signal. But here we
> > can't do much since the entire AArch32 mode is not supported. That's why
> > we went for SIGKILL instead of SIGILL but thinking of it, after execve()
> > the signals are reset to SIG_DFL so SIGILL cannot be ignored.
> > 
> > I think it depends on whether you look at this fault as a part of ISA
> > not being available or as the overall application not compatible with
> > the system it is running on. If the latter, option 2 above makes more
> > sense.
> 
> Hmm, I'm not sure I see the distinction in practice: you still have a binary
> application that cannot run on all CPUs in the system. Who cares if some of
> the instructions work?

The failure would be more predictable rather than the app running for a
while and randomly getting SIGKILL. If it only fails on execve or
sched_setaffinity, it may be easier to track down (well, there's the CPU
hotplug as well that can change the cpumask intersection outside the
user process control).

> > > > There's also the question on whether the kernel should allow an ELF32 to
> > > > be loaded (and potentially killed subsequently) if the user mask is not
> > > > correct on execve().
> > > 
> > > I don't see the point in distinguishing between "you did execve() on a core
> > > without 32-bit" and "you did execve() on a core with 32-bit and then
> > > migrated to a core without 32-bit".
> > 
> > In the context of option 2 above, its more about whether execve()
> > returns -ENOEXEC or the process gets a SIGKILL immediately.
> 
> I just don't see what we gain by returning -ENOEXEC except for extra code
> and behaviour in the ABI (and if you wanted consistentcy you'd also need
> to fail attempts to widen the affinity mask to include 64-bit-only cores
> from a 32-bit task).

The -ENOEXEC is more in line with the current behaviour not allowing
ELF32 on systems that are not fully symmetric. So basically you'd have
a global opt-in as sysctl and a per-application opt-in via the affinity
mask.

I do agree that it complicates the kernel implementation.

> In other words, I don't think the kernel needs to hold userspace's hand
> for an opt-in feature that requires userspace to handle scheduling for
> optimal power/performance _anyway_. Allowing the affinity to be set
> arbitrarily and then killing the task if it ends up trying to run on the
> wrong CPU is both simple and sufficient.

Fine by me if you want to keep things simple, less code to maintain.

However, it would be good to know if the Android kernel/user guys are
happy with this approach. If the Android kernel ends up carrying
additional patches for task placement, I'd question why we need to merge
this (partial) series at all.
Will Deacon Oct. 21, 2020, 5:19 p.m. UTC | #22
On Wed, Oct 21, 2020 at 05:18:37PM +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote:
> > On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote:
> > > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote:
> > > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> > > > > Anyway, if the task placement is entirely off the table, the next thing
> > > > > is asking applications to set their own mask and kill them if they do
> > > > > the wrong thing. Here I see two possibilities for killing an app:
> > > > > 
> > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU
> > > > 
> > > > That sounds fine to me. If we could do the exception return and take a
> > > > SIGILL, that's what we'd do, but we can't so we have to catch it before.
> > > 
> > > Indeed, the illegal ERET doesn't work for this scenario.
> > > 
> > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the
> > > > >    aarch32_mask
> > > > > 
> > > > > Option 1 is simpler but 2 would be slightly more consistent.
> > > > 
> > > > I disagree -- if we did this for something like fpsimd, then the consistent
> > > > behaviour would be to SIGILL on the cores without the instructions.
> > > 
> > > For fpsimd it makes sense since the main ISA is still available and the
> > > application may be able to do something with the signal. But here we
> > > can't do much since the entire AArch32 mode is not supported. That's why
> > > we went for SIGKILL instead of SIGILL but thinking of it, after execve()
> > > the signals are reset to SIG_DFL so SIGILL cannot be ignored.
> > > 
> > > I think it depends on whether you look at this fault as a part of ISA
> > > not being available or as the overall application not compatible with
> > > the system it is running on. If the latter, option 2 above makes more
> > > sense.
> > 
> > Hmm, I'm not sure I see the distinction in practice: you still have a binary
> > application that cannot run on all CPUs in the system. Who cares if some of
> > the instructions work?
> 
> The failure would be more predictable rather than the app running for a
> while and randomly getting SIGKILL. If it only fails on execve or
> sched_setaffinity, it may be easier to track down (well, there's the CPU
> hotplug as well that can change the cpumask intersection outside the
> user process control).

But it's half-baked, because the moment the 32-bit task changes its affinity
mask then you're back in the old situation. That's why I'm saying this
doesn't add anything, because the rest of the series is designed entirely
around delivering SIGKILL at the last minute rather than preventing us
getting to that situation in the first place. The execve() case feels to me
like we're considering doing something because we can, rather than because
it's actually useful.

> > > > > There's also the question on whether the kernel should allow an ELF32 to
> > > > > be loaded (and potentially killed subsequently) if the user mask is not
> > > > > correct on execve().
> > > > 
> > > > I don't see the point in distinguishing between "you did execve() on a core
> > > > without 32-bit" and "you did execve() on a core with 32-bit and then
> > > > migrated to a core without 32-bit".
> > > 
> > > In the context of option 2 above, its more about whether execve()
> > > returns -ENOEXEC or the process gets a SIGKILL immediately.
> > 
> > I just don't see what we gain by returning -ENOEXEC except for extra code
> > and behaviour in the ABI (and if you wanted consistentcy you'd also need
> > to fail attempts to widen the affinity mask to include 64-bit-only cores
> > from a 32-bit task).
> 
> The -ENOEXEC is more in line with the current behaviour not allowing
> ELF32 on systems that are not fully symmetric. So basically you'd have
> a global opt-in as sysctl and a per-application opt-in via the affinity
> mask.

I think it's a bit strong calling that an opt-in, as the application could
just happen to be using the right cpumask.

> I do agree that it complicates the kernel implementation.
> 
> > In other words, I don't think the kernel needs to hold userspace's hand
> > for an opt-in feature that requires userspace to handle scheduling for
> > optimal power/performance _anyway_. Allowing the affinity to be set
> > arbitrarily and then killing the task if it ends up trying to run on the
> > wrong CPU is both simple and sufficient.
> 
> Fine by me if you want to keep things simple, less code to maintain.
> 
> However, it would be good to know if the Android kernel/user guys are
> happy with this approach. If the Android kernel ends up carrying
> additional patches for task placement, I'd question why we need to merge
> this (partial) series at all.

Hmm, those folks aren't even on CC :(

Adding Suren and Marco...

Will
Will Deacon Oct. 21, 2020, 5:23 p.m. UTC | #23
On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote:
> On 10/21/20 16:23, Will Deacon wrote:
> > > > If a cpumask is easier to implement and easier to use, then I think that's
> > > > what we should do. It's also then dead easy to disable if necessary by
> > > > just returning 0. The only alternative I would prefer is not having to
> > > > expose this information altogether, but I'm not sure that figuring this
> > > > out from MIDR/REVIDR alone is reliable.
> > > 
> > > I did suggest this before, but I'll try gain. If we want to assume a custom
> > > bootloader and custom user space, we can make them provide the mask.
> > 
> > Who mentioned a custom bootloader? In the context of Android, we're
> 
> Custom bootloader as in a bootloader that needs to opt-in to enable the
> feature (pass the right cmdline param). Catalin suggested to make this a sysctl
> to allow also for runtime toggling. But the initial intention was to have this
> to enable it at cmdline.

Hmm, ok, I don't think allowing the cmdline to be specified means its a
custom bootloader.

> > talking about a user-space that already manages scheduling affinity.
> > 
> > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> > > a bool as it currently is. Or we can make it a cmdline parameter too.
> > > In both cases some admin (bootloader or init process) has to ensure to fill it
> > > correctly for the target platform. The bootloader should be able to read the
> > > registers to figure out the mask. So more weight to make it a cmdline param.
> > 
> > I think this is adding complexity for the sake of it. I'm much more in
> 
> I actually think it reduces complexity. No special ABI to generate the mask
> from the kernel. The same opt-in flag is the cpumask too.

Maybe I'm misunderstanding your proposal but having a cpumask instead of
a bool means you now have to consider policy on a per-cpu basis, which
adds an extra dimension to this. For example, do you allow that mask to
be changed at runtime so that differents sets of CPUs now support 32-bit?
Do you preserve it across hotplug?

Will
Qais Yousef Oct. 21, 2020, 7:57 p.m. UTC | #24
On 10/21/20 18:23, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote:
> > On 10/21/20 16:23, Will Deacon wrote:
> > > > > If a cpumask is easier to implement and easier to use, then I think that's
> > > > > what we should do. It's also then dead easy to disable if necessary by
> > > > > just returning 0. The only alternative I would prefer is not having to
> > > > > expose this information altogether, but I'm not sure that figuring this
> > > > > out from MIDR/REVIDR alone is reliable.
> > > > 
> > > > I did suggest this before, but I'll try gain. If we want to assume a custom
> > > > bootloader and custom user space, we can make them provide the mask.
> > > 
> > > Who mentioned a custom bootloader? In the context of Android, we're
> > 
> > Custom bootloader as in a bootloader that needs to opt-in to enable the
> > feature (pass the right cmdline param). Catalin suggested to make this a sysctl
> > to allow also for runtime toggling. But the initial intention was to have this
> > to enable it at cmdline.
> 
> Hmm, ok, I don't think allowing the cmdline to be specified means its a
> custom bootloader.

True it could be just added to chosen property in the DT file without any
bootloader changes.

Bad usage of English probably. I just meant the bootloader might need to be
made aware of the opt-in process too. So it can potentially co-operate more.

> > > talking about a user-space that already manages scheduling affinity.
> > > 
> > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> > > > a bool as it currently is. Or we can make it a cmdline parameter too.
> > > > In both cases some admin (bootloader or init process) has to ensure to fill it
> > > > correctly for the target platform. The bootloader should be able to read the
> > > > registers to figure out the mask. So more weight to make it a cmdline param.
> > > 
> > > I think this is adding complexity for the sake of it. I'm much more in
> > 
> > I actually think it reduces complexity. No special ABI to generate the mask
> > from the kernel. The same opt-in flag is the cpumask too.
> 
> Maybe I'm misunderstanding your proposal but having a cpumask instead of

What I meant is that if we change the requirement to opt-in from a boolean
switch

	sysctl.enable_32bit_asym=1

to require the bootloader/init scripts provide the mask of aarch32 capable cpus

	sysctl.asym_32bit_cpus=0xf0

This will achieve multiple things at the same time:

	* Defer cpus specification to platform designers who want to
	  enable this feature on their platform.

	* We don't need a separate API to export which cpus are 32bit capable.
	  They can read it directly from /proc/sys/kernel/asym_32bit_cpus.
	  When it's 0 it means the system is not asymmetric.

	* If/when we want to disable this support in the future. The sysctl
	  handler will just have to return 0 all the time and ignore all
	  writes.

So it's changing the way user space opts-in. The kernel will still treat it as
a boolean, and probably sanity check the cpus to make sure they match what it
sees.

> a bool means you now have to consider policy on a per-cpu basis, which
> adds an extra dimension to this. For example, do you allow that mask to
> be changed at runtime so that differents sets of CPUs now support 32-bit?
> Do you preserve it across hotplug?

I can see how cpumask word was confusing. I hope the above is clearer.
We don't change how the kernel works, but rather what the user space have to do
to opt-in. Hopefully hitting 2 birds with one stone :-)

Thanks!

--
Qais Yousef
Will Deacon Oct. 21, 2020, 8:26 p.m. UTC | #25
On Wed, Oct 21, 2020 at 08:57:36PM +0100, Qais Yousef wrote:
> On 10/21/20 18:23, Will Deacon wrote:
> > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote:
> > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> > > > > a bool as it currently is. Or we can make it a cmdline parameter too.
> > > > > In both cases some admin (bootloader or init process) has to ensure to fill it
> > > > > correctly for the target platform. The bootloader should be able to read the
> > > > > registers to figure out the mask. So more weight to make it a cmdline param.
> > > > 
> > > > I think this is adding complexity for the sake of it. I'm much more in
> > > 
> > > I actually think it reduces complexity. No special ABI to generate the mask
> > > from the kernel. The same opt-in flag is the cpumask too.
> > 
> > Maybe I'm misunderstanding your proposal but having a cpumask instead of
> 
> What I meant is that if we change the requirement to opt-in from a boolean
> switch
> 
> 	sysctl.enable_32bit_asym=1
> 
> to require the bootloader/init scripts provide the mask of aarch32 capable cpus
> 
> 	sysctl.asym_32bit_cpus=0xf0
> 
> This will achieve multiple things at the same time:
> 
> 	* Defer cpus specification to platform designers who want to
> 	  enable this feature on their platform.

What do you mean by this? The kernel will obviously have to check that
the mask is correct (i.e. it doesn't contain 64-bit only CPUs), at which
point it's a boolean in disguise.

> 	* We don't need a separate API to export which cpus are 32bit capable.
> 	  They can read it directly from /proc/sys/kernel/asym_32bit_cpus.
> 	  When it's 0 it means the system is not asymmetric.

I don't see how this is better than a separate cpumask for this purpose.
It feels like you're trying to overload the control and the identification,
but that just makes things confusing and hard to use as I now need to know
which logical CPUs correspond to which physical CPUs in order to set the
command-line.

> 	* If/when we want to disable this support in the future. The sysctl
> 	  handler will just have to return 0 all the time and ignore all
> 	  writes.

It can do that as a boolean too, right?

Will
Catalin Marinas Oct. 22, 2020, 8:16 a.m. UTC | #26
On Wed, Oct 21, 2020 at 09:26:27PM +0100, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 08:57:36PM +0100, Qais Yousef wrote:
> > On 10/21/20 18:23, Will Deacon wrote:
> > > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote:
> > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> > > > > > a bool as it currently is. Or we can make it a cmdline parameter too.
> > > > > > In both cases some admin (bootloader or init process) has to ensure to fill it
> > > > > > correctly for the target platform. The bootloader should be able to read the
> > > > > > registers to figure out the mask. So more weight to make it a cmdline param.
> > > > > 
> > > > > I think this is adding complexity for the sake of it. I'm much more in
> > > > 
> > > > I actually think it reduces complexity. No special ABI to generate the mask
> > > > from the kernel. The same opt-in flag is the cpumask too.
> > > 
> > > Maybe I'm misunderstanding your proposal but having a cpumask instead of
> > 
> > What I meant is that if we change the requirement to opt-in from a boolean
> > switch
> > 
> > 	sysctl.enable_32bit_asym=1
> > 
> > to require the bootloader/init scripts provide the mask of aarch32 capable cpus
> > 
> > 	sysctl.asym_32bit_cpus=0xf0
[...]
> > 	* We don't need a separate API to export which cpus are 32bit capable.
> > 	  They can read it directly from /proc/sys/kernel/asym_32bit_cpus.
> > 	  When it's 0 it means the system is not asymmetric.
> 
> I don't see how this is better than a separate cpumask for this purpose.
> It feels like you're trying to overload the control and the identification,
> but that just makes things confusing and hard to use as I now need to know
> which logical CPUs correspond to which physical CPUs in order to set the
> command-line.

I agree. Let's leave the identification to the kernel as it has access
to the CPUID registers and can provide the cpumask. The control in this
case doesn't need to be more than a boolean and its meaning is that the
user knows what it is doing.
Morten Rasmussen Oct. 22, 2020, 9:55 a.m. UTC | #27
On Wed, Oct 21, 2020 at 06:19:48PM +0100, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 05:18:37PM +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote:
> > > On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote:
> > > > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote:
> > > > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> > > > > > Anyway, if the task placement is entirely off the table, the next thing
> > > > > > is asking applications to set their own mask and kill them if they do
> > > > > > the wrong thing. Here I see two possibilities for killing an app:
> > > > > > 
> > > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU
> > > > > 
> > > > > That sounds fine to me. If we could do the exception return and take a
> > > > > SIGILL, that's what we'd do, but we can't so we have to catch it before.
> > > > 
> > > > Indeed, the illegal ERET doesn't work for this scenario.
> > > > 
> > > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the
> > > > > >    aarch32_mask
> > > > > > 
> > > > > > Option 1 is simpler but 2 would be slightly more consistent.
> > > > > 
> > > > > I disagree -- if we did this for something like fpsimd, then the consistent
> > > > > behaviour would be to SIGILL on the cores without the instructions.
> > > > 
> > > > For fpsimd it makes sense since the main ISA is still available and the
> > > > application may be able to do something with the signal. But here we
> > > > can't do much since the entire AArch32 mode is not supported. That's why
> > > > we went for SIGKILL instead of SIGILL but thinking of it, after execve()
> > > > the signals are reset to SIG_DFL so SIGILL cannot be ignored.
> > > > 
> > > > I think it depends on whether you look at this fault as a part of ISA
> > > > not being available or as the overall application not compatible with
> > > > the system it is running on. If the latter, option 2 above makes more
> > > > sense.
> > > 
> > > Hmm, I'm not sure I see the distinction in practice: you still have a binary
> > > application that cannot run on all CPUs in the system. Who cares if some of
> > > the instructions work?
> > 
> > The failure would be more predictable rather than the app running for a
> > while and randomly getting SIGKILL. If it only fails on execve or
> > sched_setaffinity, it may be easier to track down (well, there's the CPU
> > hotplug as well that can change the cpumask intersection outside the
> > user process control).

Migration between cpusets is another failure scenario where the app can
get SIGKILL randomly.

> But it's half-baked, because the moment the 32-bit task changes its affinity
> mask then you're back in the old situation. That's why I'm saying this
> doesn't add anything, because the rest of the series is designed entirely
> around delivering SIGKILL at the last minute rather than preventing us
> getting to that situation in the first place. The execve() case feels to me
> like we're considering doing something because we can, rather than because
> it's actually useful.

Agree.
Qais Yousef Oct. 22, 2020, 9:58 a.m. UTC | #28
On 10/22/20 09:16, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 09:26:27PM +0100, Will Deacon wrote:
> > On Wed, Oct 21, 2020 at 08:57:36PM +0100, Qais Yousef wrote:
> > > On 10/21/20 18:23, Will Deacon wrote:
> > > > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote:
> > > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of
> > > > > > > a bool as it currently is. Or we can make it a cmdline parameter too.
> > > > > > > In both cases some admin (bootloader or init process) has to ensure to fill it
> > > > > > > correctly for the target platform. The bootloader should be able to read the
> > > > > > > registers to figure out the mask. So more weight to make it a cmdline param.
> > > > > > 
> > > > > > I think this is adding complexity for the sake of it. I'm much more in
> > > > > 
> > > > > I actually think it reduces complexity. No special ABI to generate the mask
> > > > > from the kernel. The same opt-in flag is the cpumask too.
> > > > 
> > > > Maybe I'm misunderstanding your proposal but having a cpumask instead of
> > > 
> > > What I meant is that if we change the requirement to opt-in from a boolean
> > > switch
> > > 
> > > 	sysctl.enable_32bit_asym=1
> > > 
> > > to require the bootloader/init scripts provide the mask of aarch32 capable cpus
> > > 
> > > 	sysctl.asym_32bit_cpus=0xf0
> [...]
> > > 	* We don't need a separate API to export which cpus are 32bit capable.
> > > 	  They can read it directly from /proc/sys/kernel/asym_32bit_cpus.
> > > 	  When it's 0 it means the system is not asymmetric.
> > 
> > I don't see how this is better than a separate cpumask for this purpose.
> > It feels like you're trying to overload the control and the identification,
> > but that just makes things confusing and hard to use as I now need to know
> > which logical CPUs correspond to which physical CPUs in order to set the
> > command-line.

I tend to disagree with some of the statements. But I'll leave it at that.
Whatever makes the ship move :)

> I agree. Let's leave the identification to the kernel as it has access
> to the CPUID registers and can provide the cpumask. The control in this
> case doesn't need to be more than a boolean and its meaning is that the
> user knows what it is doing.

Thanks

--
Qais Yousef
Morten Rasmussen Oct. 22, 2020, 10:16 a.m. UTC | #29
On Wed, Oct 21, 2020 at 03:31:53PM +0100, Qais Yousef wrote:
> On 10/21/20 15:33, Morten Rasmussen wrote:
> > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > one, though not as easy as automatic task placement by the scheduler (my
> > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > not a strong preference for any).
> > 
> > Automatic task placement by the scheduler would mean giving up the
> > requirement that the user-space affinity mask must always be honoured.
> > Is that on the table?
> > 
> > Killing aarch32 tasks with an empty intersection between the user-space
> > mask and aarch32_mask is not really "automatic" and would require the
> > aarch32 capability to be exposed anyway.
> 
> I just noticed this nasty corner case too.
> 
> 
> Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9
> 
> "If such a task had been bound to some subset of its cpuset using the
> sched_setaffinity() call, the task will be allowed to run on any CPU allowed in
> its new cpuset, negating the effect of the prior sched_setaffinity() call."
> 
> So user space must put the tasks into a valid cpuset to fix the problem. Or
> make the scheduler behave like the affinity is associated with a cpuset.
> 
> Can user space put the task into the correct cpuset without a race? Clone3
> syscall learnt to specify a cgroup to attach to when forking. Should we do the
> same for execve()?

Putting a task in a cpuset overrides any affinity mask applied by a
previous cpuset or sched_setaffinity() call. I wouldn't call it a corner
case though. Android user-space is exploiting it all the time on some
devices through the foreground, background, and top-app cgroups.

If a tasks fork() the child task will belong to the same cgroup
automatically. If you execve() you retain the previous affinity mask and
cgroup. So putting parent task about to execve() into aarch32 into a
cpuset with only aarch32 CPUs should be enough to never have the task or
any of its child tasks SIGKILLED.

A few simple experiments with fork() and execve() seems to confirm that.

I don't see any changes needed in the kernel. Changing cgroup through
clone could of course fail if user-space specifies an unsuitable cgroup.
Fixing that would be part of fixing the cpuset setup in user-space which
is required anyway.

Morten
Qais Yousef Oct. 22, 2020, 10:48 a.m. UTC | #30
On 10/22/20 12:16, Morten Rasmussen wrote:
> On Wed, Oct 21, 2020 at 03:31:53PM +0100, Qais Yousef wrote:
> > On 10/21/20 15:33, Morten Rasmussen wrote:
> > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > not a strong preference for any).
> > > 
> > > Automatic task placement by the scheduler would mean giving up the
> > > requirement that the user-space affinity mask must always be honoured.
> > > Is that on the table?
> > > 
> > > Killing aarch32 tasks with an empty intersection between the user-space
> > > mask and aarch32_mask is not really "automatic" and would require the
> > > aarch32 capability to be exposed anyway.
> > 
> > I just noticed this nasty corner case too.
> > 
> > 
> > Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9
> > 
> > "If such a task had been bound to some subset of its cpuset using the
> > sched_setaffinity() call, the task will be allowed to run on any CPU allowed in
> > its new cpuset, negating the effect of the prior sched_setaffinity() call."
> > 
> > So user space must put the tasks into a valid cpuset to fix the problem. Or
> > make the scheduler behave like the affinity is associated with a cpuset.
> > 
> > Can user space put the task into the correct cpuset without a race? Clone3
> > syscall learnt to specify a cgroup to attach to when forking. Should we do the
> > same for execve()?
> 
> Putting a task in a cpuset overrides any affinity mask applied by a
> previous cpuset or sched_setaffinity() call. I wouldn't call it a corner
> case though. Android user-space is exploiting it all the time on some
> devices through the foreground, background, and top-app cgroups.

Yep. What I was referring to is that if we go the kernel fixing affinity
automatically route, that cpuset behavior will be problematic. In this case
fixing the affinity at the task level will not be enough because cpusets will
override it. So catering for that is another complication to take into account.

> If a tasks fork() the child task will belong to the same cgroup
> automatically. If you execve() you retain the previous affinity mask and
> cgroup. So putting parent task about to execve() into aarch32 into a
> cpuset with only aarch32 CPUs should be enough to never have the task or
> any of its child tasks SIGKILLED.
> 
> A few simple experiments with fork() and execve() seems to confirm that.

+1

This made me wonder what happens when SCHED_RESET_ON_FORK is set. It only
resets policty and priority. So we're good.

> I don't see any changes needed in the kernel. Changing cgroup through
> clone could of course fail if user-space specifies an unsuitable cgroup.
> Fixing that would be part of fixing the cpuset setup in user-space which
> is required anyway.

+1

Thanks

--
Qais Yousef
Qais Yousef Oct. 22, 2020, 1:47 p.m. UTC | #31
On 10/21/20 15:41, Will Deacon wrote:
> > We already expose MIDR and REVIDR via the current sysfs interface. We
> > can expand it to include _all_ the other ID_* regs currently available
> > to user via the MRS emulation and we won't have to debate what a new
> > interface would look like. The MRS emulation and the sysfs info should
> > probably match, though that means we need to expose the
> > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > 
> > I do agree that an AArch32 cpumask is an easier option both from the
> > kernel implementation perspective and from the application usability
> > one, though not as easy as automatic task placement by the scheduler (my
> > first preference, followed by the id_* regs and the aarch32 mask, though
> > not a strong preference for any).
> 
> If a cpumask is easier to implement and easier to use, then I think that's
> what we should do. It's also then dead easy to disable if necessary by
> just returning 0. The only alternative I would prefer is not having to
> expose this information altogether, but I'm not sure that figuring this
> out from MIDR/REVIDR alone is reliable.

So the mask idea is about adding a new

	/sys/devices/system/cpu/aarch32_cpus

?

I just need to make sure that Peter and Greg are happy with this arm64 specific
mask added to sysfs in this manner. Not sure if there's a precedent of archs
exporting special masks in sysfs.

Or maybe people had something else in mind about how his this mask should be
exported?

Thanks

--
Qais Yousef
Greg Kroah-Hartman Oct. 22, 2020, 1:55 p.m. UTC | #32
On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote:
> On 10/21/20 15:41, Will Deacon wrote:
> > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > can expand it to include _all_ the other ID_* regs currently available
> > > to user via the MRS emulation and we won't have to debate what a new
> > > interface would look like. The MRS emulation and the sysfs info should
> > > probably match, though that means we need to expose the
> > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > 
> > > I do agree that an AArch32 cpumask is an easier option both from the
> > > kernel implementation perspective and from the application usability
> > > one, though not as easy as automatic task placement by the scheduler (my
> > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > not a strong preference for any).
> > 
> > If a cpumask is easier to implement and easier to use, then I think that's
> > what we should do. It's also then dead easy to disable if necessary by
> > just returning 0. The only alternative I would prefer is not having to
> > expose this information altogether, but I'm not sure that figuring this
> > out from MIDR/REVIDR alone is reliable.
> 
> So the mask idea is about adding a new
> 
> 	/sys/devices/system/cpu/aarch32_cpus
> 
> ?

Is this a file, a directory, or what?  What's the contents?

Without any of that, I have no idea if it's "ok" or not...

thanks,

greg k-h
Catalin Marinas Oct. 22, 2020, 2:31 p.m. UTC | #33
On Thu, Oct 22, 2020 at 03:55:59PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote:
> > On 10/21/20 15:41, Will Deacon wrote:
> > > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > > can expand it to include _all_ the other ID_* regs currently available
> > > > to user via the MRS emulation and we won't have to debate what a new
> > > > interface would look like. The MRS emulation and the sysfs info should
> > > > probably match, though that means we need to expose the
> > > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > > 
> > > > I do agree that an AArch32 cpumask is an easier option both from the
> > > > kernel implementation perspective and from the application usability
> > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > not a strong preference for any).
> > > 
> > > If a cpumask is easier to implement and easier to use, then I think that's
> > > what we should do. It's also then dead easy to disable if necessary by
> > > just returning 0. The only alternative I would prefer is not having to
> > > expose this information altogether, but I'm not sure that figuring this
> > > out from MIDR/REVIDR alone is reliable.
> > 
> > So the mask idea is about adding a new
> > 
> > 	/sys/devices/system/cpu/aarch32_cpus
> > 
> > ?
> 
> Is this a file, a directory, or what?  What's the contents?
> 
> Without any of that, I have no idea if it's "ok" or not...

I don't know what Qais thinks but a file matching the syntax of the
present/online/offline/possible files (e.g. 0-3) would look more
consistent.
Qais Yousef Oct. 22, 2020, 2:34 p.m. UTC | #34
On 10/22/20 15:31, Catalin Marinas wrote:
> On Thu, Oct 22, 2020 at 03:55:59PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote:
> > > On 10/21/20 15:41, Will Deacon wrote:
> > > > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > > > can expand it to include _all_ the other ID_* regs currently available
> > > > > to user via the MRS emulation and we won't have to debate what a new
> > > > > interface would look like. The MRS emulation and the sysfs info should
> > > > > probably match, though that means we need to expose the
> > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > > > 
> > > > > I do agree that an AArch32 cpumask is an easier option both from the
> > > > > kernel implementation perspective and from the application usability
> > > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > > not a strong preference for any).
> > > > 
> > > > If a cpumask is easier to implement and easier to use, then I think that's
> > > > what we should do. It's also then dead easy to disable if necessary by
> > > > just returning 0. The only alternative I would prefer is not having to
> > > > expose this information altogether, but I'm not sure that figuring this
> > > > out from MIDR/REVIDR alone is reliable.
> > > 
> > > So the mask idea is about adding a new
> > > 
> > > 	/sys/devices/system/cpu/aarch32_cpus
> > > 
> > > ?
> > 
> > Is this a file, a directory, or what?  What's the contents?
> > 
> > Without any of that, I have no idea if it's "ok" or not...
> 
> I don't know what Qais thinks but a file matching the syntax of the
> present/online/offline/possible files (e.g. 0-3) would look more
> consistent.

Yes. Sorry I thought it's obvious.

Thanks

--
Qais Yousef
Qais Yousef Oct. 26, 2020, 7:02 p.m. UTC | #35
On 10/22/20 15:55, Greg Kroah-Hartman wrote:
> On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote:
> > On 10/21/20 15:41, Will Deacon wrote:
> > > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > > can expand it to include _all_ the other ID_* regs currently available
> > > > to user via the MRS emulation and we won't have to debate what a new
> > > > interface would look like. The MRS emulation and the sysfs info should
> > > > probably match, though that means we need to expose the
> > > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > > 
> > > > I do agree that an AArch32 cpumask is an easier option both from the
> > > > kernel implementation perspective and from the application usability
> > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > not a strong preference for any).
> > > 
> > > If a cpumask is easier to implement and easier to use, then I think that's
> > > what we should do. It's also then dead easy to disable if necessary by
> > > just returning 0. The only alternative I would prefer is not having to
> > > expose this information altogether, but I'm not sure that figuring this
> > > out from MIDR/REVIDR alone is reliable.
> > 
> > So the mask idea is about adding a new
> > 
> > 	/sys/devices/system/cpu/aarch32_cpus
> > 
> > ?
> 
> Is this a file, a directory, or what?  What's the contents?
> 
> Without any of that, I have no idea if it's "ok" or not...

Hopefully the below patch explains better. Note that I added the new attribute
to driver/base/cpu.c, but assuming we will still want to go down this route, we
will need a generic way for archs to add their attributes to
/sys/devices/system/cpu/.

Something like having a special define for archs to append their own
attributes list

	#define SYSFS_SYSTEM_CPU_ARCH_ATTRIBUTES

Or probably there's a way to add this file (attribute) dynamically from arch
code that I just didn't figure out how to do yet.

Thanks

--
Qais Yousef


---------->8------------

From 96dfdfdacb2a26a60ba19051e8c72e839eb5408b Mon Sep 17 00:00:00 2001
From: Qais Yousef <qais.yousef@arm.com>
Date: Mon, 26 Oct 2020 16:33:32 +0000
Subject: [PATCH] arm64: export aarch32_online mask in sysfs

This patch to be applied on top of arm64 Asymmetric AArch32 support.

It explores the option of exporting the AArch32 capable cpus as a mask
on sysfs.

This is to help drive the discussion on the API before sending the next
version which I yet to address some of the review comments.

The output looks like:

	# cat /sys/devices/system/cpu/aarch32_online
	0-5

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  7 +++++++
 arch/arm64/include/asm/cpufeature.h                |  2 ++
 arch/arm64/kernel/cpufeature.c                     |  8 ++++++++
 drivers/base/cpu.c                                 | 12 ++++++++++++
 4 files changed, 29 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index b555df825447..9ccb5c3f5ee3 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -36,6 +36,13 @@ Description:	CPU topology files that describe kernel limits related to
 
 		See Documentation/admin-guide/cputopology.rst for more information.
 
+What:		/sys/devices/system/cpu/aarch32_online
+Date:		October 2020
+Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
+Description:	CPU topology file that describes which cpus support AArch32 at
+		EL0. Only available on arm64.
+
+		The value is updated when a cpu becomes online then sticks.
 
 What:		/sys/devices/system/cpu/probe
 		/sys/devices/system/cpu/release
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 2b87f17b2bd4..edd18002ad81 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -380,6 +380,8 @@ cpucap_multi_entry_cap_matches(const struct arm64_cpu_capabilities *entry,
 	return false;
 }
 
+extern cpumask_t aarch32_el0_mask;
+
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
 extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
 extern struct static_key_false arm64_const_caps_ready;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0f7307c8ad80..662bbc2b15cd 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1723,6 +1723,13 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap)
 	return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT);
 }
 
+cpumask_t aarch32_el0_mask;
+static void cpu_enable_aarch32_el0(struct arm64_cpu_capabilities const *cap)
+{
+	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU))
+		cpumask_set_cpu(smp_processor_id(), &aarch32_el0_mask);
+}
+
 static const struct arm64_cpu_capabilities arm64_features[] = {
 	{
 		.desc = "GIC system register CPU interface",
@@ -1809,6 +1816,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 	{
 		.capability = ARM64_HAS_ASYM_32BIT_EL0,
 		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+		.cpu_enable = cpu_enable_aarch32_el0,
 		.matches = has_cpuid_feature,
 		.sys_reg = SYS_ID_AA64PFR0_EL1,
 		.sign = FTR_UNSIGNED,
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index d2136ab9b14a..569baacde508 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -459,6 +459,15 @@ EXPORT_SYMBOL_GPL(cpu_device_create);
 static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL);
 #endif
 
+#ifdef CONFIG_ARM64
+static ssize_t print_aarch32_online(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	return cpumap_print_to_pagebuf(true, buf, &aarch32_el0_mask);
+}
+static DEVICE_ATTR(aarch32_online, 0444, print_aarch32_online, NULL);
+#endif
+
 static struct attribute *cpu_root_attrs[] = {
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 	&dev_attr_probe.attr,
@@ -475,6 +484,9 @@ static struct attribute *cpu_root_attrs[] = {
 #endif
 #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 	&dev_attr_modalias.attr,
+#endif
+#ifdef CONFIG_ARM64
+	&dev_attr_aarch32_online.attr,
 #endif
 	NULL
 };
Greg Kroah-Hartman Oct. 26, 2020, 7:08 p.m. UTC | #36
On Mon, Oct 26, 2020 at 07:02:50PM +0000, Qais Yousef wrote:
> On 10/22/20 15:55, Greg Kroah-Hartman wrote:
> > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote:
> > > On 10/21/20 15:41, Will Deacon wrote:
> > > > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > > > can expand it to include _all_ the other ID_* regs currently available
> > > > > to user via the MRS emulation and we won't have to debate what a new
> > > > > interface would look like. The MRS emulation and the sysfs info should
> > > > > probably match, though that means we need to expose the
> > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > > > 
> > > > > I do agree that an AArch32 cpumask is an easier option both from the
> > > > > kernel implementation perspective and from the application usability
> > > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > > not a strong preference for any).
> > > > 
> > > > If a cpumask is easier to implement and easier to use, then I think that's
> > > > what we should do. It's also then dead easy to disable if necessary by
> > > > just returning 0. The only alternative I would prefer is not having to
> > > > expose this information altogether, but I'm not sure that figuring this
> > > > out from MIDR/REVIDR alone is reliable.
> > > 
> > > So the mask idea is about adding a new
> > > 
> > > 	/sys/devices/system/cpu/aarch32_cpus
> > > 
> > > ?
> > 
> > Is this a file, a directory, or what?  What's the contents?
> > 
> > Without any of that, I have no idea if it's "ok" or not...
> 
> Hopefully the below patch explains better. Note that I added the new attribute
> to driver/base/cpu.c, but assuming we will still want to go down this route, we
> will need a generic way for archs to add their attributes to
> /sys/devices/system/cpu/.
> 
> Something like having a special define for archs to append their own
> attributes list
> 
> 	#define SYSFS_SYSTEM_CPU_ARCH_ATTRIBUTES
> 
> Or probably there's a way to add this file (attribute) dynamically from arch
> code that I just didn't figure out how to do yet.

Please do that, sysfs files should not be present when the information
is not needed from them.  Look at the is_visible() callback for the
attribute for how to do it.

> 
> Thanks
> 
> --
> Qais Yousef
> 
> 
> ---------->8------------
> 
> >From 96dfdfdacb2a26a60ba19051e8c72e839eb5408b Mon Sep 17 00:00:00 2001
> From: Qais Yousef <qais.yousef@arm.com>
> Date: Mon, 26 Oct 2020 16:33:32 +0000
> Subject: [PATCH] arm64: export aarch32_online mask in sysfs
> 
> This patch to be applied on top of arm64 Asymmetric AArch32 support.
> 
> It explores the option of exporting the AArch32 capable cpus as a mask
> on sysfs.
> 
> This is to help drive the discussion on the API before sending the next
> version which I yet to address some of the review comments.
> 
> The output looks like:
> 
> 	# cat /sys/devices/system/cpu/aarch32_online
> 	0-5
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu |  7 +++++++
>  arch/arm64/include/asm/cpufeature.h                |  2 ++
>  arch/arm64/kernel/cpufeature.c                     |  8 ++++++++
>  drivers/base/cpu.c                                 | 12 ++++++++++++
>  4 files changed, 29 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index b555df825447..9ccb5c3f5ee3 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -36,6 +36,13 @@ Description:	CPU topology files that describe kernel limits related to
>  
>  		See Documentation/admin-guide/cputopology.rst for more information.
>  
> +What:		/sys/devices/system/cpu/aarch32_online
> +Date:		October 2020
> +Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
> +Description:	CPU topology file that describes which cpus support AArch32 at
> +		EL0. Only available on arm64.
> +
> +		The value is updated when a cpu becomes online then sticks.

What does "then sticks" mean?



>  
>  What:		/sys/devices/system/cpu/probe
>  		/sys/devices/system/cpu/release
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 2b87f17b2bd4..edd18002ad81 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -380,6 +380,8 @@ cpucap_multi_entry_cap_matches(const struct arm64_cpu_capabilities *entry,
>  	return false;
>  }
>  
> +extern cpumask_t aarch32_el0_mask;
> +
>  extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>  extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
>  extern struct static_key_false arm64_const_caps_ready;
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 0f7307c8ad80..662bbc2b15cd 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1723,6 +1723,13 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap)
>  	return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT);
>  }
>  
> +cpumask_t aarch32_el0_mask;
> +static void cpu_enable_aarch32_el0(struct arm64_cpu_capabilities const *cap)
> +{
> +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU))
> +		cpumask_set_cpu(smp_processor_id(), &aarch32_el0_mask);
> +}
> +
>  static const struct arm64_cpu_capabilities arm64_features[] = {
>  	{
>  		.desc = "GIC system register CPU interface",
> @@ -1809,6 +1816,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>  	{
>  		.capability = ARM64_HAS_ASYM_32BIT_EL0,
>  		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
> +		.cpu_enable = cpu_enable_aarch32_el0,
>  		.matches = has_cpuid_feature,
>  		.sys_reg = SYS_ID_AA64PFR0_EL1,
>  		.sign = FTR_UNSIGNED,
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index d2136ab9b14a..569baacde508 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -459,6 +459,15 @@ EXPORT_SYMBOL_GPL(cpu_device_create);
>  static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL);
>  #endif
>  
> +#ifdef CONFIG_ARM64
> +static ssize_t print_aarch32_online(struct device *dev,
> +				       struct device_attribute *attr, char *buf)
> +{
> +	return cpumap_print_to_pagebuf(true, buf, &aarch32_el0_mask);
> +}
> +static DEVICE_ATTR(aarch32_online, 0444, print_aarch32_online, NULL);

DEVICE_ATTR_RO()?

> +#endif

Hah, no, no arch-specific stuff in here, sorry.  Please do this properly
in your arch-specific code only.

thanks,

greg k-h
Qais Yousef Oct. 26, 2020, 7:18 p.m. UTC | #37
On 10/26/20 20:08, Greg Kroah-Hartman wrote:
> On Mon, Oct 26, 2020 at 07:02:50PM +0000, Qais Yousef wrote:
> > On 10/22/20 15:55, Greg Kroah-Hartman wrote:
> > > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote:
> > > > On 10/21/20 15:41, Will Deacon wrote:
> > > > > > We already expose MIDR and REVIDR via the current sysfs interface. We
> > > > > > can expand it to include _all_ the other ID_* regs currently available
> > > > > > to user via the MRS emulation and we won't have to debate what a new
> > > > > > interface would look like. The MRS emulation and the sysfs info should
> > > > > > probably match, though that means we need to expose the
> > > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't.
> > > > > > 
> > > > > > I do agree that an AArch32 cpumask is an easier option both from the
> > > > > > kernel implementation perspective and from the application usability
> > > > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > > > not a strong preference for any).
> > > > > 
> > > > > If a cpumask is easier to implement and easier to use, then I think that's
> > > > > what we should do. It's also then dead easy to disable if necessary by
> > > > > just returning 0. The only alternative I would prefer is not having to
> > > > > expose this information altogether, but I'm not sure that figuring this
> > > > > out from MIDR/REVIDR alone is reliable.
> > > > 
> > > > So the mask idea is about adding a new
> > > > 
> > > > 	/sys/devices/system/cpu/aarch32_cpus
> > > > 
> > > > ?
> > > 
> > > Is this a file, a directory, or what?  What's the contents?
> > > 
> > > Without any of that, I have no idea if it's "ok" or not...
> > 
> > Hopefully the below patch explains better. Note that I added the new attribute
> > to driver/base/cpu.c, but assuming we will still want to go down this route, we
> > will need a generic way for archs to add their attributes to
> > /sys/devices/system/cpu/.
> > 
> > Something like having a special define for archs to append their own
> > attributes list
> > 
> > 	#define SYSFS_SYSTEM_CPU_ARCH_ATTRIBUTES
> > 
> > Or probably there's a way to add this file (attribute) dynamically from arch
> > code that I just didn't figure out how to do yet.
> 
> Please do that, sysfs files should not be present when the information
> is not needed from them.  Look at the is_visible() callback for the
> attribute for how to do it.

Okay, thanks for the hint. Will look at that.

> > 
> > Thanks
> > 
> > --
> > Qais Yousef
> > 
> > 
> > ---------->8------------
> > 
> > >From 96dfdfdacb2a26a60ba19051e8c72e839eb5408b Mon Sep 17 00:00:00 2001
> > From: Qais Yousef <qais.yousef@arm.com>
> > Date: Mon, 26 Oct 2020 16:33:32 +0000
> > Subject: [PATCH] arm64: export aarch32_online mask in sysfs
> > 
> > This patch to be applied on top of arm64 Asymmetric AArch32 support.
> > 
> > It explores the option of exporting the AArch32 capable cpus as a mask
> > on sysfs.
> > 
> > This is to help drive the discussion on the API before sending the next
> > version which I yet to address some of the review comments.
> > 
> > The output looks like:
> > 
> > 	# cat /sys/devices/system/cpu/aarch32_online
> > 	0-5
> > 
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >  Documentation/ABI/testing/sysfs-devices-system-cpu |  7 +++++++
> >  arch/arm64/include/asm/cpufeature.h                |  2 ++
> >  arch/arm64/kernel/cpufeature.c                     |  8 ++++++++
> >  drivers/base/cpu.c                                 | 12 ++++++++++++
> >  4 files changed, 29 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > index b555df825447..9ccb5c3f5ee3 100644
> > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > @@ -36,6 +36,13 @@ Description:	CPU topology files that describe kernel limits related to
> >  
> >  		See Documentation/admin-guide/cputopology.rst for more information.
> >  
> > +What:		/sys/devices/system/cpu/aarch32_online
> > +Date:		October 2020
> > +Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
> > +Description:	CPU topology file that describes which cpus support AArch32 at
> > +		EL0. Only available on arm64.
> > +
> > +		The value is updated when a cpu becomes online then sticks.
> 
> What does "then sticks" mean?

Was thinking like a sticky bit.

When a cpu becomes online and we discover that it is aarch32 capable, we set
the bit. But never clear it again if the cpu goes offline later.

I'll reword it.

> 
> 
> >  
> >  What:		/sys/devices/system/cpu/probe
> >  		/sys/devices/system/cpu/release
> > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> > index 2b87f17b2bd4..edd18002ad81 100644
> > --- a/arch/arm64/include/asm/cpufeature.h
> > +++ b/arch/arm64/include/asm/cpufeature.h
> > @@ -380,6 +380,8 @@ cpucap_multi_entry_cap_matches(const struct arm64_cpu_capabilities *entry,
> >  	return false;
> >  }
> >  
> > +extern cpumask_t aarch32_el0_mask;
> > +
> >  extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
> >  extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
> >  extern struct static_key_false arm64_const_caps_ready;
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 0f7307c8ad80..662bbc2b15cd 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1723,6 +1723,13 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap)
> >  	return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT);
> >  }
> >  
> > +cpumask_t aarch32_el0_mask;
> > +static void cpu_enable_aarch32_el0(struct arm64_cpu_capabilities const *cap)
> > +{
> > +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU))
> > +		cpumask_set_cpu(smp_processor_id(), &aarch32_el0_mask);
> > +}
> > +
> >  static const struct arm64_cpu_capabilities arm64_features[] = {
> >  	{
> >  		.desc = "GIC system register CPU interface",
> > @@ -1809,6 +1816,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> >  	{
> >  		.capability = ARM64_HAS_ASYM_32BIT_EL0,
> >  		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
> > +		.cpu_enable = cpu_enable_aarch32_el0,
> >  		.matches = has_cpuid_feature,
> >  		.sys_reg = SYS_ID_AA64PFR0_EL1,
> >  		.sign = FTR_UNSIGNED,
> > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > index d2136ab9b14a..569baacde508 100644
> > --- a/drivers/base/cpu.c
> > +++ b/drivers/base/cpu.c
> > @@ -459,6 +459,15 @@ EXPORT_SYMBOL_GPL(cpu_device_create);
> >  static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL);
> >  #endif
> >  
> > +#ifdef CONFIG_ARM64
> > +static ssize_t print_aarch32_online(struct device *dev,
> > +				       struct device_attribute *attr, char *buf)
> > +{
> > +	return cpumap_print_to_pagebuf(true, buf, &aarch32_el0_mask);
> > +}
> > +static DEVICE_ATTR(aarch32_online, 0444, print_aarch32_online, NULL);
> 
> DEVICE_ATTR_RO()?

Indeed.

> 
> > +#endif
> 
> Hah, no, no arch-specific stuff in here, sorry.  Please do this properly
> in your arch-specific code only.

Of course. It was just to see this is okay. Let me figure out how to clean this
up.

Thanks!

--
Qais Yousef
diff mbox series

Patch

diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index f28853f80089..bfcbda6d6f35 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -166,7 +166,7 @@  infrastructure:
      +------------------------------+---------+---------+
      | EL1                          | [7-4]   |    n    |
      +------------------------------+---------+---------+
-     | EL0                          | [3-0]   |    n    |
+     | EL0                          | [3-0]   |    y    |
      +------------------------------+---------+---------+
 
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6f795c8221f4..0f7307c8ad80 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -221,7 +221,7 @@  static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL3_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL2_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
-	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
 	ARM64_FTR_END,
 };
 
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 93c55986ca7f..632b9d5b5230 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -231,25 +231,71 @@  static struct kobj_type cpuregs_kobj_type = {
  * future expansion without an ABI break.
  */
 #define kobj_to_cpuinfo(kobj)	container_of(kobj, struct cpuinfo_arm64, kobj)
-#define CPUREGS_ATTR_RO(_name, _field)						\
+#define CPUREGS_RAW_ATTR_RO(_name, _field)				\
 	static ssize_t _name##_show(struct kobject *kobj,			\
 			struct kobj_attribute *attr, char *buf)			\
 	{									\
 		struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj);		\
 										\
-		if (info->reg_midr)						\
-			return sprintf(buf, "0x%016x\n", info->reg_##_field);	\
-		else								\
+		if (info->reg_midr) {						\
+			u64 val = info->reg_##_field;				\
+			return sprintf(buf, "0x%016llx\n", val);		\
+		} else {							\
 			return 0;						\
+		}								\
 	}									\
 	static struct kobj_attribute cpuregs_attr_##_name = __ATTR_RO(_name)
 
-CPUREGS_ATTR_RO(midr_el1, midr);
-CPUREGS_ATTR_RO(revidr_el1, revidr);
+/*
+ * Expose the Sanitized or RAW user space visible fields of a sys_register
+ * based @cond.
+ *
+ * if (@cond)
+ *	expose raw register & user_mask
+ * else
+ *	expose sanitized register & user_mask
+ *
+ * user_mask is based on arm64_ftr_reg definition.
+ */
+#define CPUREGS_USER_ATTR_RO(_name, _raw_field, _san_id, cond)			\
+	static ssize_t _name##_show(struct kobject *kobj,			\
+			struct kobj_attribute *attr, char *buf)			\
+	{									\
+		u64 val = 0;							\
+										\
+		if (cond) {							\
+			struct arm64_ftr_reg *reg = get_arm64_ftr_reg(_san_id);	\
+			struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj);	\
+										\
+			if (!reg)						\
+				return 0;					\
+										\
+			if (info->reg_midr) {					\
+				val = info->reg_##_raw_field & reg->user_mask;	\
+				val |= reg->user_val;				\
+				return sprintf(buf, "0x%016llx\n", val);	\
+			} else {						\
+				return 0;					\
+			}							\
+		} else {							\
+			val = 0;						\
+			if (!emulate_sys_reg(_san_id, &val)) {			\
+				return sprintf(buf, "0x%016llx\n", val);	\
+			} else {						\
+				return 0;					\
+			}							\
+		}								\
+	}									\
+	static struct kobj_attribute cpuregs_attr_##_name = __ATTR_RO(_name)
+
+CPUREGS_RAW_ATTR_RO(midr_el1, midr);
+CPUREGS_RAW_ATTR_RO(revidr_el1, revidr);
+CPUREGS_USER_ATTR_RO(id_aa64pfr0, id_aa64pfr0, SYS_ID_AA64PFR0_EL1, system_supports_asym_32bit_el0());
 
 static struct attribute *cpuregs_id_attrs[] = {
 	&cpuregs_attr_midr_el1.attr,
 	&cpuregs_attr_revidr_el1.attr,
+	&cpuregs_attr_id_aa64pfr0.attr,
 	NULL
 };