Message ID | 20201021104611.2744565-5-qais.yousef@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add support for Asymmetric AArch32 systems | expand |
On 2020-10-21 11:46, Qais Yousef wrote: > So that userspace can detect if the cpu has aarch32 support at EL0. > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better > reflect > what it does. And fixed to accept both u64 and u32 without causing the > printf to print out a warning about mismatched type. This was caught > while testing to check the new CPUREGS_USER_ATTR_RO(). > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based > on a @cond to user space. The exported fields match the definition in > arm64_ftr_reg so that the content of a register exported via MRS and > sysfs are kept cohesive. > > The @cond in our case is that the system is asymmetric aarch32 and the > controlling sysctl.enable_asym_32bit is enabled. > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the > newly visible EL0 field in ID_AA64FPR0_EL1. > > Note that the MRS interface will still return the sanitized content > _only_. > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > --- > > Example output. I was surprised that the 2nd field (bits[7:4]) is > printed out > although it's set as FTR_HIDDEN. > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000012 > 0x0000000000000012 > 0x0000000000000011 > 0x0000000000000011 This looks like a terrible userspace interface. It exposes unrelated features, and doesn't expose the single useful information that the kernel has: the cpumask describing the CPUs supporting AArch32 at EL0. Why not expose this synthetic piece of information which requires very little effort from userspace and doesn't spit out unrelated stuff? Not to mention the discrepancy with what userspace gets while reading the same register via the MRS emulation. Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy, but I don't think this fits either. M.
On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > On 2020-10-21 11:46, Qais Yousef wrote: > > So that userspace can detect if the cpu has aarch32 support at EL0. > > > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect > > what it does. And fixed to accept both u64 and u32 without causing the > > printf to print out a warning about mismatched type. This was caught > > while testing to check the new CPUREGS_USER_ATTR_RO(). > > > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based > > on a @cond to user space. The exported fields match the definition in > > arm64_ftr_reg so that the content of a register exported via MRS and > > sysfs are kept cohesive. > > > > The @cond in our case is that the system is asymmetric aarch32 and the > > controlling sysctl.enable_asym_32bit is enabled. > > > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the > > newly visible EL0 field in ID_AA64FPR0_EL1. > > > > Note that the MRS interface will still return the sanitized content > > _only_. > > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > > --- > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > printed out > > although it's set as FTR_HIDDEN. > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000012 > > 0x0000000000000012 > > 0x0000000000000011 > > 0x0000000000000011 > > This looks like a terrible userspace interface. It's also not allowed, sorry. sysfs is "one value per file", which is NOT what is happening at all. This would be easy to see if there was a Documentation/ABI/ update, which is also required, was that here? thanks, greg k-h
On Wed, Oct 21, 2020 at 11:46:11AM +0100, Qais Yousef wrote: > So that userspace can detect if the cpu has aarch32 support at EL0. > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect > what it does. And fixed to accept both u64 and u32 without causing the > printf to print out a warning about mismatched type. This was caught > while testing to check the new CPUREGS_USER_ATTR_RO(). > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based > on a @cond to user space. The exported fields match the definition in > arm64_ftr_reg so that the content of a register exported via MRS and > sysfs are kept cohesive. > > The @cond in our case is that the system is asymmetric aarch32 and the > controlling sysctl.enable_asym_32bit is enabled. > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the > newly visible EL0 field in ID_AA64FPR0_EL1. > > Note that the MRS interface will still return the sanitized content > _only_. > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > --- > > Example output. I was surprised that the 2nd field (bits[7:4]) is printed out > although it's set as FTR_HIDDEN. > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000011 > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > 0x0000000000000011 > 0x0000000000000011 > 0x0000000000000012 > 0x0000000000000012 > 0x0000000000000011 > 0x0000000000000011 > > Documentation/arm64/cpu-feature-registers.rst | 2 +- > arch/arm64/kernel/cpufeature.c | 2 +- > arch/arm64/kernel/cpuinfo.c | 58 +++++++++++++++++-- > 3 files changed, 54 insertions(+), 8 deletions(-) > > diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst > index f28853f80089..bfcbda6d6f35 100644 > --- a/Documentation/arm64/cpu-feature-registers.rst > +++ b/Documentation/arm64/cpu-feature-registers.rst > @@ -166,7 +166,7 @@ infrastructure: > +------------------------------+---------+---------+ > | EL1 | [7-4] | n | > +------------------------------+---------+---------+ > - | EL0 | [3-0] | n | > + | EL0 | [3-0] | y | > +------------------------------+---------+---------+ > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > index 6f795c8221f4..0f7307c8ad80 100644 > --- a/arch/arm64/kernel/cpufeature.c > +++ b/arch/arm64/kernel/cpufeature.c > @@ -221,7 +221,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL3_SHIFT, 4, 0), > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL2_SHIFT, 4, 0), > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY), > - ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY), > + ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY), > ARM64_FTR_END, > }; > > diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c > index 93c55986ca7f..632b9d5b5230 100644 > --- a/arch/arm64/kernel/cpuinfo.c > +++ b/arch/arm64/kernel/cpuinfo.c > @@ -231,25 +231,71 @@ static struct kobj_type cpuregs_kobj_type = { > * future expansion without an ABI break. > */ > #define kobj_to_cpuinfo(kobj) container_of(kobj, struct cpuinfo_arm64, kobj) > -#define CPUREGS_ATTR_RO(_name, _field) \ > +#define CPUREGS_RAW_ATTR_RO(_name, _field) \ > static ssize_t _name##_show(struct kobject *kobj, \ > struct kobj_attribute *attr, char *buf) \ > { \ > struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj); \ > \ > - if (info->reg_midr) \ > - return sprintf(buf, "0x%016x\n", info->reg_##_field); \ > - else \ > + if (info->reg_midr) { \ > + u64 val = info->reg_##_field; \ > + return sprintf(buf, "0x%016llx\n", val); \ Nit, for sysfs, use the new sysfs_emit() call instead of sprintf(). thanks, greg k-h
On 2020-10-21 12:25, Greg Kroah-Hartman wrote: > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: >> On 2020-10-21 11:46, Qais Yousef wrote: >> > So that userspace can detect if the cpu has aarch32 support at EL0. >> > >> > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect >> > what it does. And fixed to accept both u64 and u32 without causing the >> > printf to print out a warning about mismatched type. This was caught >> > while testing to check the new CPUREGS_USER_ATTR_RO(). >> > >> > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based >> > on a @cond to user space. The exported fields match the definition in >> > arm64_ftr_reg so that the content of a register exported via MRS and >> > sysfs are kept cohesive. >> > >> > The @cond in our case is that the system is asymmetric aarch32 and the >> > controlling sysctl.enable_asym_32bit is enabled. >> > >> > Update Documentation/arm64/cpu-feature-registers.rst to reflect the >> > newly visible EL0 field in ID_AA64FPR0_EL1. >> > >> > Note that the MRS interface will still return the sanitized content >> > _only_. >> > >> > Signed-off-by: Qais Yousef <qais.yousef@arm.com> >> > --- >> > >> > Example output. I was surprised that the 2nd field (bits[7:4]) is >> > printed out >> > although it's set as FTR_HIDDEN. >> > >> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 >> > 0x0000000000000011 >> > 0x0000000000000011 >> > 0x0000000000000011 >> > 0x0000000000000011 >> > 0x0000000000000011 >> > 0x0000000000000011 >> > >> > # echo 1 > /proc/sys/kernel/enable_asym_32bit >> > >> > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 >> > 0x0000000000000011 >> > 0x0000000000000011 >> > 0x0000000000000012 >> > 0x0000000000000012 >> > 0x0000000000000011 >> > 0x0000000000000011 >> >> This looks like a terrible userspace interface. > > It's also not allowed, sorry. sysfs is "one value per file", which is > NOT what is happening at all. I *think* Qais got that part right, though it is hard to tell without knowing how many CPUs this system has (cpu/cpu* is ambiguous). M.
On Wed, Oct 21, 2020 at 12:46:46PM +0100, Marc Zyngier wrote: > On 2020-10-21 12:25, Greg Kroah-Hartman wrote: > > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > > > On 2020-10-21 11:46, Qais Yousef wrote: > > > > So that userspace can detect if the cpu has aarch32 support at EL0. > > > > > > > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect > > > > what it does. And fixed to accept both u64 and u32 without causing the > > > > printf to print out a warning about mismatched type. This was caught > > > > while testing to check the new CPUREGS_USER_ATTR_RO(). > > > > > > > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based > > > > on a @cond to user space. The exported fields match the definition in > > > > arm64_ftr_reg so that the content of a register exported via MRS and > > > > sysfs are kept cohesive. > > > > > > > > The @cond in our case is that the system is asymmetric aarch32 and the > > > > controlling sysctl.enable_asym_32bit is enabled. > > > > > > > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the > > > > newly visible EL0 field in ID_AA64FPR0_EL1. > > > > > > > > Note that the MRS interface will still return the sanitized content > > > > _only_. > > > > > > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > > > > --- > > > > > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > > > printed out > > > > although it's set as FTR_HIDDEN. > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000012 > > > > 0x0000000000000012 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > > > This looks like a terrible userspace interface. > > > > It's also not allowed, sorry. sysfs is "one value per file", which is > > NOT what is happening at all. > > I *think* Qais got that part right, though it is hard to tell without > knowing how many CPUs this system has (cpu/cpu* is ambiguous). Ah, missed the '*' in the middle of that path, my fault. But without documentation, it's impossible to know... thanks, greg k-h
On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > On 2020-10-21 11:46, Qais Yousef wrote: > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > printed out > > although it's set as FTR_HIDDEN. > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000012 > > 0x0000000000000012 > > 0x0000000000000011 > > 0x0000000000000011 > > This looks like a terrible userspace interface. It exposes unrelated > features, Not sure why the EL1 field ended up in here, that's not relevant to the user. > and doesn't expose the single useful information that the kernel has: > the cpumask describing the CPUs supporting AArch32 at EL0. Why not expose > this synthetic piece of information which requires very little effort from > userspace and doesn't spit out unrelated stuff? I thought the whole idea is to try and avoid the "very little effort" part ;). > Not to mention the discrepancy with what userspace gets while reading > the same register via the MRS emulation. > > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy, > but I don't think this fits either. We already expose MIDR and REVIDR via the current sysfs interface. We can expand it to include _all_ the other ID_* regs currently available to user via the MRS emulation and we won't have to debate what a new interface would look like. The MRS emulation and the sysfs info should probably match, though that means we need to expose the ID_AA64PFR0_EL1.EL0 field which we currently don't. I do agree that an AArch32 cpumask is an easier option both from the kernel implementation perspective and from the application usability one, though not as easy as automatic task placement by the scheduler (my first preference, followed by the id_* regs and the aarch32 mask, though not a strong preference for any).
On 10/21/20 12:46, Marc Zyngier wrote: > On 2020-10-21 12:25, Greg Kroah-Hartman wrote: > > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > > > On 2020-10-21 11:46, Qais Yousef wrote: > > > > So that userspace can detect if the cpu has aarch32 support at EL0. > > > > > > > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect > > > > what it does. And fixed to accept both u64 and u32 without causing the > > > > printf to print out a warning about mismatched type. This was caught > > > > while testing to check the new CPUREGS_USER_ATTR_RO(). > > > > > > > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based > > > > on a @cond to user space. The exported fields match the definition in > > > > arm64_ftr_reg so that the content of a register exported via MRS and > > > > sysfs are kept cohesive. > > > > > > > > The @cond in our case is that the system is asymmetric aarch32 and the > > > > controlling sysctl.enable_asym_32bit is enabled. > > > > > > > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the > > > > newly visible EL0 field in ID_AA64FPR0_EL1. > > > > > > > > Note that the MRS interface will still return the sanitized content > > > > _only_. > > > > > > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > > > > --- > > > > > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > > > printed out > > > > although it's set as FTR_HIDDEN. > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000012 > > > > 0x0000000000000012 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > > > This looks like a terrible userspace interface. > > > > It's also not allowed, sorry. sysfs is "one value per file", which is > > NOT what is happening at all. > > I *think* Qais got that part right, though it is hard to tell without > knowing how many CPUs this system has (cpu/cpu* is ambiguous). Correct. This interface already exists for other registers and reads a single value for each CPU. I just added the ability to read a new register. Though it has a slightly different behavior: reads Sanitized vs RAW info. Thanks -- Qais Yousef
On 10/21/20 13:15, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > > On 2020-10-21 11:46, Qais Yousef wrote: > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > > printed out > > > although it's set as FTR_HIDDEN. ^^^^^^^^^ > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000012 > > > 0x0000000000000012 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > > This looks like a terrible userspace interface. It exposes unrelated > > features, > > Not sure why the EL1 field ended up in here, that's not relevant to the > user. I was surprised by this too. Not sure a bug at my end or a bug in the way arm64_ftr_reg stores reg->user_val and reg->user_mask. FWIW, I did highlight the problem above. It is something to fix indeed. Thanks -- Qais Yousef
On 10/21/20 13:28, Greg Kroah-Hartman wrote: > On Wed, Oct 21, 2020 at 11:46:11AM +0100, Qais Yousef wrote: > > So that userspace can detect if the cpu has aarch32 support at EL0. > > > > CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect > > what it does. And fixed to accept both u64 and u32 without causing the > > printf to print out a warning about mismatched type. This was caught > > while testing to check the new CPUREGS_USER_ATTR_RO(). > > > > The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based > > on a @cond to user space. The exported fields match the definition in > > arm64_ftr_reg so that the content of a register exported via MRS and > > sysfs are kept cohesive. > > > > The @cond in our case is that the system is asymmetric aarch32 and the > > controlling sysctl.enable_asym_32bit is enabled. > > > > Update Documentation/arm64/cpu-feature-registers.rst to reflect the > > newly visible EL0 field in ID_AA64FPR0_EL1. > > > > Note that the MRS interface will still return the sanitized content > > _only_. > > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > > --- > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is printed out > > although it's set as FTR_HIDDEN. > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000011 > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > 0x0000000000000011 > > 0x0000000000000011 > > 0x0000000000000012 > > 0x0000000000000012 > > 0x0000000000000011 > > 0x0000000000000011 > > > > Documentation/arm64/cpu-feature-registers.rst | 2 +- > > arch/arm64/kernel/cpufeature.c | 2 +- > > arch/arm64/kernel/cpuinfo.c | 58 +++++++++++++++++-- > > 3 files changed, 54 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst > > index f28853f80089..bfcbda6d6f35 100644 > > --- a/Documentation/arm64/cpu-feature-registers.rst > > +++ b/Documentation/arm64/cpu-feature-registers.rst > > @@ -166,7 +166,7 @@ infrastructure: > > +------------------------------+---------+---------+ > > | EL1 | [7-4] | n | > > +------------------------------+---------+---------+ > > - | EL0 | [3-0] | n | > > + | EL0 | [3-0] | y | > > +------------------------------+---------+---------+ > > > > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > index 6f795c8221f4..0f7307c8ad80 100644 > > --- a/arch/arm64/kernel/cpufeature.c > > +++ b/arch/arm64/kernel/cpufeature.c > > @@ -221,7 +221,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { > > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL3_SHIFT, 4, 0), > > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL2_SHIFT, 4, 0), > > ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY), > > - ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY), > > + ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY), > > ARM64_FTR_END, > > }; > > > > diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c > > index 93c55986ca7f..632b9d5b5230 100644 > > --- a/arch/arm64/kernel/cpuinfo.c > > +++ b/arch/arm64/kernel/cpuinfo.c > > @@ -231,25 +231,71 @@ static struct kobj_type cpuregs_kobj_type = { > > * future expansion without an ABI break. > > */ > > #define kobj_to_cpuinfo(kobj) container_of(kobj, struct cpuinfo_arm64, kobj) > > -#define CPUREGS_ATTR_RO(_name, _field) \ > > +#define CPUREGS_RAW_ATTR_RO(_name, _field) \ > > static ssize_t _name##_show(struct kobject *kobj, \ > > struct kobj_attribute *attr, char *buf) \ > > { \ > > struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj); \ > > \ > > - if (info->reg_midr) \ > > - return sprintf(buf, "0x%016x\n", info->reg_##_field); \ > > - else \ > > + if (info->reg_midr) { \ > > + u64 val = info->reg_##_field; \ > > + return sprintf(buf, "0x%016llx\n", val); \ > > Nit, for sysfs, use the new sysfs_emit() call instead of sprintf(). Will do. Thanks! -- Qais Yousef
On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > one, though not as easy as automatic task placement by the scheduler (my > first preference, followed by the id_* regs and the aarch32 mask, though > not a strong preference for any). Automatic task placement by the scheduler would mean giving up the requirement that the user-space affinity mask must always be honoured. Is that on the table? Killing aarch32 tasks with an empty intersection between the user-space mask and aarch32_mask is not really "automatic" and would require the aarch32 capability to be exposed anyway. Morten
On Wed, Oct 21, 2020 at 03:33:29PM +0200, Morten Rasmussen wrote: > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > one, though not as easy as automatic task placement by the scheduler (my > > first preference, followed by the id_* regs and the aarch32 mask, though > > not a strong preference for any). > > Automatic task placement by the scheduler would mean giving up the > requirement that the user-space affinity mask must always be honoured. > Is that on the table? I think Peter rejected it but I still find it a nicer interface from a dumb application perspective. It may interact badly with cpusets though (at least on Android). > Killing aarch32 tasks with an empty intersection between the > user-space mask and aarch32_mask is not really "automatic" and would > require the aarch32 capability to be exposed anyway. I agree, especially if overriding the user mask is not desirable. But if one doesn't play around with cpusets, 32-bit apps would run "fine" with the scheduler transparently placing them on the correct CPU. Anyway, if the task placement is entirely off the table, the next thing is asking applications to set their own mask and kill them if they do the wrong thing. Here I see two possibilities for killing an app: 1. When it ends up scheduled on a non-AArch32-capable CPU 2. If the user cpumask (bar the offline CPUs) is not a subset of the aarch32_mask Option 1 is simpler but 2 would be slightly more consistent. There's also the question on whether the kernel should allow an ELF32 to be loaded (and potentially killed subsequently) if the user mask is not correct on execve().
On 10/21/20 15:33, Morten Rasmussen wrote: > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > one, though not as easy as automatic task placement by the scheduler (my > > first preference, followed by the id_* regs and the aarch32 mask, though > > not a strong preference for any). > > Automatic task placement by the scheduler would mean giving up the > requirement that the user-space affinity mask must always be honoured. > Is that on the table? > > Killing aarch32 tasks with an empty intersection between the user-space > mask and aarch32_mask is not really "automatic" and would require the > aarch32 capability to be exposed anyway. I just noticed this nasty corner case too. Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9 "If such a task had been bound to some subset of its cpuset using the sched_setaffinity() call, the task will be allowed to run on any CPU allowed in its new cpuset, negating the effect of the prior sched_setaffinity() call." So user space must put the tasks into a valid cpuset to fix the problem. Or make the scheduler behave like the affinity is associated with a cpuset. Can user space put the task into the correct cpuset without a race? Clone3 syscall learnt to specify a cgroup to attach to when forking. Should we do the same for execve()? Thanks -- Qais Yousef
On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > > On 2020-10-21 11:46, Qais Yousef wrote: > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > > printed out > > > although it's set as FTR_HIDDEN. > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > 0x0000000000000012 > > > 0x0000000000000012 > > > 0x0000000000000011 > > > 0x0000000000000011 > > > > This looks like a terrible userspace interface. It exposes unrelated > > features, > > Not sure why the EL1 field ended up in here, that's not relevant to the > user. > > > and doesn't expose the single useful information that the kernel has: > > the cpumask describing the CPUs supporting AArch32 at EL0. Why not expose > > this synthetic piece of information which requires very little effort from > > userspace and doesn't spit out unrelated stuff? > > I thought the whole idea is to try and avoid the "very little effort" > part ;). > > > Not to mention the discrepancy with what userspace gets while reading > > the same register via the MRS emulation. > > > > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy, > > but I don't think this fits either. > > We already expose MIDR and REVIDR via the current sysfs interface. We > can expand it to include _all_ the other ID_* regs currently available > to user via the MRS emulation and we won't have to debate what a new > interface would look like. The MRS emulation and the sysfs info should > probably match, though that means we need to expose the > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > I do agree that an AArch32 cpumask is an easier option both from the > kernel implementation perspective and from the application usability > one, though not as easy as automatic task placement by the scheduler (my > first preference, followed by the id_* regs and the aarch32 mask, though > not a strong preference for any). If a cpumask is easier to implement and easier to use, then I think that's what we should do. It's also then dead easy to disable if necessary by just returning 0. The only alternative I would prefer is not having to expose this information altogether, but I'm not sure that figuring this out from MIDR/REVIDR alone is reliable. Will
On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 03:33:29PM +0200, Morten Rasmussen wrote: > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > > one, though not as easy as automatic task placement by the scheduler (my > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > not a strong preference for any). > > > > Automatic task placement by the scheduler would mean giving up the > > requirement that the user-space affinity mask must always be honoured. > > Is that on the table? > > I think Peter rejected it but I still find it a nicer interface from a > dumb application perspective. It may interact badly with cpusets though > (at least on Android). Agree that it would be nice for supporting legacy applications. Due to the cpuset interaction I think there is fair chance that user-space would want to know the aarch32 cpumask anyway though. > > > Killing aarch32 tasks with an empty intersection between the > > user-space mask and aarch32_mask is not really "automatic" and would > > require the aarch32 capability to be exposed anyway. > > I agree, especially if overriding the user mask is not desirable. But if > one doesn't play around with cpusets, 32-bit apps would run "fine" with > the scheduler transparently placing them on the correct CPU. Ruling out user-space setting affinity is another way of solving the problem ;-) > Anyway, if the task placement is entirely off the table, the next thing > is asking applications to set their own mask and kill them if they do > the wrong thing. Here I see two possibilities for killing an app: > > 1. When it ends up scheduled on a non-AArch32-capable CPU > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > aarch32_mask > > Option 1 is simpler but 2 would be slightly more consistent. I don't have strong preference. More consistent killing is probably nice for debugging purposes. If we go with 2, we would go round and kill all tasks in cpuset (both running and sleeping) if the cpuset mask was changed to not include aarch32 CPUs? > There's also the question on whether the kernel should allow an ELF32 to > be loaded (and potentially killed subsequently) if the user mask is not > correct on execve(). I wonder how many apps that would handle execve() failing? If we allow killing, the simplest solution if there is any doubt seems to be just to kill the task :-) Morten
On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 03:33:29PM +0200, Morten Rasmussen wrote: > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > > one, though not as easy as automatic task placement by the scheduler (my > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > not a strong preference for any). > > > > Automatic task placement by the scheduler would mean giving up the > > requirement that the user-space affinity mask must always be honoured. > > Is that on the table? > > I think Peter rejected it but I still find it a nicer interface from a > dumb application perspective. It may interact badly with cpusets though > (at least on Android). > > > Killing aarch32 tasks with an empty intersection between the > > user-space mask and aarch32_mask is not really "automatic" and would > > require the aarch32 capability to be exposed anyway. > > I agree, especially if overriding the user mask is not desirable. But if > one doesn't play around with cpusets, 32-bit apps would run "fine" with > the scheduler transparently placing them on the correct CPU. > > Anyway, if the task placement is entirely off the table, the next thing > is asking applications to set their own mask and kill them if they do > the wrong thing. Here I see two possibilities for killing an app: > > 1. When it ends up scheduled on a non-AArch32-capable CPU That sounds fine to me. If we could do the exception return and take a SIGILL, that's what we'd do, but we can't so we have to catch it before. > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > aarch32_mask > > Option 1 is simpler but 2 would be slightly more consistent. I disagree -- if we did this for something like fpsimd, then the consistent behaviour would be to SIGILL on the cores without the instructions. > There's also the question on whether the kernel should allow an ELF32 to > be loaded (and potentially killed subsequently) if the user mask is not > correct on execve(). I don't see the point in distinguishing between "you did execve() on a core without 32-bit" and "you did execve() on a core with 32-bit and then migrated to a core without 32-bit". Will
On 10/21/20 15:41, Will Deacon wrote: > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > > > On 2020-10-21 11:46, Qais Yousef wrote: > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > > > printed out > > > > although it's set as FTR_HIDDEN. > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > 0x0000000000000012 > > > > 0x0000000000000012 > > > > 0x0000000000000011 > > > > 0x0000000000000011 > > > > > > This looks like a terrible userspace interface. It exposes unrelated > > > features, > > > > Not sure why the EL1 field ended up in here, that's not relevant to the > > user. > > > > > and doesn't expose the single useful information that the kernel has: > > > the cpumask describing the CPUs supporting AArch32 at EL0. Why not expose > > > this synthetic piece of information which requires very little effort from > > > userspace and doesn't spit out unrelated stuff? > > > > I thought the whole idea is to try and avoid the "very little effort" > > part ;). > > > > > Not to mention the discrepancy with what userspace gets while reading > > > the same register via the MRS emulation. > > > > > > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy, > > > but I don't think this fits either. > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > can expand it to include _all_ the other ID_* regs currently available > > to user via the MRS emulation and we won't have to debate what a new > > interface would look like. The MRS emulation and the sysfs info should > > probably match, though that means we need to expose the > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > I do agree that an AArch32 cpumask is an easier option both from the > > kernel implementation perspective and from the application usability > > one, though not as easy as automatic task placement by the scheduler (my > > first preference, followed by the id_* regs and the aarch32 mask, though > > not a strong preference for any). > > If a cpumask is easier to implement and easier to use, then I think that's > what we should do. It's also then dead easy to disable if necessary by > just returning 0. The only alternative I would prefer is not having to > expose this information altogether, but I'm not sure that figuring this > out from MIDR/REVIDR alone is reliable. I did suggest this before, but I'll try gain. If we want to assume a custom bootloader and custom user space, we can make them provide the mask. For example, the new sysctl_enable_asym_32bit could be a cpumask instead of a bool as it currently is. Or we can make it a cmdline parameter too. In both cases some admin (bootloader or init process) has to ensure to fill it correctly for the target platform. The bootloader should be able to read the registers to figure out the mask. So more weight to make it a cmdline param. Then the rest of user space can easily parse the cpumask from /proc/sys/kernel/enable_asym_32bit or from /proc/cmdline. Thanks -- Qais Yousef
On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote: > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > > Anyway, if the task placement is entirely off the table, the next thing > > is asking applications to set their own mask and kill them if they do > > the wrong thing. Here I see two possibilities for killing an app: > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU > > That sounds fine to me. If we could do the exception return and take a > SIGILL, that's what we'd do, but we can't so we have to catch it before. Indeed, the illegal ERET doesn't work for this scenario. > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > > aarch32_mask > > > > Option 1 is simpler but 2 would be slightly more consistent. > > I disagree -- if we did this for something like fpsimd, then the consistent > behaviour would be to SIGILL on the cores without the instructions. For fpsimd it makes sense since the main ISA is still available and the application may be able to do something with the signal. But here we can't do much since the entire AArch32 mode is not supported. That's why we went for SIGKILL instead of SIGILL but thinking of it, after execve() the signals are reset to SIG_DFL so SIGILL cannot be ignored. I think it depends on whether you look at this fault as a part of ISA not being available or as the overall application not compatible with the system it is running on. If the latter, option 2 above makes more sense. > > There's also the question on whether the kernel should allow an ELF32 to > > be loaded (and potentially killed subsequently) if the user mask is not > > correct on execve(). > > I don't see the point in distinguishing between "you did execve() on a core > without 32-bit" and "you did execve() on a core with 32-bit and then > migrated to a core without 32-bit". In the context of option 2 above, its more about whether execve() returns -ENOEXEC or the process gets a SIGKILL immediately.
On Wed, Oct 21, 2020 at 04:03:13PM +0100, Qais Yousef wrote: > On 10/21/20 15:41, Will Deacon wrote: > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > > On Wed, Oct 21, 2020 at 12:09:58PM +0100, Marc Zyngier wrote: > > > > On 2020-10-21 11:46, Qais Yousef wrote: > > > > > Example output. I was surprised that the 2nd field (bits[7:4]) is > > > > > printed out > > > > > although it's set as FTR_HIDDEN. > > > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > > > > > > # echo 1 > /proc/sys/kernel/enable_asym_32bit > > > > > > > > > > # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > 0x0000000000000012 > > > > > 0x0000000000000012 > > > > > 0x0000000000000011 > > > > > 0x0000000000000011 > > > > > > > > This looks like a terrible userspace interface. It exposes unrelated > > > > features, > > > > > > Not sure why the EL1 field ended up in here, that's not relevant to the > > > user. > > > > > > > and doesn't expose the single useful information that the kernel has: > > > > the cpumask describing the CPUs supporting AArch32 at EL0. Why not expose > > > > this synthetic piece of information which requires very little effort from > > > > userspace and doesn't spit out unrelated stuff? > > > > > > I thought the whole idea is to try and avoid the "very little effort" > > > part ;). > > > > > > > Not to mention the discrepancy with what userspace gets while reading > > > > the same register via the MRS emulation. > > > > > > > > Granted, the cpumask doesn't fit the cpu*/regs/identification hierarchy, > > > > but I don't think this fits either. > > > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > can expand it to include _all_ the other ID_* regs currently available > > > to user via the MRS emulation and we won't have to debate what a new > > > interface would look like. The MRS emulation and the sysfs info should > > > probably match, though that means we need to expose the > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > kernel implementation perspective and from the application usability > > > one, though not as easy as automatic task placement by the scheduler (my > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > not a strong preference for any). > > > > If a cpumask is easier to implement and easier to use, then I think that's > > what we should do. It's also then dead easy to disable if necessary by > > just returning 0. The only alternative I would prefer is not having to > > expose this information altogether, but I'm not sure that figuring this > > out from MIDR/REVIDR alone is reliable. > > I did suggest this before, but I'll try gain. If we want to assume a custom > bootloader and custom user space, we can make them provide the mask. Who mentioned a custom bootloader? In the context of Android, we're talking about a user-space that already manages scheduling affinity. > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > a bool as it currently is. Or we can make it a cmdline parameter too. > In both cases some admin (bootloader or init process) has to ensure to fill it > correctly for the target platform. The bootloader should be able to read the > registers to figure out the mask. So more weight to make it a cmdline param. I think this is adding complexity for the sake of it. I'm much more in favour of keeping the implementation and ABI as simple as possible: expose the fact that the system is heterogenous, have an opt-in for userspace to say it can handle that and let it handle it. Will
On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote: > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > > > Anyway, if the task placement is entirely off the table, the next thing > > > is asking applications to set their own mask and kill them if they do > > > the wrong thing. Here I see two possibilities for killing an app: > > > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU > > > > That sounds fine to me. If we could do the exception return and take a > > SIGILL, that's what we'd do, but we can't so we have to catch it before. > > Indeed, the illegal ERET doesn't work for this scenario. > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > > > aarch32_mask > > > > > > Option 1 is simpler but 2 would be slightly more consistent. > > > > I disagree -- if we did this for something like fpsimd, then the consistent > > behaviour would be to SIGILL on the cores without the instructions. > > For fpsimd it makes sense since the main ISA is still available and the > application may be able to do something with the signal. But here we > can't do much since the entire AArch32 mode is not supported. That's why > we went for SIGKILL instead of SIGILL but thinking of it, after execve() > the signals are reset to SIG_DFL so SIGILL cannot be ignored. > > I think it depends on whether you look at this fault as a part of ISA > not being available or as the overall application not compatible with > the system it is running on. If the latter, option 2 above makes more > sense. Hmm, I'm not sure I see the distinction in practice: you still have a binary application that cannot run on all CPUs in the system. Who cares if some of the instructions work? > > > There's also the question on whether the kernel should allow an ELF32 to > > > be loaded (and potentially killed subsequently) if the user mask is not > > > correct on execve(). > > > > I don't see the point in distinguishing between "you did execve() on a core > > without 32-bit" and "you did execve() on a core with 32-bit and then > > migrated to a core without 32-bit". > > In the context of option 2 above, its more about whether execve() > returns -ENOEXEC or the process gets a SIGKILL immediately. I just don't see what we gain by returning -ENOEXEC except for extra code and behaviour in the ABI (and if you wanted consistentcy you'd also need to fail attempts to widen the affinity mask to include 64-bit-only cores from a 32-bit task). In other words, I don't think the kernel needs to hold userspace's hand for an opt-in feature that requires userspace to handle scheduling for optimal power/performance _anyway_. Allowing the affinity to be set arbitrarily and then killing the task if it ends up trying to run on the wrong CPU is both simple and sufficient. Will
On 10/21/20 16:23, Will Deacon wrote: > > > If a cpumask is easier to implement and easier to use, then I think that's > > > what we should do. It's also then dead easy to disable if necessary by > > > just returning 0. The only alternative I would prefer is not having to > > > expose this information altogether, but I'm not sure that figuring this > > > out from MIDR/REVIDR alone is reliable. > > > > I did suggest this before, but I'll try gain. If we want to assume a custom > > bootloader and custom user space, we can make them provide the mask. > > Who mentioned a custom bootloader? In the context of Android, we're Custom bootloader as in a bootloader that needs to opt-in to enable the feature (pass the right cmdline param). Catalin suggested to make this a sysctl to allow also for runtime toggling. But the initial intention was to have this to enable it at cmdline. > talking about a user-space that already manages scheduling affinity. > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > > a bool as it currently is. Or we can make it a cmdline parameter too. > > In both cases some admin (bootloader or init process) has to ensure to fill it > > correctly for the target platform. The bootloader should be able to read the > > registers to figure out the mask. So more weight to make it a cmdline param. > > I think this is adding complexity for the sake of it. I'm much more in I actually think it reduces complexity. No special ABI to generate the mask from the kernel. The same opt-in flag is the cpumask too. > favour of keeping the implementation and ABI as simple as possible: expose > the fact that the system is heterogenous, have an opt-in for userspace to > say it can handle that and let it handle it. It still has to do that. It's just the origin of the cpumask will change. So it really depends how you view the opt-in. Ie: it needs to be discovered vs the user space knows it exists and it just wants to enable it. So far we've been going in the latter direction AFAICT. My current implementation is terrible for discovery. I don't feel strongly about it anyway. Just an idea. I can understand the lack of appeal. Not sure if there's a none ugly solution yet :-) Thanks -- Qais Yousef
On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote: > On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote: > > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote: > > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > > > > Anyway, if the task placement is entirely off the table, the next thing > > > > is asking applications to set their own mask and kill them if they do > > > > the wrong thing. Here I see two possibilities for killing an app: > > > > > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU > > > > > > That sounds fine to me. If we could do the exception return and take a > > > SIGILL, that's what we'd do, but we can't so we have to catch it before. > > > > Indeed, the illegal ERET doesn't work for this scenario. > > > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > > > > aarch32_mask > > > > > > > > Option 1 is simpler but 2 would be slightly more consistent. > > > > > > I disagree -- if we did this for something like fpsimd, then the consistent > > > behaviour would be to SIGILL on the cores without the instructions. > > > > For fpsimd it makes sense since the main ISA is still available and the > > application may be able to do something with the signal. But here we > > can't do much since the entire AArch32 mode is not supported. That's why > > we went for SIGKILL instead of SIGILL but thinking of it, after execve() > > the signals are reset to SIG_DFL so SIGILL cannot be ignored. > > > > I think it depends on whether you look at this fault as a part of ISA > > not being available or as the overall application not compatible with > > the system it is running on. If the latter, option 2 above makes more > > sense. > > Hmm, I'm not sure I see the distinction in practice: you still have a binary > application that cannot run on all CPUs in the system. Who cares if some of > the instructions work? The failure would be more predictable rather than the app running for a while and randomly getting SIGKILL. If it only fails on execve or sched_setaffinity, it may be easier to track down (well, there's the CPU hotplug as well that can change the cpumask intersection outside the user process control). > > > > There's also the question on whether the kernel should allow an ELF32 to > > > > be loaded (and potentially killed subsequently) if the user mask is not > > > > correct on execve(). > > > > > > I don't see the point in distinguishing between "you did execve() on a core > > > without 32-bit" and "you did execve() on a core with 32-bit and then > > > migrated to a core without 32-bit". > > > > In the context of option 2 above, its more about whether execve() > > returns -ENOEXEC or the process gets a SIGKILL immediately. > > I just don't see what we gain by returning -ENOEXEC except for extra code > and behaviour in the ABI (and if you wanted consistentcy you'd also need > to fail attempts to widen the affinity mask to include 64-bit-only cores > from a 32-bit task). The -ENOEXEC is more in line with the current behaviour not allowing ELF32 on systems that are not fully symmetric. So basically you'd have a global opt-in as sysctl and a per-application opt-in via the affinity mask. I do agree that it complicates the kernel implementation. > In other words, I don't think the kernel needs to hold userspace's hand > for an opt-in feature that requires userspace to handle scheduling for > optimal power/performance _anyway_. Allowing the affinity to be set > arbitrarily and then killing the task if it ends up trying to run on the > wrong CPU is both simple and sufficient. Fine by me if you want to keep things simple, less code to maintain. However, it would be good to know if the Android kernel/user guys are happy with this approach. If the Android kernel ends up carrying additional patches for task placement, I'd question why we need to merge this (partial) series at all.
On Wed, Oct 21, 2020 at 05:18:37PM +0100, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote: > > On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote: > > > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote: > > > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > > > > > Anyway, if the task placement is entirely off the table, the next thing > > > > > is asking applications to set their own mask and kill them if they do > > > > > the wrong thing. Here I see two possibilities for killing an app: > > > > > > > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU > > > > > > > > That sounds fine to me. If we could do the exception return and take a > > > > SIGILL, that's what we'd do, but we can't so we have to catch it before. > > > > > > Indeed, the illegal ERET doesn't work for this scenario. > > > > > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > > > > > aarch32_mask > > > > > > > > > > Option 1 is simpler but 2 would be slightly more consistent. > > > > > > > > I disagree -- if we did this for something like fpsimd, then the consistent > > > > behaviour would be to SIGILL on the cores without the instructions. > > > > > > For fpsimd it makes sense since the main ISA is still available and the > > > application may be able to do something with the signal. But here we > > > can't do much since the entire AArch32 mode is not supported. That's why > > > we went for SIGKILL instead of SIGILL but thinking of it, after execve() > > > the signals are reset to SIG_DFL so SIGILL cannot be ignored. > > > > > > I think it depends on whether you look at this fault as a part of ISA > > > not being available or as the overall application not compatible with > > > the system it is running on. If the latter, option 2 above makes more > > > sense. > > > > Hmm, I'm not sure I see the distinction in practice: you still have a binary > > application that cannot run on all CPUs in the system. Who cares if some of > > the instructions work? > > The failure would be more predictable rather than the app running for a > while and randomly getting SIGKILL. If it only fails on execve or > sched_setaffinity, it may be easier to track down (well, there's the CPU > hotplug as well that can change the cpumask intersection outside the > user process control). But it's half-baked, because the moment the 32-bit task changes its affinity mask then you're back in the old situation. That's why I'm saying this doesn't add anything, because the rest of the series is designed entirely around delivering SIGKILL at the last minute rather than preventing us getting to that situation in the first place. The execve() case feels to me like we're considering doing something because we can, rather than because it's actually useful. > > > > > There's also the question on whether the kernel should allow an ELF32 to > > > > > be loaded (and potentially killed subsequently) if the user mask is not > > > > > correct on execve(). > > > > > > > > I don't see the point in distinguishing between "you did execve() on a core > > > > without 32-bit" and "you did execve() on a core with 32-bit and then > > > > migrated to a core without 32-bit". > > > > > > In the context of option 2 above, its more about whether execve() > > > returns -ENOEXEC or the process gets a SIGKILL immediately. > > > > I just don't see what we gain by returning -ENOEXEC except for extra code > > and behaviour in the ABI (and if you wanted consistentcy you'd also need > > to fail attempts to widen the affinity mask to include 64-bit-only cores > > from a 32-bit task). > > The -ENOEXEC is more in line with the current behaviour not allowing > ELF32 on systems that are not fully symmetric. So basically you'd have > a global opt-in as sysctl and a per-application opt-in via the affinity > mask. I think it's a bit strong calling that an opt-in, as the application could just happen to be using the right cpumask. > I do agree that it complicates the kernel implementation. > > > In other words, I don't think the kernel needs to hold userspace's hand > > for an opt-in feature that requires userspace to handle scheduling for > > optimal power/performance _anyway_. Allowing the affinity to be set > > arbitrarily and then killing the task if it ends up trying to run on the > > wrong CPU is both simple and sufficient. > > Fine by me if you want to keep things simple, less code to maintain. > > However, it would be good to know if the Android kernel/user guys are > happy with this approach. If the Android kernel ends up carrying > additional patches for task placement, I'd question why we need to merge > this (partial) series at all. Hmm, those folks aren't even on CC :( Adding Suren and Marco... Will
On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote: > On 10/21/20 16:23, Will Deacon wrote: > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > > what we should do. It's also then dead easy to disable if necessary by > > > > just returning 0. The only alternative I would prefer is not having to > > > > expose this information altogether, but I'm not sure that figuring this > > > > out from MIDR/REVIDR alone is reliable. > > > > > > I did suggest this before, but I'll try gain. If we want to assume a custom > > > bootloader and custom user space, we can make them provide the mask. > > > > Who mentioned a custom bootloader? In the context of Android, we're > > Custom bootloader as in a bootloader that needs to opt-in to enable the > feature (pass the right cmdline param). Catalin suggested to make this a sysctl > to allow also for runtime toggling. But the initial intention was to have this > to enable it at cmdline. Hmm, ok, I don't think allowing the cmdline to be specified means its a custom bootloader. > > talking about a user-space that already manages scheduling affinity. > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > > > a bool as it currently is. Or we can make it a cmdline parameter too. > > > In both cases some admin (bootloader or init process) has to ensure to fill it > > > correctly for the target platform. The bootloader should be able to read the > > > registers to figure out the mask. So more weight to make it a cmdline param. > > > > I think this is adding complexity for the sake of it. I'm much more in > > I actually think it reduces complexity. No special ABI to generate the mask > from the kernel. The same opt-in flag is the cpumask too. Maybe I'm misunderstanding your proposal but having a cpumask instead of a bool means you now have to consider policy on a per-cpu basis, which adds an extra dimension to this. For example, do you allow that mask to be changed at runtime so that differents sets of CPUs now support 32-bit? Do you preserve it across hotplug? Will
On 10/21/20 18:23, Will Deacon wrote: > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote: > > On 10/21/20 16:23, Will Deacon wrote: > > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > > > what we should do. It's also then dead easy to disable if necessary by > > > > > just returning 0. The only alternative I would prefer is not having to > > > > > expose this information altogether, but I'm not sure that figuring this > > > > > out from MIDR/REVIDR alone is reliable. > > > > > > > > I did suggest this before, but I'll try gain. If we want to assume a custom > > > > bootloader and custom user space, we can make them provide the mask. > > > > > > Who mentioned a custom bootloader? In the context of Android, we're > > > > Custom bootloader as in a bootloader that needs to opt-in to enable the > > feature (pass the right cmdline param). Catalin suggested to make this a sysctl > > to allow also for runtime toggling. But the initial intention was to have this > > to enable it at cmdline. > > Hmm, ok, I don't think allowing the cmdline to be specified means its a > custom bootloader. True it could be just added to chosen property in the DT file without any bootloader changes. Bad usage of English probably. I just meant the bootloader might need to be made aware of the opt-in process too. So it can potentially co-operate more. > > > talking about a user-space that already manages scheduling affinity. > > > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > > > > a bool as it currently is. Or we can make it a cmdline parameter too. > > > > In both cases some admin (bootloader or init process) has to ensure to fill it > > > > correctly for the target platform. The bootloader should be able to read the > > > > registers to figure out the mask. So more weight to make it a cmdline param. > > > > > > I think this is adding complexity for the sake of it. I'm much more in > > > > I actually think it reduces complexity. No special ABI to generate the mask > > from the kernel. The same opt-in flag is the cpumask too. > > Maybe I'm misunderstanding your proposal but having a cpumask instead of What I meant is that if we change the requirement to opt-in from a boolean switch sysctl.enable_32bit_asym=1 to require the bootloader/init scripts provide the mask of aarch32 capable cpus sysctl.asym_32bit_cpus=0xf0 This will achieve multiple things at the same time: * Defer cpus specification to platform designers who want to enable this feature on their platform. * We don't need a separate API to export which cpus are 32bit capable. They can read it directly from /proc/sys/kernel/asym_32bit_cpus. When it's 0 it means the system is not asymmetric. * If/when we want to disable this support in the future. The sysctl handler will just have to return 0 all the time and ignore all writes. So it's changing the way user space opts-in. The kernel will still treat it as a boolean, and probably sanity check the cpus to make sure they match what it sees. > a bool means you now have to consider policy on a per-cpu basis, which > adds an extra dimension to this. For example, do you allow that mask to > be changed at runtime so that differents sets of CPUs now support 32-bit? > Do you preserve it across hotplug? I can see how cpumask word was confusing. I hope the above is clearer. We don't change how the kernel works, but rather what the user space have to do to opt-in. Hopefully hitting 2 birds with one stone :-) Thanks! -- Qais Yousef
On Wed, Oct 21, 2020 at 08:57:36PM +0100, Qais Yousef wrote: > On 10/21/20 18:23, Will Deacon wrote: > > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote: > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > > > > > a bool as it currently is. Or we can make it a cmdline parameter too. > > > > > In both cases some admin (bootloader or init process) has to ensure to fill it > > > > > correctly for the target platform. The bootloader should be able to read the > > > > > registers to figure out the mask. So more weight to make it a cmdline param. > > > > > > > > I think this is adding complexity for the sake of it. I'm much more in > > > > > > I actually think it reduces complexity. No special ABI to generate the mask > > > from the kernel. The same opt-in flag is the cpumask too. > > > > Maybe I'm misunderstanding your proposal but having a cpumask instead of > > What I meant is that if we change the requirement to opt-in from a boolean > switch > > sysctl.enable_32bit_asym=1 > > to require the bootloader/init scripts provide the mask of aarch32 capable cpus > > sysctl.asym_32bit_cpus=0xf0 > > This will achieve multiple things at the same time: > > * Defer cpus specification to platform designers who want to > enable this feature on their platform. What do you mean by this? The kernel will obviously have to check that the mask is correct (i.e. it doesn't contain 64-bit only CPUs), at which point it's a boolean in disguise. > * We don't need a separate API to export which cpus are 32bit capable. > They can read it directly from /proc/sys/kernel/asym_32bit_cpus. > When it's 0 it means the system is not asymmetric. I don't see how this is better than a separate cpumask for this purpose. It feels like you're trying to overload the control and the identification, but that just makes things confusing and hard to use as I now need to know which logical CPUs correspond to which physical CPUs in order to set the command-line. > * If/when we want to disable this support in the future. The sysctl > handler will just have to return 0 all the time and ignore all > writes. It can do that as a boolean too, right? Will
On Wed, Oct 21, 2020 at 09:26:27PM +0100, Will Deacon wrote: > On Wed, Oct 21, 2020 at 08:57:36PM +0100, Qais Yousef wrote: > > On 10/21/20 18:23, Will Deacon wrote: > > > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote: > > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > > > > > > a bool as it currently is. Or we can make it a cmdline parameter too. > > > > > > In both cases some admin (bootloader or init process) has to ensure to fill it > > > > > > correctly for the target platform. The bootloader should be able to read the > > > > > > registers to figure out the mask. So more weight to make it a cmdline param. > > > > > > > > > > I think this is adding complexity for the sake of it. I'm much more in > > > > > > > > I actually think it reduces complexity. No special ABI to generate the mask > > > > from the kernel. The same opt-in flag is the cpumask too. > > > > > > Maybe I'm misunderstanding your proposal but having a cpumask instead of > > > > What I meant is that if we change the requirement to opt-in from a boolean > > switch > > > > sysctl.enable_32bit_asym=1 > > > > to require the bootloader/init scripts provide the mask of aarch32 capable cpus > > > > sysctl.asym_32bit_cpus=0xf0 [...] > > * We don't need a separate API to export which cpus are 32bit capable. > > They can read it directly from /proc/sys/kernel/asym_32bit_cpus. > > When it's 0 it means the system is not asymmetric. > > I don't see how this is better than a separate cpumask for this purpose. > It feels like you're trying to overload the control and the identification, > but that just makes things confusing and hard to use as I now need to know > which logical CPUs correspond to which physical CPUs in order to set the > command-line. I agree. Let's leave the identification to the kernel as it has access to the CPUID registers and can provide the cpumask. The control in this case doesn't need to be more than a boolean and its meaning is that the user knows what it is doing.
On Wed, Oct 21, 2020 at 06:19:48PM +0100, Will Deacon wrote: > On Wed, Oct 21, 2020 at 05:18:37PM +0100, Catalin Marinas wrote: > > On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote: > > > On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote: > > > > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote: > > > > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote: > > > > > > Anyway, if the task placement is entirely off the table, the next thing > > > > > > is asking applications to set their own mask and kill them if they do > > > > > > the wrong thing. Here I see two possibilities for killing an app: > > > > > > > > > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU > > > > > > > > > > That sounds fine to me. If we could do the exception return and take a > > > > > SIGILL, that's what we'd do, but we can't so we have to catch it before. > > > > > > > > Indeed, the illegal ERET doesn't work for this scenario. > > > > > > > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the > > > > > > aarch32_mask > > > > > > > > > > > > Option 1 is simpler but 2 would be slightly more consistent. > > > > > > > > > > I disagree -- if we did this for something like fpsimd, then the consistent > > > > > behaviour would be to SIGILL on the cores without the instructions. > > > > > > > > For fpsimd it makes sense since the main ISA is still available and the > > > > application may be able to do something with the signal. But here we > > > > can't do much since the entire AArch32 mode is not supported. That's why > > > > we went for SIGKILL instead of SIGILL but thinking of it, after execve() > > > > the signals are reset to SIG_DFL so SIGILL cannot be ignored. > > > > > > > > I think it depends on whether you look at this fault as a part of ISA > > > > not being available or as the overall application not compatible with > > > > the system it is running on. If the latter, option 2 above makes more > > > > sense. > > > > > > Hmm, I'm not sure I see the distinction in practice: you still have a binary > > > application that cannot run on all CPUs in the system. Who cares if some of > > > the instructions work? > > > > The failure would be more predictable rather than the app running for a > > while and randomly getting SIGKILL. If it only fails on execve or > > sched_setaffinity, it may be easier to track down (well, there's the CPU > > hotplug as well that can change the cpumask intersection outside the > > user process control). Migration between cpusets is another failure scenario where the app can get SIGKILL randomly. > But it's half-baked, because the moment the 32-bit task changes its affinity > mask then you're back in the old situation. That's why I'm saying this > doesn't add anything, because the rest of the series is designed entirely > around delivering SIGKILL at the last minute rather than preventing us > getting to that situation in the first place. The execve() case feels to me > like we're considering doing something because we can, rather than because > it's actually useful. Agree.
On 10/22/20 09:16, Catalin Marinas wrote: > On Wed, Oct 21, 2020 at 09:26:27PM +0100, Will Deacon wrote: > > On Wed, Oct 21, 2020 at 08:57:36PM +0100, Qais Yousef wrote: > > > On 10/21/20 18:23, Will Deacon wrote: > > > > On Wed, Oct 21, 2020 at 05:07:30PM +0100, Qais Yousef wrote: > > > > > > > For example, the new sysctl_enable_asym_32bit could be a cpumask instead of > > > > > > > a bool as it currently is. Or we can make it a cmdline parameter too. > > > > > > > In both cases some admin (bootloader or init process) has to ensure to fill it > > > > > > > correctly for the target platform. The bootloader should be able to read the > > > > > > > registers to figure out the mask. So more weight to make it a cmdline param. > > > > > > > > > > > > I think this is adding complexity for the sake of it. I'm much more in > > > > > > > > > > I actually think it reduces complexity. No special ABI to generate the mask > > > > > from the kernel. The same opt-in flag is the cpumask too. > > > > > > > > Maybe I'm misunderstanding your proposal but having a cpumask instead of > > > > > > What I meant is that if we change the requirement to opt-in from a boolean > > > switch > > > > > > sysctl.enable_32bit_asym=1 > > > > > > to require the bootloader/init scripts provide the mask of aarch32 capable cpus > > > > > > sysctl.asym_32bit_cpus=0xf0 > [...] > > > * We don't need a separate API to export which cpus are 32bit capable. > > > They can read it directly from /proc/sys/kernel/asym_32bit_cpus. > > > When it's 0 it means the system is not asymmetric. > > > > I don't see how this is better than a separate cpumask for this purpose. > > It feels like you're trying to overload the control and the identification, > > but that just makes things confusing and hard to use as I now need to know > > which logical CPUs correspond to which physical CPUs in order to set the > > command-line. I tend to disagree with some of the statements. But I'll leave it at that. Whatever makes the ship move :) > I agree. Let's leave the identification to the kernel as it has access > to the CPUID registers and can provide the cpumask. The control in this > case doesn't need to be more than a boolean and its meaning is that the > user knows what it is doing. Thanks -- Qais Yousef
On Wed, Oct 21, 2020 at 03:31:53PM +0100, Qais Yousef wrote: > On 10/21/20 15:33, Morten Rasmussen wrote: > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > > one, though not as easy as automatic task placement by the scheduler (my > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > not a strong preference for any). > > > > Automatic task placement by the scheduler would mean giving up the > > requirement that the user-space affinity mask must always be honoured. > > Is that on the table? > > > > Killing aarch32 tasks with an empty intersection between the user-space > > mask and aarch32_mask is not really "automatic" and would require the > > aarch32 capability to be exposed anyway. > > I just noticed this nasty corner case too. > > > Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9 > > "If such a task had been bound to some subset of its cpuset using the > sched_setaffinity() call, the task will be allowed to run on any CPU allowed in > its new cpuset, negating the effect of the prior sched_setaffinity() call." > > So user space must put the tasks into a valid cpuset to fix the problem. Or > make the scheduler behave like the affinity is associated with a cpuset. > > Can user space put the task into the correct cpuset without a race? Clone3 > syscall learnt to specify a cgroup to attach to when forking. Should we do the > same for execve()? Putting a task in a cpuset overrides any affinity mask applied by a previous cpuset or sched_setaffinity() call. I wouldn't call it a corner case though. Android user-space is exploiting it all the time on some devices through the foreground, background, and top-app cgroups. If a tasks fork() the child task will belong to the same cgroup automatically. If you execve() you retain the previous affinity mask and cgroup. So putting parent task about to execve() into aarch32 into a cpuset with only aarch32 CPUs should be enough to never have the task or any of its child tasks SIGKILLED. A few simple experiments with fork() and execve() seems to confirm that. I don't see any changes needed in the kernel. Changing cgroup through clone could of course fail if user-space specifies an unsuitable cgroup. Fixing that would be part of fixing the cpuset setup in user-space which is required anyway. Morten
On 10/22/20 12:16, Morten Rasmussen wrote: > On Wed, Oct 21, 2020 at 03:31:53PM +0100, Qais Yousef wrote: > > On 10/21/20 15:33, Morten Rasmussen wrote: > > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote: > > > > one, though not as easy as automatic task placement by the scheduler (my > > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > > not a strong preference for any). > > > > > > Automatic task placement by the scheduler would mean giving up the > > > requirement that the user-space affinity mask must always be honoured. > > > Is that on the table? > > > > > > Killing aarch32 tasks with an empty intersection between the user-space > > > mask and aarch32_mask is not really "automatic" and would require the > > > aarch32 capability to be exposed anyway. > > > > I just noticed this nasty corner case too. > > > > > > Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9 > > > > "If such a task had been bound to some subset of its cpuset using the > > sched_setaffinity() call, the task will be allowed to run on any CPU allowed in > > its new cpuset, negating the effect of the prior sched_setaffinity() call." > > > > So user space must put the tasks into a valid cpuset to fix the problem. Or > > make the scheduler behave like the affinity is associated with a cpuset. > > > > Can user space put the task into the correct cpuset without a race? Clone3 > > syscall learnt to specify a cgroup to attach to when forking. Should we do the > > same for execve()? > > Putting a task in a cpuset overrides any affinity mask applied by a > previous cpuset or sched_setaffinity() call. I wouldn't call it a corner > case though. Android user-space is exploiting it all the time on some > devices through the foreground, background, and top-app cgroups. Yep. What I was referring to is that if we go the kernel fixing affinity automatically route, that cpuset behavior will be problematic. In this case fixing the affinity at the task level will not be enough because cpusets will override it. So catering for that is another complication to take into account. > If a tasks fork() the child task will belong to the same cgroup > automatically. If you execve() you retain the previous affinity mask and > cgroup. So putting parent task about to execve() into aarch32 into a > cpuset with only aarch32 CPUs should be enough to never have the task or > any of its child tasks SIGKILLED. > > A few simple experiments with fork() and execve() seems to confirm that. +1 This made me wonder what happens when SCHED_RESET_ON_FORK is set. It only resets policty and priority. So we're good. > I don't see any changes needed in the kernel. Changing cgroup through > clone could of course fail if user-space specifies an unsuitable cgroup. > Fixing that would be part of fixing the cpuset setup in user-space which > is required anyway. +1 Thanks -- Qais Yousef
On 10/21/20 15:41, Will Deacon wrote: > > We already expose MIDR and REVIDR via the current sysfs interface. We > > can expand it to include _all_ the other ID_* regs currently available > > to user via the MRS emulation and we won't have to debate what a new > > interface would look like. The MRS emulation and the sysfs info should > > probably match, though that means we need to expose the > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > I do agree that an AArch32 cpumask is an easier option both from the > > kernel implementation perspective and from the application usability > > one, though not as easy as automatic task placement by the scheduler (my > > first preference, followed by the id_* regs and the aarch32 mask, though > > not a strong preference for any). > > If a cpumask is easier to implement and easier to use, then I think that's > what we should do. It's also then dead easy to disable if necessary by > just returning 0. The only alternative I would prefer is not having to > expose this information altogether, but I'm not sure that figuring this > out from MIDR/REVIDR alone is reliable. So the mask idea is about adding a new /sys/devices/system/cpu/aarch32_cpus ? I just need to make sure that Peter and Greg are happy with this arm64 specific mask added to sysfs in this manner. Not sure if there's a precedent of archs exporting special masks in sysfs. Or maybe people had something else in mind about how his this mask should be exported? Thanks -- Qais Yousef
On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote: > On 10/21/20 15:41, Will Deacon wrote: > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > can expand it to include _all_ the other ID_* regs currently available > > > to user via the MRS emulation and we won't have to debate what a new > > > interface would look like. The MRS emulation and the sysfs info should > > > probably match, though that means we need to expose the > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > kernel implementation perspective and from the application usability > > > one, though not as easy as automatic task placement by the scheduler (my > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > not a strong preference for any). > > > > If a cpumask is easier to implement and easier to use, then I think that's > > what we should do. It's also then dead easy to disable if necessary by > > just returning 0. The only alternative I would prefer is not having to > > expose this information altogether, but I'm not sure that figuring this > > out from MIDR/REVIDR alone is reliable. > > So the mask idea is about adding a new > > /sys/devices/system/cpu/aarch32_cpus > > ? Is this a file, a directory, or what? What's the contents? Without any of that, I have no idea if it's "ok" or not... thanks, greg k-h
On Thu, Oct 22, 2020 at 03:55:59PM +0200, Greg Kroah-Hartman wrote: > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote: > > On 10/21/20 15:41, Will Deacon wrote: > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > > can expand it to include _all_ the other ID_* regs currently available > > > > to user via the MRS emulation and we won't have to debate what a new > > > > interface would look like. The MRS emulation and the sysfs info should > > > > probably match, though that means we need to expose the > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > > kernel implementation perspective and from the application usability > > > > one, though not as easy as automatic task placement by the scheduler (my > > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > > not a strong preference for any). > > > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > what we should do. It's also then dead easy to disable if necessary by > > > just returning 0. The only alternative I would prefer is not having to > > > expose this information altogether, but I'm not sure that figuring this > > > out from MIDR/REVIDR alone is reliable. > > > > So the mask idea is about adding a new > > > > /sys/devices/system/cpu/aarch32_cpus > > > > ? > > Is this a file, a directory, or what? What's the contents? > > Without any of that, I have no idea if it's "ok" or not... I don't know what Qais thinks but a file matching the syntax of the present/online/offline/possible files (e.g. 0-3) would look more consistent.
On 10/22/20 15:31, Catalin Marinas wrote: > On Thu, Oct 22, 2020 at 03:55:59PM +0200, Greg Kroah-Hartman wrote: > > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote: > > > On 10/21/20 15:41, Will Deacon wrote: > > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > > > can expand it to include _all_ the other ID_* regs currently available > > > > > to user via the MRS emulation and we won't have to debate what a new > > > > > interface would look like. The MRS emulation and the sysfs info should > > > > > probably match, though that means we need to expose the > > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > > > kernel implementation perspective and from the application usability > > > > > one, though not as easy as automatic task placement by the scheduler (my > > > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > > > not a strong preference for any). > > > > > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > > what we should do. It's also then dead easy to disable if necessary by > > > > just returning 0. The only alternative I would prefer is not having to > > > > expose this information altogether, but I'm not sure that figuring this > > > > out from MIDR/REVIDR alone is reliable. > > > > > > So the mask idea is about adding a new > > > > > > /sys/devices/system/cpu/aarch32_cpus > > > > > > ? > > > > Is this a file, a directory, or what? What's the contents? > > > > Without any of that, I have no idea if it's "ok" or not... > > I don't know what Qais thinks but a file matching the syntax of the > present/online/offline/possible files (e.g. 0-3) would look more > consistent. Yes. Sorry I thought it's obvious. Thanks -- Qais Yousef
On 10/22/20 15:55, Greg Kroah-Hartman wrote: > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote: > > On 10/21/20 15:41, Will Deacon wrote: > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > > can expand it to include _all_ the other ID_* regs currently available > > > > to user via the MRS emulation and we won't have to debate what a new > > > > interface would look like. The MRS emulation and the sysfs info should > > > > probably match, though that means we need to expose the > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > > kernel implementation perspective and from the application usability > > > > one, though not as easy as automatic task placement by the scheduler (my > > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > > not a strong preference for any). > > > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > what we should do. It's also then dead easy to disable if necessary by > > > just returning 0. The only alternative I would prefer is not having to > > > expose this information altogether, but I'm not sure that figuring this > > > out from MIDR/REVIDR alone is reliable. > > > > So the mask idea is about adding a new > > > > /sys/devices/system/cpu/aarch32_cpus > > > > ? > > Is this a file, a directory, or what? What's the contents? > > Without any of that, I have no idea if it's "ok" or not... Hopefully the below patch explains better. Note that I added the new attribute to driver/base/cpu.c, but assuming we will still want to go down this route, we will need a generic way for archs to add their attributes to /sys/devices/system/cpu/. Something like having a special define for archs to append their own attributes list #define SYSFS_SYSTEM_CPU_ARCH_ATTRIBUTES Or probably there's a way to add this file (attribute) dynamically from arch code that I just didn't figure out how to do yet. Thanks -- Qais Yousef ---------->8------------ From 96dfdfdacb2a26a60ba19051e8c72e839eb5408b Mon Sep 17 00:00:00 2001 From: Qais Yousef <qais.yousef@arm.com> Date: Mon, 26 Oct 2020 16:33:32 +0000 Subject: [PATCH] arm64: export aarch32_online mask in sysfs This patch to be applied on top of arm64 Asymmetric AArch32 support. It explores the option of exporting the AArch32 capable cpus as a mask on sysfs. This is to help drive the discussion on the API before sending the next version which I yet to address some of the review comments. The output looks like: # cat /sys/devices/system/cpu/aarch32_online 0-5 Signed-off-by: Qais Yousef <qais.yousef@arm.com> --- Documentation/ABI/testing/sysfs-devices-system-cpu | 7 +++++++ arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 8 ++++++++ drivers/base/cpu.c | 12 ++++++++++++ 4 files changed, 29 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index b555df825447..9ccb5c3f5ee3 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -36,6 +36,13 @@ Description: CPU topology files that describe kernel limits related to See Documentation/admin-guide/cputopology.rst for more information. +What: /sys/devices/system/cpu/aarch32_online +Date: October 2020 +Contact: Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org> +Description: CPU topology file that describes which cpus support AArch32 at + EL0. Only available on arm64. + + The value is updated when a cpu becomes online then sticks. What: /sys/devices/system/cpu/probe /sys/devices/system/cpu/release diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 2b87f17b2bd4..edd18002ad81 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -380,6 +380,8 @@ cpucap_multi_entry_cap_matches(const struct arm64_cpu_capabilities *entry, return false; } +extern cpumask_t aarch32_el0_mask; + extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; extern struct static_key_false arm64_const_caps_ready; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 0f7307c8ad80..662bbc2b15cd 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1723,6 +1723,13 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap) return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT); } +cpumask_t aarch32_el0_mask; +static void cpu_enable_aarch32_el0(struct arm64_cpu_capabilities const *cap) +{ + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) + cpumask_set_cpu(smp_processor_id(), &aarch32_el0_mask); +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -1809,6 +1816,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { { .capability = ARM64_HAS_ASYM_32BIT_EL0, .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE, + .cpu_enable = cpu_enable_aarch32_el0, .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64PFR0_EL1, .sign = FTR_UNSIGNED, diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index d2136ab9b14a..569baacde508 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -459,6 +459,15 @@ EXPORT_SYMBOL_GPL(cpu_device_create); static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL); #endif +#ifdef CONFIG_ARM64 +static ssize_t print_aarch32_online(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return cpumap_print_to_pagebuf(true, buf, &aarch32_el0_mask); +} +static DEVICE_ATTR(aarch32_online, 0444, print_aarch32_online, NULL); +#endif + static struct attribute *cpu_root_attrs[] = { #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE &dev_attr_probe.attr, @@ -475,6 +484,9 @@ static struct attribute *cpu_root_attrs[] = { #endif #ifdef CONFIG_GENERIC_CPU_AUTOPROBE &dev_attr_modalias.attr, +#endif +#ifdef CONFIG_ARM64 + &dev_attr_aarch32_online.attr, #endif NULL };
On Mon, Oct 26, 2020 at 07:02:50PM +0000, Qais Yousef wrote: > On 10/22/20 15:55, Greg Kroah-Hartman wrote: > > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote: > > > On 10/21/20 15:41, Will Deacon wrote: > > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > > > can expand it to include _all_ the other ID_* regs currently available > > > > > to user via the MRS emulation and we won't have to debate what a new > > > > > interface would look like. The MRS emulation and the sysfs info should > > > > > probably match, though that means we need to expose the > > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > > > kernel implementation perspective and from the application usability > > > > > one, though not as easy as automatic task placement by the scheduler (my > > > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > > > not a strong preference for any). > > > > > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > > what we should do. It's also then dead easy to disable if necessary by > > > > just returning 0. The only alternative I would prefer is not having to > > > > expose this information altogether, but I'm not sure that figuring this > > > > out from MIDR/REVIDR alone is reliable. > > > > > > So the mask idea is about adding a new > > > > > > /sys/devices/system/cpu/aarch32_cpus > > > > > > ? > > > > Is this a file, a directory, or what? What's the contents? > > > > Without any of that, I have no idea if it's "ok" or not... > > Hopefully the below patch explains better. Note that I added the new attribute > to driver/base/cpu.c, but assuming we will still want to go down this route, we > will need a generic way for archs to add their attributes to > /sys/devices/system/cpu/. > > Something like having a special define for archs to append their own > attributes list > > #define SYSFS_SYSTEM_CPU_ARCH_ATTRIBUTES > > Or probably there's a way to add this file (attribute) dynamically from arch > code that I just didn't figure out how to do yet. Please do that, sysfs files should not be present when the information is not needed from them. Look at the is_visible() callback for the attribute for how to do it. > > Thanks > > -- > Qais Yousef > > > ---------->8------------ > > >From 96dfdfdacb2a26a60ba19051e8c72e839eb5408b Mon Sep 17 00:00:00 2001 > From: Qais Yousef <qais.yousef@arm.com> > Date: Mon, 26 Oct 2020 16:33:32 +0000 > Subject: [PATCH] arm64: export aarch32_online mask in sysfs > > This patch to be applied on top of arm64 Asymmetric AArch32 support. > > It explores the option of exporting the AArch32 capable cpus as a mask > on sysfs. > > This is to help drive the discussion on the API before sending the next > version which I yet to address some of the review comments. > > The output looks like: > > # cat /sys/devices/system/cpu/aarch32_online > 0-5 > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > --- > Documentation/ABI/testing/sysfs-devices-system-cpu | 7 +++++++ > arch/arm64/include/asm/cpufeature.h | 2 ++ > arch/arm64/kernel/cpufeature.c | 8 ++++++++ > drivers/base/cpu.c | 12 ++++++++++++ > 4 files changed, 29 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu > index b555df825447..9ccb5c3f5ee3 100644 > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu > @@ -36,6 +36,13 @@ Description: CPU topology files that describe kernel limits related to > > See Documentation/admin-guide/cputopology.rst for more information. > > +What: /sys/devices/system/cpu/aarch32_online > +Date: October 2020 > +Contact: Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org> > +Description: CPU topology file that describes which cpus support AArch32 at > + EL0. Only available on arm64. > + > + The value is updated when a cpu becomes online then sticks. What does "then sticks" mean? > > What: /sys/devices/system/cpu/probe > /sys/devices/system/cpu/release > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h > index 2b87f17b2bd4..edd18002ad81 100644 > --- a/arch/arm64/include/asm/cpufeature.h > +++ b/arch/arm64/include/asm/cpufeature.h > @@ -380,6 +380,8 @@ cpucap_multi_entry_cap_matches(const struct arm64_cpu_capabilities *entry, > return false; > } > > +extern cpumask_t aarch32_el0_mask; > + > extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); > extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; > extern struct static_key_false arm64_const_caps_ready; > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > index 0f7307c8ad80..662bbc2b15cd 100644 > --- a/arch/arm64/kernel/cpufeature.c > +++ b/arch/arm64/kernel/cpufeature.c > @@ -1723,6 +1723,13 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap) > return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT); > } > > +cpumask_t aarch32_el0_mask; > +static void cpu_enable_aarch32_el0(struct arm64_cpu_capabilities const *cap) > +{ > + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) > + cpumask_set_cpu(smp_processor_id(), &aarch32_el0_mask); > +} > + > static const struct arm64_cpu_capabilities arm64_features[] = { > { > .desc = "GIC system register CPU interface", > @@ -1809,6 +1816,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { > { > .capability = ARM64_HAS_ASYM_32BIT_EL0, > .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE, > + .cpu_enable = cpu_enable_aarch32_el0, > .matches = has_cpuid_feature, > .sys_reg = SYS_ID_AA64PFR0_EL1, > .sign = FTR_UNSIGNED, > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c > index d2136ab9b14a..569baacde508 100644 > --- a/drivers/base/cpu.c > +++ b/drivers/base/cpu.c > @@ -459,6 +459,15 @@ EXPORT_SYMBOL_GPL(cpu_device_create); > static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL); > #endif > > +#ifdef CONFIG_ARM64 > +static ssize_t print_aarch32_online(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + return cpumap_print_to_pagebuf(true, buf, &aarch32_el0_mask); > +} > +static DEVICE_ATTR(aarch32_online, 0444, print_aarch32_online, NULL); DEVICE_ATTR_RO()? > +#endif Hah, no, no arch-specific stuff in here, sorry. Please do this properly in your arch-specific code only. thanks, greg k-h
On 10/26/20 20:08, Greg Kroah-Hartman wrote: > On Mon, Oct 26, 2020 at 07:02:50PM +0000, Qais Yousef wrote: > > On 10/22/20 15:55, Greg Kroah-Hartman wrote: > > > On Thu, Oct 22, 2020 at 02:47:52PM +0100, Qais Yousef wrote: > > > > On 10/21/20 15:41, Will Deacon wrote: > > > > > > We already expose MIDR and REVIDR via the current sysfs interface. We > > > > > > can expand it to include _all_ the other ID_* regs currently available > > > > > > to user via the MRS emulation and we won't have to debate what a new > > > > > > interface would look like. The MRS emulation and the sysfs info should > > > > > > probably match, though that means we need to expose the > > > > > > ID_AA64PFR0_EL1.EL0 field which we currently don't. > > > > > > > > > > > > I do agree that an AArch32 cpumask is an easier option both from the > > > > > > kernel implementation perspective and from the application usability > > > > > > one, though not as easy as automatic task placement by the scheduler (my > > > > > > first preference, followed by the id_* regs and the aarch32 mask, though > > > > > > not a strong preference for any). > > > > > > > > > > If a cpumask is easier to implement and easier to use, then I think that's > > > > > what we should do. It's also then dead easy to disable if necessary by > > > > > just returning 0. The only alternative I would prefer is not having to > > > > > expose this information altogether, but I'm not sure that figuring this > > > > > out from MIDR/REVIDR alone is reliable. > > > > > > > > So the mask idea is about adding a new > > > > > > > > /sys/devices/system/cpu/aarch32_cpus > > > > > > > > ? > > > > > > Is this a file, a directory, or what? What's the contents? > > > > > > Without any of that, I have no idea if it's "ok" or not... > > > > Hopefully the below patch explains better. Note that I added the new attribute > > to driver/base/cpu.c, but assuming we will still want to go down this route, we > > will need a generic way for archs to add their attributes to > > /sys/devices/system/cpu/. > > > > Something like having a special define for archs to append their own > > attributes list > > > > #define SYSFS_SYSTEM_CPU_ARCH_ATTRIBUTES > > > > Or probably there's a way to add this file (attribute) dynamically from arch > > code that I just didn't figure out how to do yet. > > Please do that, sysfs files should not be present when the information > is not needed from them. Look at the is_visible() callback for the > attribute for how to do it. Okay, thanks for the hint. Will look at that. > > > > Thanks > > > > -- > > Qais Yousef > > > > > > ---------->8------------ > > > > >From 96dfdfdacb2a26a60ba19051e8c72e839eb5408b Mon Sep 17 00:00:00 2001 > > From: Qais Yousef <qais.yousef@arm.com> > > Date: Mon, 26 Oct 2020 16:33:32 +0000 > > Subject: [PATCH] arm64: export aarch32_online mask in sysfs > > > > This patch to be applied on top of arm64 Asymmetric AArch32 support. > > > > It explores the option of exporting the AArch32 capable cpus as a mask > > on sysfs. > > > > This is to help drive the discussion on the API before sending the next > > version which I yet to address some of the review comments. > > > > The output looks like: > > > > # cat /sys/devices/system/cpu/aarch32_online > > 0-5 > > > > Signed-off-by: Qais Yousef <qais.yousef@arm.com> > > --- > > Documentation/ABI/testing/sysfs-devices-system-cpu | 7 +++++++ > > arch/arm64/include/asm/cpufeature.h | 2 ++ > > arch/arm64/kernel/cpufeature.c | 8 ++++++++ > > drivers/base/cpu.c | 12 ++++++++++++ > > 4 files changed, 29 insertions(+) > > > > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu > > index b555df825447..9ccb5c3f5ee3 100644 > > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu > > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu > > @@ -36,6 +36,13 @@ Description: CPU topology files that describe kernel limits related to > > > > See Documentation/admin-guide/cputopology.rst for more information. > > > > +What: /sys/devices/system/cpu/aarch32_online > > +Date: October 2020 > > +Contact: Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org> > > +Description: CPU topology file that describes which cpus support AArch32 at > > + EL0. Only available on arm64. > > + > > + The value is updated when a cpu becomes online then sticks. > > What does "then sticks" mean? Was thinking like a sticky bit. When a cpu becomes online and we discover that it is aarch32 capable, we set the bit. But never clear it again if the cpu goes offline later. I'll reword it. > > > > > > What: /sys/devices/system/cpu/probe > > /sys/devices/system/cpu/release > > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h > > index 2b87f17b2bd4..edd18002ad81 100644 > > --- a/arch/arm64/include/asm/cpufeature.h > > +++ b/arch/arm64/include/asm/cpufeature.h > > @@ -380,6 +380,8 @@ cpucap_multi_entry_cap_matches(const struct arm64_cpu_capabilities *entry, > > return false; > > } > > > > +extern cpumask_t aarch32_el0_mask; > > + > > extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); > > extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; > > extern struct static_key_false arm64_const_caps_ready; > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > index 0f7307c8ad80..662bbc2b15cd 100644 > > --- a/arch/arm64/kernel/cpufeature.c > > +++ b/arch/arm64/kernel/cpufeature.c > > @@ -1723,6 +1723,13 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap) > > return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT); > > } > > > > +cpumask_t aarch32_el0_mask; > > +static void cpu_enable_aarch32_el0(struct arm64_cpu_capabilities const *cap) > > +{ > > + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) > > + cpumask_set_cpu(smp_processor_id(), &aarch32_el0_mask); > > +} > > + > > static const struct arm64_cpu_capabilities arm64_features[] = { > > { > > .desc = "GIC system register CPU interface", > > @@ -1809,6 +1816,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { > > { > > .capability = ARM64_HAS_ASYM_32BIT_EL0, > > .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE, > > + .cpu_enable = cpu_enable_aarch32_el0, > > .matches = has_cpuid_feature, > > .sys_reg = SYS_ID_AA64PFR0_EL1, > > .sign = FTR_UNSIGNED, > > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c > > index d2136ab9b14a..569baacde508 100644 > > --- a/drivers/base/cpu.c > > +++ b/drivers/base/cpu.c > > @@ -459,6 +459,15 @@ EXPORT_SYMBOL_GPL(cpu_device_create); > > static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL); > > #endif > > > > +#ifdef CONFIG_ARM64 > > +static ssize_t print_aarch32_online(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + return cpumap_print_to_pagebuf(true, buf, &aarch32_el0_mask); > > +} > > +static DEVICE_ATTR(aarch32_online, 0444, print_aarch32_online, NULL); > > DEVICE_ATTR_RO()? Indeed. > > > +#endif > > Hah, no, no arch-specific stuff in here, sorry. Please do this properly > in your arch-specific code only. Of course. It was just to see this is okay. Let me figure out how to clean this up. Thanks! -- Qais Yousef
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst index f28853f80089..bfcbda6d6f35 100644 --- a/Documentation/arm64/cpu-feature-registers.rst +++ b/Documentation/arm64/cpu-feature-registers.rst @@ -166,7 +166,7 @@ infrastructure: +------------------------------+---------+---------+ | EL1 | [7-4] | n | +------------------------------+---------+---------+ - | EL0 | [3-0] | n | + | EL0 | [3-0] | y | +------------------------------+---------+---------+ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 6f795c8221f4..0f7307c8ad80 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -221,7 +221,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL3_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL2_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY), + ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY), ARM64_FTR_END, }; diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c index 93c55986ca7f..632b9d5b5230 100644 --- a/arch/arm64/kernel/cpuinfo.c +++ b/arch/arm64/kernel/cpuinfo.c @@ -231,25 +231,71 @@ static struct kobj_type cpuregs_kobj_type = { * future expansion without an ABI break. */ #define kobj_to_cpuinfo(kobj) container_of(kobj, struct cpuinfo_arm64, kobj) -#define CPUREGS_ATTR_RO(_name, _field) \ +#define CPUREGS_RAW_ATTR_RO(_name, _field) \ static ssize_t _name##_show(struct kobject *kobj, \ struct kobj_attribute *attr, char *buf) \ { \ struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj); \ \ - if (info->reg_midr) \ - return sprintf(buf, "0x%016x\n", info->reg_##_field); \ - else \ + if (info->reg_midr) { \ + u64 val = info->reg_##_field; \ + return sprintf(buf, "0x%016llx\n", val); \ + } else { \ return 0; \ + } \ } \ static struct kobj_attribute cpuregs_attr_##_name = __ATTR_RO(_name) -CPUREGS_ATTR_RO(midr_el1, midr); -CPUREGS_ATTR_RO(revidr_el1, revidr); +/* + * Expose the Sanitized or RAW user space visible fields of a sys_register + * based @cond. + * + * if (@cond) + * expose raw register & user_mask + * else + * expose sanitized register & user_mask + * + * user_mask is based on arm64_ftr_reg definition. + */ +#define CPUREGS_USER_ATTR_RO(_name, _raw_field, _san_id, cond) \ + static ssize_t _name##_show(struct kobject *kobj, \ + struct kobj_attribute *attr, char *buf) \ + { \ + u64 val = 0; \ + \ + if (cond) { \ + struct arm64_ftr_reg *reg = get_arm64_ftr_reg(_san_id); \ + struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj); \ + \ + if (!reg) \ + return 0; \ + \ + if (info->reg_midr) { \ + val = info->reg_##_raw_field & reg->user_mask; \ + val |= reg->user_val; \ + return sprintf(buf, "0x%016llx\n", val); \ + } else { \ + return 0; \ + } \ + } else { \ + val = 0; \ + if (!emulate_sys_reg(_san_id, &val)) { \ + return sprintf(buf, "0x%016llx\n", val); \ + } else { \ + return 0; \ + } \ + } \ + } \ + static struct kobj_attribute cpuregs_attr_##_name = __ATTR_RO(_name) + +CPUREGS_RAW_ATTR_RO(midr_el1, midr); +CPUREGS_RAW_ATTR_RO(revidr_el1, revidr); +CPUREGS_USER_ATTR_RO(id_aa64pfr0, id_aa64pfr0, SYS_ID_AA64PFR0_EL1, system_supports_asym_32bit_el0()); static struct attribute *cpuregs_id_attrs[] = { &cpuregs_attr_midr_el1.attr, &cpuregs_attr_revidr_el1.attr, + &cpuregs_attr_id_aa64pfr0.attr, NULL };
So that userspace can detect if the cpu has aarch32 support at EL0. CPUREGS_ATTR_RO() was renamed to CPUREGS_RAW_ATTR_RO() to better reflect what it does. And fixed to accept both u64 and u32 without causing the printf to print out a warning about mismatched type. This was caught while testing to check the new CPUREGS_USER_ATTR_RO(). The new CPUREGS_USER_ATTR_RO() exports a Sanitised or RAW sys_reg based on a @cond to user space. The exported fields match the definition in arm64_ftr_reg so that the content of a register exported via MRS and sysfs are kept cohesive. The @cond in our case is that the system is asymmetric aarch32 and the controlling sysctl.enable_asym_32bit is enabled. Update Documentation/arm64/cpu-feature-registers.rst to reflect the newly visible EL0 field in ID_AA64FPR0_EL1. Note that the MRS interface will still return the sanitized content _only_. Signed-off-by: Qais Yousef <qais.yousef@arm.com> --- Example output. I was surprised that the 2nd field (bits[7:4]) is printed out although it's set as FTR_HIDDEN. # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 0x0000000000000011 0x0000000000000011 0x0000000000000011 0x0000000000000011 0x0000000000000011 0x0000000000000011 # echo 1 > /proc/sys/kernel/enable_asym_32bit # cat /sys/devices/system/cpu/cpu*/regs/identification/id_aa64pfr0 0x0000000000000011 0x0000000000000011 0x0000000000000012 0x0000000000000012 0x0000000000000011 0x0000000000000011 Documentation/arm64/cpu-feature-registers.rst | 2 +- arch/arm64/kernel/cpufeature.c | 2 +- arch/arm64/kernel/cpuinfo.c | 58 +++++++++++++++++-- 3 files changed, 54 insertions(+), 8 deletions(-)