Message ID | 20210610021229.3605241-1-pcc@google.com (mailing list archive)
---|---
State | New, archived
Series | [v2] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis
On Wed, Jun 09, 2021 at 07:12:29PM -0700, Peter Collingbourne wrote:
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 9df3feeee890..545ef900e7ce 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -159,6 +159,7 @@ struct thread_struct {
>  #define SCTLR_USER_MASK \
>  	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB | \
>  	 SCTLR_EL1_TCF0_MASK)
> +#define SCTLR_USER_DYNAMIC_TCF (1ULL << 63)

Even if you called it "USER", it still gives the impression that it's
some real hardware bit. Who knows, in a few years time it may be
allocated to a real feature.

I also don't think this logic should be added to processor.[ch], just
keep it within mte.c.

So while it's convenient to add something to this field, given that it's
shared with ptrauth, it's pretty fragile long term. I'd add the
information about the dynamic mode to a different field. We could rename
gcr_user_excl to mte_ctrl or something and store a few bits in there in
addition to GCR_EL1.Excl (with corresponding masks etc.)

> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 125a10e413e9..f244bc96464c 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
[...]
> @@ -283,6 +288,9 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg)
>  		return -EINVAL;
>  	}
>
> +	if (arg & PR_MTE_DYNAMIC_TCF)
> +		sctlr |= SCTLR_USER_DYNAMIC_TCF;

Please move this to thread.mte_ctrl and have mte_thread_switch() update
thread.sctlr_user (which would be set in hardware subsequently in
__switch_to()).

[...]
> @@ -453,3 +464,60 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request,
>
>  	return ret;
>  }
> +
> +static ssize_t mte_upgrade_async_show(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	u32 val = per_cpu(mte_upgrade_async, dev->id) >> SCTLR_EL1_TCF0_SHIFT;
> +
> +	return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static int sync_sctlr(void *arg)
> +{
> +	update_sctlr_el1(current->thread.sctlr_user);
> +	return 0;
> +}

This should update thread.sctlr_user based on thread.mte_ctrl before
calling update_sctlr_el1() (i.e. keep the logic to this file).

> +
> +static ssize_t mte_upgrade_async_store(struct device *dev,
> +				       struct device_attribute *attr,
> +				       const char *buf, size_t count)
> +{
> +	ssize_t ret;
> +	u32 val;
> +	u64 tcf;
> +
> +	ret = kstrtou32(buf, 0, &val);
> +	if (ret < 0)
> +		return ret;
> +
> +	tcf = ((u64)val) << SCTLR_EL1_TCF0_SHIFT;
> +	if (tcf != SCTLR_EL1_TCF0_NONE && tcf != SCTLR_EL1_TCF0_SYNC &&
> +	    tcf != SCTLR_EL1_TCF0_ASYNC)
> +		return -EINVAL;
> +
> +	device_lock(dev);
> +	per_cpu(mte_upgrade_async, dev->id) = tcf;
> +
> +	ret = stop_machine(sync_sctlr, 0, cpumask_of(dev->id));

Do we really need a stop_machine() here? That's really heavy and we
don't need such synchronisation. An smp_call_function() should do or
just leave it until the next context switch on the corresponding CPUs.

[...]
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index b4bb67f17a2c..27a12c53529d 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -527,8 +527,19 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
>  	write_sysreg(val, cntkctl_el1);
>  }
>
> -static void update_sctlr_el1(u64 sctlr)
> +DECLARE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
> +
> +void update_sctlr_el1(u64 sctlr)
>  {
> +	if (sctlr & SCTLR_USER_DYNAMIC_TCF) {
> +		sctlr &= ~SCTLR_USER_DYNAMIC_TCF;
> +
> +		if ((sctlr & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC) {
> +			sctlr &= ~SCTLR_EL1_TCF0_MASK;
> +			sctlr |= __this_cpu_read(mte_upgrade_async);
> +		}
> +	}

As per my comments above, please move this logic to mte.c.

[...]
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index dcd7041b2b07..20c1503c8cdd 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -56,6 +56,8 @@
>
>  DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
>  EXPORT_PER_CPU_SYMBOL(cpu_number);
> +DEFINE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
> +EXPORT_PER_CPU_SYMBOL(mte_upgrade_async);
>
>  /*
>   * as from 2.5, kernels no longer have an init_tasks structure
> @@ -649,6 +651,27 @@ static void __init acpi_parse_and_init_cpus(void)
>  #define acpi_parse_and_init_cpus(...) do { } while (0)
>  #endif
>
> +/*
> + * Store the default values for per-CPU properties typically read from DT or
> + * ACPI to per-CPU variables.
> + */
> +static void __init set_default_cpu_properties(int cpu)
> +{
> +	per_cpu(mte_upgrade_async, cpu) = SCTLR_EL1_TCF0_ASYNC;
> +}
> +
> +/*
> + * Read per-CPU properties from the device tree and store them in per-CPU
> + * variables for efficient access later.
> + */
> +static void __init of_read_cpu_properties(int cpu, struct device_node *dn)
> +{
> +	u32 prop;
> +
> +	if (of_property_read_u32(dn, "mte-upgrade-async", &prop) == 0)
> +		per_cpu(mte_upgrade_async, cpu) = ((u64)prop) << SCTLR_EL1_TCF0_SHIFT;
> +}

I don't think we should introduce the DT description at all, at least
not at this stage. I'm not entirely convinced async == sync in terms of
performance on certain CPU/SoC implementations. There's probably some
small difference between them, depending on the benchmark. We don't have
a concept of what an acceptable margin is, so baking it into the DT
doesn't make sense.

Do mobile vendors normally have some of their own software running on
the SoCs? They could use some startup code to program the sysfs
interface, maybe based on the MIDR.

>  static const char *ipi_types[NR_IPI] __tracepoint_string = {
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 18a9f59dc067..4dab44732814 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -242,6 +242,8 @@ struct prctl_mm_map {
>  /* MTE tag inclusion mask */
>  # define PR_MTE_TAG_SHIFT	3
>  # define PR_MTE_TAG_MASK	(0xffffUL << PR_MTE_TAG_SHIFT)
> +/* Enable dynamic upgrading of MTE tag check fault mode */
> +# define PR_MTE_DYNAMIC_TCF	(1UL << 19)

Once we agreed on the implementation, please add some documentation on
this new control, as we did with the others.
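
A minimal sketch of the arrangement Catalin describes above, with the
software flag kept in a renamed mte_ctrl field rather than in sctlr_user,
might look like the following. The names (mte_ctrl, MTE_CTRL_*,
mte_adjust_sctlr) and bit positions are illustrative assumptions, not code
from this patch or from v3:

/*
 * Illustrative sketch only: GCR_EL1.Excl and software-only MTE flags share
 * one MTE-specific per-thread field, so sctlr_user carries architectural
 * SCTLR_EL1 bits only. Names and bit layout are assumptions.
 */
#define MTE_CTRL_GCR_USER_EXCL_MASK	0xffffUL
#define MTE_CTRL_DYNAMIC_TCF		(1UL << 16)

/* mte.c would derive the effective SCTLR value at context switch: */
static u64 mte_adjust_sctlr(u64 sctlr_user, u64 mte_ctrl, u64 cpu_preferred_tcf)
{
	if ((mte_ctrl & MTE_CTRL_DYNAMIC_TCF) &&
	    (sctlr_user & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC) {
		/* replace the requested async mode with this CPU's preference */
		sctlr_user &= ~SCTLR_EL1_TCF0_MASK;
		sctlr_user |= cpu_preferred_tcf;
	}
	return sctlr_user;
}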
On Fri, Jun 11, 2021 at 7:49 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Jun 09, 2021 at 07:12:29PM -0700, Peter Collingbourne wrote:
> > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > index 9df3feeee890..545ef900e7ce 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -159,6 +159,7 @@ struct thread_struct {
> >  #define SCTLR_USER_MASK \
> >  	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB | \
> >  	 SCTLR_EL1_TCF0_MASK)
> > +#define SCTLR_USER_DYNAMIC_TCF (1ULL << 63)
>
> Even if you called it "USER", it still gives the impression that it's
> some real hardware bit. Who knows, in a few years time it may be
> allocated to a real feature.

If bit 63 ends up being allocated to a bit that we want to allow
userspace control over then we can always just move this to another
bit. There are plenty to choose from that I don't think we will ever
allow user control over, e.g. EIS.

> I also don't think this logic should be added to processor.[ch], just
> keep it within mte.c.
>
> So while it's convenient to add something to this field, given that it's
> shared with ptrauth, it's pretty fragile long term. I'd add the
> information about the dynamic mode to a different field. We could rename
> gcr_user_excl to mte_ctrl or something and store a few bits in there in
> addition to GCR_EL1.Excl (with corresponding masks etc.)

I do take your point that it's somewhat awkward to commingle the SCTLR
bits and the dynamic TCF setting here, but I'm not sure that it's
overall better to move the bits to a renamed gcr_user_excl field. The
consequence would be that we need two copies of the TCF setting in
thread_struct and they will need to be kept in sync, which leads to an
implicit ordering dependency between the code dealing with the two
fields on context switch. I.e. either way things are a bit awkward, and
I think I'd prefer the approach where the code is simpler as a result of
less data in the data structures. We can make this more maintainable by
adding a static_assert that SCTLR_USER_DYNAMIC_TCF doesn't overlap with
any of the bits in SCTLR_USER_MASK, as I've done in v3.

Let me know what you think, and if you still disagree then I can try to
make this look more like you suggested.

> > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > index 125a10e413e9..f244bc96464c 100644
> > --- a/arch/arm64/kernel/mte.c
> > +++ b/arch/arm64/kernel/mte.c
> [...]
> > @@ -283,6 +288,9 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg)
> >  		return -EINVAL;
> >  	}
> >
> > +	if (arg & PR_MTE_DYNAMIC_TCF)
> > +		sctlr |= SCTLR_USER_DYNAMIC_TCF;
>
> Please move this to thread.mte_ctrl and have mte_thread_switch() update
> thread.sctlr_user (which would be set in hardware subsequently in
> __switch_to()).
>
> [...]
> > @@ -453,3 +464,60 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request,
> >
> >  	return ret;
> >  }
> > +
> > +static ssize_t mte_upgrade_async_show(struct device *dev,
> > +				      struct device_attribute *attr, char *buf)
> > +{
> > +	u32 val = per_cpu(mte_upgrade_async, dev->id) >> SCTLR_EL1_TCF0_SHIFT;
> > +
> > +	return sysfs_emit(buf, "%u\n", val);
> > +}
> > +
> > +static int sync_sctlr(void *arg)
> > +{
> > +	update_sctlr_el1(current->thread.sctlr_user);
> > +	return 0;
> > +}
>
> This should update thread.sctlr_user based on thread.mte_ctrl before
> calling update_sctlr_el1() (i.e. keep the logic to this file).
>
> > +
> > +static ssize_t mte_upgrade_async_store(struct device *dev,
> > +				       struct device_attribute *attr,
> > +				       const char *buf, size_t count)
> > +{
> > +	ssize_t ret;
> > +	u32 val;
> > +	u64 tcf;
> > +
> > +	ret = kstrtou32(buf, 0, &val);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	tcf = ((u64)val) << SCTLR_EL1_TCF0_SHIFT;
> > +	if (tcf != SCTLR_EL1_TCF0_NONE && tcf != SCTLR_EL1_TCF0_SYNC &&
> > +	    tcf != SCTLR_EL1_TCF0_ASYNC)
> > +		return -EINVAL;
> > +
> > +	device_lock(dev);
> > +	per_cpu(mte_upgrade_async, dev->id) = tcf;
> > +
> > +	ret = stop_machine(sync_sctlr, 0, cpumask_of(dev->id));
>
> Do we really need a stop_machine() here? That's really heavy and we
> don't need such synchronisation. An smp_call_function() should do or
> just leave it until the next context switch on the corresponding CPUs.

Looks like smp_call_function_single() should work here; done in v3.

> [...]
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index b4bb67f17a2c..27a12c53529d 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -527,8 +527,19 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
> >  	write_sysreg(val, cntkctl_el1);
> >  }
> >
> > -static void update_sctlr_el1(u64 sctlr)
> > +DECLARE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
> > +
> > +void update_sctlr_el1(u64 sctlr)
> >  {
> > +	if (sctlr & SCTLR_USER_DYNAMIC_TCF) {
> > +		sctlr &= ~SCTLR_USER_DYNAMIC_TCF;
> > +
> > +		if ((sctlr & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC) {
> > +			sctlr &= ~SCTLR_EL1_TCF0_MASK;
> > +			sctlr |= __this_cpu_read(mte_upgrade_async);
> > +		}
> > +	}
>
> As per my comments above, please move this logic to mte.c.
>
> [...]
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index dcd7041b2b07..20c1503c8cdd 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -56,6 +56,8 @@
> >
> >  DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
> >  EXPORT_PER_CPU_SYMBOL(cpu_number);
> > +DEFINE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
> > +EXPORT_PER_CPU_SYMBOL(mte_upgrade_async);
> >
> >  /*
> >   * as from 2.5, kernels no longer have an init_tasks structure
> > @@ -649,6 +651,27 @@ static void __init acpi_parse_and_init_cpus(void)
> >  #define acpi_parse_and_init_cpus(...) do { } while (0)
> >  #endif
> >
> > +/*
> > + * Store the default values for per-CPU properties typically read from DT or
> > + * ACPI to per-CPU variables.
> > + */
> > +static void __init set_default_cpu_properties(int cpu)
> > +{
> > +	per_cpu(mte_upgrade_async, cpu) = SCTLR_EL1_TCF0_ASYNC;
> > +}
> > +
> > +/*
> > + * Read per-CPU properties from the device tree and store them in per-CPU
> > + * variables for efficient access later.
> > + */
> > +static void __init of_read_cpu_properties(int cpu, struct device_node *dn)
> > +{
> > +	u32 prop;
> > +
> > +	if (of_property_read_u32(dn, "mte-upgrade-async", &prop) == 0)
> > +		per_cpu(mte_upgrade_async, cpu) = ((u64)prop) << SCTLR_EL1_TCF0_SHIFT;
> > +}
>
> I don't think we should introduce the DT description at all, at least
> not at this stage. I'm not entirely convinced async == sync in terms of
> performance on certain CPU/SoC implementations. There's probably some
> small difference between them, depending on the benchmark. We don't have
> a concept of what an acceptable margin is, so baking it into the DT
> doesn't make sense.
>
> Do mobile vendors normally have some of their own software running on
> the SoCs? They could use some startup code to program the sysfs
> interface, maybe based on the MIDR.

Yes, it should be possible. On Android the vendor can hook into the init
sequence and do things like write to sysfs, so there is a way for the
vendor to configure this on Android at least. Therefore I've removed the
DT stuff in v3.

> >  static const char *ipi_types[NR_IPI] __tracepoint_string = {
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 18a9f59dc067..4dab44732814 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -242,6 +242,8 @@ struct prctl_mm_map {
> >  /* MTE tag inclusion mask */
> >  # define PR_MTE_TAG_SHIFT	3
> >  # define PR_MTE_TAG_MASK	(0xffffUL << PR_MTE_TAG_SHIFT)
> > +/* Enable dynamic upgrading of MTE tag check fault mode */
> > +# define PR_MTE_DYNAMIC_TCF	(1UL << 19)
>
> Once we agreed on the implementation, please add some documentation on
> this new control, as we did with the others.

Added in v3.

Peter
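
For illustration, a user-space sketch of the vendor-init approach being
discussed might look like the program below. It assumes the per-CPU
midr_el1 file exposed by the arm64 CPU register emulation and the
mte_upgrade_async attribute this patch would add under each CPU device;
the MIDR value matched is a placeholder, not a real part number:

/*
 * Illustrative sketch only: at boot, upgrade async MTE to sync (value 1,
 * per the store handler above) on CPUs whose MIDR matches a list the
 * vendor knows to have equal sync/async cost. UPGRADE_MIDR is a placeholder.
 */
#include <stdio.h>

#define UPGRADE_MIDR	0x0UL	/* placeholder: fill in the target MIDR */

int main(void)
{
	char path[128];
	int cpu;

	for (cpu = 0; ; cpu++) {
		unsigned long midr = 0;
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1", cpu);
		f = fopen(path, "r");
		if (!f)
			break;			/* no more CPUs */
		if (fscanf(f, "%lx", &midr) != 1)
			midr = 0;
		fclose(f);

		if (midr != UPGRADE_MIDR)
			continue;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/mte_upgrade_async", cpu);
		f = fopen(path, "w");
		if (f) {
			fprintf(f, "1\n");	/* 1 == TCF sync */
			fclose(f);
		}
	}
	return 0;
}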
On Fri, Jun 11, 2021 at 02:50:03PM -0700, Peter Collingbourne wrote:
> On Fri, Jun 11, 2021 at 7:49 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Jun 09, 2021 at 07:12:29PM -0700, Peter Collingbourne wrote:
> > > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > > index 9df3feeee890..545ef900e7ce 100644
> > > --- a/arch/arm64/include/asm/processor.h
> > > +++ b/arch/arm64/include/asm/processor.h
> > > @@ -159,6 +159,7 @@ struct thread_struct {
> > >  #define SCTLR_USER_MASK \
> > >  	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB | \
> > >  	 SCTLR_EL1_TCF0_MASK)
> > > +#define SCTLR_USER_DYNAMIC_TCF (1ULL << 63)
> >
> > Even if you called it "USER", it still gives the impression that it's
> > some real hardware bit. Who knows, in a few years time it may be
> > allocated to a real feature.
>
> If bit 63 ends up being allocated to a bit that we want to allow
> userspace control over then we can always just move this to another
> bit. There are plenty to choose from that I don't think we will ever
> allow user control over, e.g. EIS.
>
> > I also don't think this logic should be added to processor.[ch], just
> > keep it within mte.c.
> >
> > So while it's convenient to add something to this field, given that it's
> > shared with ptrauth, it's pretty fragile long term. I'd add the
> > information about the dynamic mode to a different field. We could rename
> > gcr_user_excl to mte_ctrl or something and store a few bits in there in
> > addition to GCR_EL1.Excl (with corresponding masks etc.)
>
> I do take your point that it's somewhat awkward to commingle the SCTLR
> bits and the dynamic TCF setting here, but I'm not sure that it's
> overall better to move the bits to a renamed gcr_user_excl field. The
> consequence would be that we need two copies of the TCF setting in
> thread_struct and they will need to be kept in sync, which leads to an
> implicit ordering dependency between the code dealing with the two
> fields on context switch.

I haven't checked v3 yet but I don't understand what the ordering
problem is. gcr_user_excl is also part of thread_struct and it shouldn't
change while the thread is in the middle of a context switch.

> We can make this more maintainable by adding a static_assert that
> SCTLR_USER_DYNAMIC_TCF doesn't overlap with any of the bits in
> SCTLR_USER_MASK, as I've done in v3.
>
> Let me know what you think and if you still disagree then I can try to
> make this look more like you suggested.

I just don't like adding software bits to the sctlr field. Who knows, we
may need to add some more for MTE, maybe other features would do
something similar and it's not maintainable.

> > > +static ssize_t mte_upgrade_async_store(struct device *dev,
> > > +				       struct device_attribute *attr,
> > > +				       const char *buf, size_t count)
> > > +{
> > > +	ssize_t ret;
> > > +	u32 val;
> > > +	u64 tcf;
> > > +
> > > +	ret = kstrtou32(buf, 0, &val);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	tcf = ((u64)val) << SCTLR_EL1_TCF0_SHIFT;
> > > +	if (tcf != SCTLR_EL1_TCF0_NONE && tcf != SCTLR_EL1_TCF0_SYNC &&
> > > +	    tcf != SCTLR_EL1_TCF0_ASYNC)
> > > +		return -EINVAL;
> > > +
> > > +	device_lock(dev);
> > > +	per_cpu(mte_upgrade_async, dev->id) = tcf;
> > > +
> > > +	ret = stop_machine(sync_sctlr, 0, cpumask_of(dev->id));
> >
> > Do we really need a stop_machine() here? That's really heavy and we
> > don't need such synchronisation. An smp_call_function() should do or
> > just leave it until the next context switch on the corresponding CPUs.
>
> Looks like smp_call_function_single() should work here; done in v3.

Why "single"? Don't you want this executed on all CPUs?
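
Since the attribute is registered on each CPU device and the store handler
only updates per_cpu(mte_upgrade_async, dev->id), presumably only that one
CPU needs its SCTLR_EL1 re-evaluated. A sketch of the lighter-weight
variant being discussed (illustrative only, not the actual v3 code; note
that smp_call_function callbacks return void, unlike stop_machine's) might
look like:

/* Re-evaluate the dynamic TCF upgrade for whatever task is running there. */
static void sync_sctlr_ipi(void *unused)
{
	update_sctlr_el1(current->thread.sctlr_user);
}

/* Called after updating per_cpu(mte_upgrade_async, cpu) in the sysfs store. */
static void mte_resync_cpu(int cpu)
{
	/* wait=1: return only once the target CPU has run the callback */
	smp_call_function_single(cpu, sync_sctlr_ipi, NULL, 1);
}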
On Mon, Jun 14, 2021 at 10:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Fri, Jun 11, 2021 at 02:50:03PM -0700, Peter Collingbourne wrote:
> > On Fri, Jun 11, 2021 at 7:49 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Wed, Jun 09, 2021 at 07:12:29PM -0700, Peter Collingbourne wrote:
> > > > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > > > index 9df3feeee890..545ef900e7ce 100644
> > > > --- a/arch/arm64/include/asm/processor.h
> > > > +++ b/arch/arm64/include/asm/processor.h
> > > > @@ -159,6 +159,7 @@ struct thread_struct {
> > > >  #define SCTLR_USER_MASK \
> > > >  	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB | \
> > > >  	 SCTLR_EL1_TCF0_MASK)
> > > > +#define SCTLR_USER_DYNAMIC_TCF (1ULL << 63)
> > >
> > > Even if you called it "USER", it still gives the impression that it's
> > > some real hardware bit. Who knows, in a few years time it may be
> > > allocated to a real feature.
> >
> > If bit 63 ends up being allocated to a bit that we want to allow
> > userspace control over then we can always just move this to another
> > bit. There are plenty to choose from that I don't think we will ever
> > allow user control over, e.g. EIS.
> >
> > > I also don't think this logic should be added to processor.[ch], just
> > > keep it within mte.c.
> > >
> > > So while it's convenient to add something to this field, given that it's
> > > shared with ptrauth, it's pretty fragile long term. I'd add the
> > > information about the dynamic mode to a different field. We could rename
> > > gcr_user_excl to mte_ctrl or something and store a few bits in there in
> > > addition to GCR_EL1.Excl (with corresponding masks etc.)
> >
> > I do take your point that it's somewhat awkward to commingle the SCTLR
> > bits and the dynamic TCF setting here, but I'm not sure that it's
> > overall better to move the bits to a renamed gcr_user_excl field. The
> > consequence would be that we need two copies of the TCF setting in
> > thread_struct and they will need to be kept in sync, which leads to an
> > implicit ordering dependency between the code dealing with the two
> > fields on context switch.
>
> I haven't checked v3 yet but I don't understand what the ordering
> problem is. gcr_user_excl is also part of thread_struct and it shouldn't
> change while the thread is in the middle of a context switch.

It's not due to multithreading but more to do with making the code less
readable by adding constraints on it: with your scheme the call to
mte_thread_switch() would implicitly need to happen before
update_sctlr_el1(), whereas there was no such constraint before.

> > We can make this more maintainable by adding a static_assert that
> > SCTLR_USER_DYNAMIC_TCF doesn't overlap with any of the bits in
> > SCTLR_USER_MASK, as I've done in v3.
> >
> > Let me know what you think and if you still disagree then I can try to
> > make this look more like you suggested.
>
> I just don't like adding software bits to the sctlr field. Who knows, we
> may need to add some more for MTE, maybe other features would do
> something similar and it's not maintainable.

Oh, that's unfortunate. As you might imagine I still disagree that it's
less maintainable, but I guess it's not too important so I'll change the
code as you requested.

Peter
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 9df3feeee890..545ef900e7ce 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -159,6 +159,7 @@ struct thread_struct {
 #define SCTLR_USER_MASK \
 	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB | \
 	 SCTLR_EL1_TCF0_MASK)
+#define SCTLR_USER_DYNAMIC_TCF (1ULL << 63)
 
 static inline void arch_thread_struct_whitelist(unsigned long *offset,
 						unsigned long *size)
@@ -251,6 +252,7 @@ extern void release_thread(struct task_struct *);
 
 unsigned long get_wchan(struct task_struct *p);
 
+void update_sctlr_el1(u64 sctlr);
 void set_task_sctlr_el1(u64 sctlr);
 
 /* Thread switching */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 125a10e413e9..f244bc96464c 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -4,11 +4,13 @@
  */
 
 #include <linux/bitops.h>
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/prctl.h>
 #include <linux/sched.h>
 #include <linux/sched/mm.h>
+#include <linux/stop_machine.h>
 #include <linux/string.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
@@ -26,6 +28,8 @@ u64 gcr_kernel_excl __ro_after_init;
 
 static bool report_fault_once = true;
 
+DECLARE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
+
 #ifdef CONFIG_KASAN_HW_TAGS
 /* Whether the MTE asynchronous mode is enabled. */
 DEFINE_STATIC_KEY_FALSE(mte_async_mode);
@@ -262,7 +266,8 @@ void mte_suspend_exit(void)
 
 long set_mte_ctrl(struct task_struct *task, unsigned long arg)
 {
-	u64 sctlr = task->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK;
+	u64 sctlr = task->thread.sctlr_user &
+		    ~(SCTLR_EL1_TCF0_MASK | SCTLR_USER_DYNAMIC_TCF);
 	u64 gcr_excl = ~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) &
 		       SYS_GCR_EL1_EXCL_MASK;
 
@@ -283,6 +288,9 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg)
 		return -EINVAL;
 	}
 
+	if (arg & PR_MTE_DYNAMIC_TCF)
+		sctlr |= SCTLR_USER_DYNAMIC_TCF;
+
 	if (task != current) {
 		task->thread.sctlr_user = sctlr;
 		task->thread.gcr_user_excl = gcr_excl;
@@ -316,6 +324,9 @@ long get_mte_ctrl(struct task_struct *task)
 		break;
 	}
 
+	if (task->thread.sctlr_user & SCTLR_USER_DYNAMIC_TCF)
+		ret |= PR_MTE_DYNAMIC_TCF;
+
 	return ret;
 }
 
@@ -453,3 +464,60 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request,
 
 	return ret;
 }
+
+static ssize_t mte_upgrade_async_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	u32 val = per_cpu(mte_upgrade_async, dev->id) >> SCTLR_EL1_TCF0_SHIFT;
+
+	return sysfs_emit(buf, "%u\n", val);
+}
+
+static int sync_sctlr(void *arg)
+{
+	update_sctlr_el1(current->thread.sctlr_user);
+	return 0;
+}
+
+static ssize_t mte_upgrade_async_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	ssize_t ret;
+	u32 val;
+	u64 tcf;
+
+	ret = kstrtou32(buf, 0, &val);
+	if (ret < 0)
+		return ret;
+
+	tcf = ((u64)val) << SCTLR_EL1_TCF0_SHIFT;
+	if (tcf != SCTLR_EL1_TCF0_NONE && tcf != SCTLR_EL1_TCF0_SYNC &&
+	    tcf != SCTLR_EL1_TCF0_ASYNC)
+		return -EINVAL;
+
+	device_lock(dev);
+	per_cpu(mte_upgrade_async, dev->id) = tcf;
+
+	ret = stop_machine(sync_sctlr, 0, cpumask_of(dev->id));
+	if (ret == 0)
+		ret = count;
+	device_unlock(dev);
+
+	return ret;
+}
+static DEVICE_ATTR_RW(mte_upgrade_async);
+
+static void register_mte_upgrade_async_sysctl(void)
+{
+	unsigned int cpu;
+
+	if (!system_supports_mte())
+		return;
+
+	for_each_possible_cpu(cpu) {
+		device_create_file(get_cpu_device(cpu),
+				   &dev_attr_mte_upgrade_async);
+	}
+}
+subsys_initcall(register_mte_upgrade_async_sysctl);
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index b4bb67f17a2c..27a12c53529d 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -527,8 +527,19 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
-static void update_sctlr_el1(u64 sctlr)
+DECLARE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
+
+void update_sctlr_el1(u64 sctlr)
 {
+	if (sctlr & SCTLR_USER_DYNAMIC_TCF) {
+		sctlr &= ~SCTLR_USER_DYNAMIC_TCF;
+
+		if ((sctlr & SCTLR_EL1_TCF0_MASK) == SCTLR_EL1_TCF0_ASYNC) {
+			sctlr &= ~SCTLR_EL1_TCF0_MASK;
+			sctlr |= __this_cpu_read(mte_upgrade_async);
+		}
+	}
+
 	/*
 	 * EnIA must not be cleared while in the kernel as this is necessary for
 	 * in-kernel PAC. It will be cleared on kernel exit if needed.
@@ -659,7 +670,7 @@ long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
 		return -EINVAL;
 
 	if (system_supports_mte())
-		valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK;
+		valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK | PR_MTE_DYNAMIC_TCF;
 
 	if (arg & ~valid_mask)
 		return -EINVAL;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dcd7041b2b07..20c1503c8cdd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -56,6 +56,8 @@
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
+DEFINE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
+EXPORT_PER_CPU_SYMBOL(mte_upgrade_async);
 
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
@@ -649,6 +651,27 @@ static void __init acpi_parse_and_init_cpus(void)
 #define acpi_parse_and_init_cpus(...) do { } while (0)
 #endif
 
+/*
+ * Store the default values for per-CPU properties typically read from DT or
+ * ACPI to per-CPU variables.
+ */
+static void __init set_default_cpu_properties(int cpu)
+{
+	per_cpu(mte_upgrade_async, cpu) = SCTLR_EL1_TCF0_ASYNC;
+}
+
+/*
+ * Read per-CPU properties from the device tree and store them in per-CPU
+ * variables for efficient access later.
+ */
+static void __init of_read_cpu_properties(int cpu, struct device_node *dn)
+{
+	u32 prop;
+
+	if (of_property_read_u32(dn, "mte-upgrade-async", &prop) == 0)
+		per_cpu(mte_upgrade_async, cpu) = ((u64)prop) << SCTLR_EL1_TCF0_SHIFT;
+}
+
 /*
  * Enumerate the possible CPU set from the device tree and build the
  * cpu logical map array containing MPIDR values related to logical
@@ -774,6 +797,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
 	for_each_possible_cpu(cpu) {
 		per_cpu(cpu_number, cpu) = cpu;
+		set_default_cpu_properties(cpu);
 		if (cpu == smp_processor_id())
 			continue;
 
@@ -789,6 +813,16 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 		set_cpu_present(cpu, true);
 		numa_store_cpu_info(cpu);
 	}
+
+	if (acpi_disabled) {
+		struct device_node *dn;
+		int cpu = 0;
+
+		for_each_of_cpu_node(dn) {
+			of_read_cpu_properties(cpu, dn);
+			cpu++;
+		}
+	}
 }
 
 static const char *ipi_types[NR_IPI] __tracepoint_string = {
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 18a9f59dc067..4dab44732814 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -242,6 +242,8 @@ struct prctl_mm_map {
 /* MTE tag inclusion mask */
 # define PR_MTE_TAG_SHIFT	3
 # define PR_MTE_TAG_MASK	(0xffffUL << PR_MTE_TAG_SHIFT)
+/* Enable dynamic upgrading of MTE tag check fault mode */
+# define PR_MTE_DYNAMIC_TCF	(1UL << 19)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER	57
On some CPUs the performance of MTE in synchronous mode is the same as
that of asynchronous mode. This makes it worthwhile to enable synchronous
mode on those CPUs when asynchronous mode is requested, in order to gain
the error detection benefits of synchronous mode without the performance
downsides. Therefore, make it possible for user programs to opt into
upgrading to synchronous mode on those CPUs via a new prctl flag. The
flag is orthogonal to the existing TCF modes in order to accommodate
upgrading from other TCF modes in the future.

The feature is controlled on a per-CPU basis via a new device tree node,
but it may also be controlled via sysfs.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8
---
v2:
- make it an opt-in behavior
- change the format of the device tree node
- also allow controlling the feature via sysfs

 arch/arm64/include/asm/processor.h |  2 +
 arch/arm64/kernel/mte.c            | 70 +++++++++++++++++++++++++++++-
 arch/arm64/kernel/process.c        | 15 ++++++-
 arch/arm64/kernel/smp.c            | 34 +++++++++++++++
 include/uapi/linux/prctl.h         |  2 +
 5 files changed, 120 insertions(+), 3 deletions(-)
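
As a usage illustration (not part of the patch), a process might opt in as
shown below. PR_MTE_DYNAMIC_TCF is the new flag proposed by this series;
the other controls are the existing MTE prctl interface, and the tag
inclusion mask chosen here is just an example:

/*
 * Sketch of a userspace opt-in: request asynchronous tag checking, but
 * allow the kernel to upgrade it to synchronous on CPUs where that is
 * cheap. The fallback define mirrors the value proposed in this patch.
 */
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_MTE_DYNAMIC_TCF
#define PR_MTE_DYNAMIC_TCF	(1UL << 19)
#endif

int main(void)
{
	unsigned long ctrl = PR_TAGGED_ADDR_ENABLE |
			     PR_MTE_TCF_ASYNC |
			     PR_MTE_DYNAMIC_TCF |
			     (0xfffeUL << PR_MTE_TAG_SHIFT);	/* example: include tags 1-15 */

	if (prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl, 0, 0, 0)) {
		perror("PR_SET_TAGGED_ADDR_CTRL");
		return 1;
	}
	return 0;
}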