Message ID | 20210615203807.3582804-1-pcc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis | expand |
Hi Peter, On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > On some CPUs the performance of MTE in synchronous mode is similar > to that of asynchronous mode. This makes it worthwhile to enable > synchronous mode on those CPUs when asynchronous mode is requested, > in order to gain the error detection benefits of synchronous mode > without the performance downsides. Therefore, make it possible for > user programs to opt into upgrading to synchronous mode on those CPUs > via a new prctl flag. The flag is orthogonal to the existing TCF modes > in order to accommodate upgrading from other TCF modes in the future. > > The feature is controlled on a per-CPU basis via sysfs. > > Signed-off-by: Peter Collingbourne <pcc@google.com> > Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8 > --- > v5: > - updated documentation > - address some nits in mte.c > > v4: > - switch to new mte_ctrl field > - make register_mte_upgrade_async_sysctl return an int > - change the sysctl to take 0 or 1 instead of raw TCF values > - "same as" -> "similar to" > > v3: > - drop the device tree support > - add documentation > - add static_assert to ensure no overlap with real HW bits > - move per-CPU variable initialization to mte.c > - use smp_call_function_single instead of stop_machine > > v2: > - make it an opt-in behavior > - change the format of the device tree node > - also allow controlling the feature via sysfs > > .../arm64/memory-tagging-extension.rst | 20 +++ > arch/arm64/include/asm/mte.h | 4 + > arch/arm64/include/asm/processor.h | 14 +- > arch/arm64/kernel/asm-offsets.c | 2 +- > arch/arm64/kernel/entry.S | 4 +- > arch/arm64/kernel/mte.c | 151 ++++++++++++++---- > arch/arm64/kernel/process.c | 2 +- > include/uapi/linux/prctl.h | 2 + > 8 files changed, 162 insertions(+), 37 deletions(-) > > diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst > index b540178a93f8..5375d5efd18c 100644 > --- a/Documentation/arm64/memory-tagging-extension.rst > +++ b/Documentation/arm64/memory-tagging-extension.rst > @@ -120,6 +120,25 @@ in the ``PR_MTE_TAG_MASK`` bit-field. > interface provides an include mask. An include mask of ``0`` (exclusion > mask ``0xffff``) results in the CPU always generating tag ``0``. > > +Upgrading to stricter tag checking modes > +---------------------------------------- > + > +On some CPUs the performance of MTE in stricter tag checking modes > +is similar to that of less strict tag checking modes. This makes it > +worthwhile to enable stricter checks on those CPUs when a less strict > +checking mode is requested, in order to gain the error detection > +benefits of the stricter checks without the performance downsides. To > +opt into upgrading to a stricter checking mode on those CPUs, the user > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > + > +This feature is currently only supported for upgrading from > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > +to synchronous mode, a privileged user may write the value ``1`` to > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > +upgrading they may write the value ``0``. By default the feature is > +disabled on all CPUs. > + > Initial process state > --------------------- > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > - ``PSTATE.TCO`` set to 0 > - ``PROT_MTE`` not set on any of the initial memory maps Something about this doesn't sit right with me, as we're mixing a per-task interface with a per-cpu interface for selecting async/sync MTE and the priorities are somewhat confusing. I think a better interface would be for the sysfs entry for each CPU to allow selection between: task : Honour the prctl() (current behaviour) async : Force async for tasks using MTE sync : Force sync for tasks using MTE none : MTE disabled i.e. the per-cpu setting is an override. Would that give you what you need? Will
On Thu, Jun 17, 2021 at 2:58 PM Will Deacon <will@kernel.org> wrote: > > Hi Peter, > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > On some CPUs the performance of MTE in synchronous mode is similar > > to that of asynchronous mode. This makes it worthwhile to enable > > synchronous mode on those CPUs when asynchronous mode is requested, > > in order to gain the error detection benefits of synchronous mode > > without the performance downsides. Therefore, make it possible for > > user programs to opt into upgrading to synchronous mode on those CPUs > > via a new prctl flag. The flag is orthogonal to the existing TCF modes > > in order to accommodate upgrading from other TCF modes in the future. > > > > The feature is controlled on a per-CPU basis via sysfs. > > > > Signed-off-by: Peter Collingbourne <pcc@google.com> > > Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8 > > --- > > v5: > > - updated documentation > > - address some nits in mte.c > > > > v4: > > - switch to new mte_ctrl field > > - make register_mte_upgrade_async_sysctl return an int > > - change the sysctl to take 0 or 1 instead of raw TCF values > > - "same as" -> "similar to" > > > > v3: > > - drop the device tree support > > - add documentation > > - add static_assert to ensure no overlap with real HW bits > > - move per-CPU variable initialization to mte.c > > - use smp_call_function_single instead of stop_machine > > > > v2: > > - make it an opt-in behavior > > - change the format of the device tree node > > - also allow controlling the feature via sysfs > > > > .../arm64/memory-tagging-extension.rst | 20 +++ > > arch/arm64/include/asm/mte.h | 4 + > > arch/arm64/include/asm/processor.h | 14 +- > > arch/arm64/kernel/asm-offsets.c | 2 +- > > arch/arm64/kernel/entry.S | 4 +- > > arch/arm64/kernel/mte.c | 151 ++++++++++++++---- > > arch/arm64/kernel/process.c | 2 +- > > include/uapi/linux/prctl.h | 2 + > > 8 files changed, 162 insertions(+), 37 deletions(-) > > > > diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst > > index b540178a93f8..5375d5efd18c 100644 > > --- a/Documentation/arm64/memory-tagging-extension.rst > > +++ b/Documentation/arm64/memory-tagging-extension.rst > > @@ -120,6 +120,25 @@ in the ``PR_MTE_TAG_MASK`` bit-field. > > interface provides an include mask. An include mask of ``0`` (exclusion > > mask ``0xffff``) results in the CPU always generating tag ``0``. > > > > +Upgrading to stricter tag checking modes > > +---------------------------------------- > > + > > +On some CPUs the performance of MTE in stricter tag checking modes > > +is similar to that of less strict tag checking modes. This makes it > > +worthwhile to enable stricter checks on those CPUs when a less strict > > +checking mode is requested, in order to gain the error detection > > +benefits of the stricter checks without the performance downsides. To > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > + > > +This feature is currently only supported for upgrading from > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > +to synchronous mode, a privileged user may write the value ``1`` to > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > +upgrading they may write the value ``0``. By default the feature is > > +disabled on all CPUs. > > + > > Initial process state > > --------------------- > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > - ``PSTATE.TCO`` set to 0 > > - ``PROT_MTE`` not set on any of the initial memory maps > > Something about this doesn't sit right with me, as we're mixing a per-task > interface with a per-cpu interface for selecting async/sync MTE and the > priorities are somewhat confusing. > > I think a better interface would be for the sysfs entry for each CPU to > allow selection between: > > task : Honour the prctl() (current behaviour) > async : Force async for tasks using MTE > sync : Force sync for tasks using MTE > none : MTE disabled > > i.e. the per-cpu setting is an override. > > Would that give you what you need? The approach in v1 of the patch [1] was that the per-CPU setting (at that time a DT attribute) unconditionally overrode the TCF setting provided by the user, so in that respect it's similar to what you proposed, however Catalin and Vincenzo considered it to be an ABI break, which I don't 100% agree with, but I think it's a fair enough criticism. I also don't think the setting you've mentioned provides enough flexibility, especially when asymmetric support is added [2], and in some cases can force a *downgrade* of the TCF setting to a weaker one, which doesn't seem desirable. For example sync -> async weakens security and async/sync -> none does the same as well as being more clearly an ABI break. Peter [1] https://lore.kernel.org/linux-arm-kernel/20210602232445.3829248-1-pcc@google.com/ [2] https://lore.kernel.org/linux-arm-kernel/CAMn1gO7b0qMYMFz45eKdjNQV24V9YH9nqDgUpSbX5WJfkaJzCg@mail.gmail.com/
On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > +Upgrading to stricter tag checking modes > > +---------------------------------------- > > + > > +On some CPUs the performance of MTE in stricter tag checking modes > > +is similar to that of less strict tag checking modes. This makes it > > +worthwhile to enable stricter checks on those CPUs when a less strict > > +checking mode is requested, in order to gain the error detection > > +benefits of the stricter checks without the performance downsides. To > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > + > > +This feature is currently only supported for upgrading from > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > +to synchronous mode, a privileged user may write the value ``1`` to > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > +upgrading they may write the value ``0``. By default the feature is > > +disabled on all CPUs. > > + > > Initial process state > > --------------------- > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > - ``PSTATE.TCO`` set to 0 > > - ``PROT_MTE`` not set on any of the initial memory maps > > Something about this doesn't sit right with me, as we're mixing a per-task > interface with a per-cpu interface for selecting async/sync MTE and the > priorities are somewhat confusing. > > I think a better interface would be for the sysfs entry for each CPU to > allow selection between: > > task : Honour the prctl() (current behaviour) > async : Force async for tasks using MTE > sync : Force sync for tasks using MTE > none : MTE disabled > > i.e. the per-cpu setting is an override. As Peter mentioned, forcing it is a potential ABI break, so such feature would need backporting to 5.10. There's also a minor use-case that came up in the early discussions - an app may want to use async mode only for reporting but forcing it to sync would break such application (since sync mode prevents the faulty access from taking place). So I'd rather leave it up to the user task to decide whether its choice can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this purpose (or a different name if you have a better suggestion). I think the other important question is whether we go for an override style or an upgrade one. Peter chose the latter, though I think an override is simpler to understand. BTW, I like the idea of using strings in the sysfs interface than numbers.
On Fri, Jun 18, 2021 at 8:10 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > +Upgrading to stricter tag checking modes > > > +---------------------------------------- > > > + > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > +is similar to that of less strict tag checking modes. This makes it > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > +checking mode is requested, in order to gain the error detection > > > +benefits of the stricter checks without the performance downsides. To > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > + > > > +This feature is currently only supported for upgrading from > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > +upgrading they may write the value ``0``. By default the feature is > > > +disabled on all CPUs. > > > + > > > Initial process state > > > --------------------- > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > - ``PSTATE.TCO`` set to 0 > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > interface with a per-cpu interface for selecting async/sync MTE and the > > priorities are somewhat confusing. > > > > I think a better interface would be for the sysfs entry for each CPU to > > allow selection between: > > > > task : Honour the prctl() (current behaviour) > > async : Force async for tasks using MTE > > sync : Force sync for tasks using MTE > > none : MTE disabled > > > > i.e. the per-cpu setting is an override. > > As Peter mentioned, forcing it is a potential ABI break, so such feature > would need backporting to 5.10. There's also a minor use-case that came > up in the early discussions - an app may want to use async mode only for > reporting but forcing it to sync would break such application (since > sync mode prevents the faulty access from taking place). > > So I'd rather leave it up to the user task to decide whether its choice > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > purpose (or a different name if you have a better suggestion). > > I think the other important question is whether we go for an override > style or an upgrade one. Peter chose the latter, though I think an > override is simpler to understand. > > BTW, I like the idea of using strings in the sysfs interface than > numbers. Agreed on the strings in the sysfs interface; done in v6. Peter
On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > +Upgrading to stricter tag checking modes > > > +---------------------------------------- > > > + > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > +is similar to that of less strict tag checking modes. This makes it > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > +checking mode is requested, in order to gain the error detection > > > +benefits of the stricter checks without the performance downsides. To > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > + > > > +This feature is currently only supported for upgrading from > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > +upgrading they may write the value ``0``. By default the feature is > > > +disabled on all CPUs. > > > + > > > Initial process state > > > --------------------- > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > - ``PSTATE.TCO`` set to 0 > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > interface with a per-cpu interface for selecting async/sync MTE and the > > priorities are somewhat confusing. > > > > I think a better interface would be for the sysfs entry for each CPU to > > allow selection between: > > > > task : Honour the prctl() (current behaviour) > > async : Force async for tasks using MTE > > sync : Force sync for tasks using MTE > > none : MTE disabled > > > > i.e. the per-cpu setting is an override. > > As Peter mentioned, forcing it is a potential ABI break, so such feature > would need backporting to 5.10. There's also a minor use-case that came > up in the early discussions - an app may want to use async mode only for > reporting but forcing it to sync would break such application (since > sync mode prevents the faulty access from taking place). Why is it an ABI break? The default will be "task" which behaves exactly as things do today. If the policy is explicitly changed by userspace, then you get new behaviour. I don't really see why this is different to e.g. writing /proc/sys/vm/mmap_min_addr and having some applications fail because they rely on the default setting. > So I'd rather leave it up to the user task to decide whether its choice > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > purpose (or a different name if you have a better suggestion). I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an extra bit of logic to go and set it, but then what? It doesn't know which behaviour it's getting, so it just feels like an extra hoop to jump through without actually adding anything useful. Will
On Mon, Jun 21, 2021 at 01:39:37PM +0100, Will Deacon wrote: > On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > > +Upgrading to stricter tag checking modes > > > > +---------------------------------------- > > > > + > > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > > +is similar to that of less strict tag checking modes. This makes it > > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > > +checking mode is requested, in order to gain the error detection > > > > +benefits of the stricter checks without the performance downsides. To > > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > > + > > > > +This feature is currently only supported for upgrading from > > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > > +upgrading they may write the value ``0``. By default the feature is > > > > +disabled on all CPUs. > > > > + > > > > Initial process state > > > > --------------------- > > > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > > - ``PSTATE.TCO`` set to 0 > > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > > interface with a per-cpu interface for selecting async/sync MTE and the > > > priorities are somewhat confusing. > > > > > > I think a better interface would be for the sysfs entry for each CPU to > > > allow selection between: > > > > > > task : Honour the prctl() (current behaviour) > > > async : Force async for tasks using MTE > > > sync : Force sync for tasks using MTE > > > none : MTE disabled > > > > > > i.e. the per-cpu setting is an override. > > > > As Peter mentioned, forcing it is a potential ABI break, so such feature > > would need backporting to 5.10. There's also a minor use-case that came > > up in the early discussions - an app may want to use async mode only for > > reporting but forcing it to sync would break such application (since > > sync mode prevents the faulty access from taking place). > > Why is it an ABI break? The default will be "task" which behaves exactly as > things do today. If the policy is explicitly changed by userspace, then you > get new behaviour. I don't really see why this is different to e.g. writing > /proc/sys/vm/mmap_min_addr and having some applications fail because they > rely on the default setting. That's slightly different from mmap_min_addr, it depends on the user expectations. Most applications have no interest in a MAP_FIXED of address 0, so they wouldn't notice a non-zero setting. The semantics of MTE TCF none/sync/async are different and an application may have different expectations. For example, it may not want any tag checking, just being able to read/write tags. Or it may want just some lightweight monitoring with a simple restart after an async signal (sync requires either emulating the access or setting the PSTATE.TCO bit). Forcing the TCF via sysfs may be seen as a user problem but that's pretty much rendering the application choice of the tag check mode useless since an admin could override it. > > So I'd rather leave it up to the user task to decide whether its choice > > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > > purpose (or a different name if you have a better suggestion). > > I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an > extra bit of logic to go and set it, but then what? It doesn't know which > behaviour it's getting, so it just feels like an extra hoop to jump through > without actually adding anything useful. Well, without this additional bit, an application can't rely on the mode it requested. With an additional forced tagged address enable, we might as well remove the prctl() altogether (well, that wouldn't be a bad thing). Given that there are no real users of MTE yet, we have some choice of tweaking the ABI, backporting to 5.10. The question: is the expectation that the sysfs forcing of TCF is limited to deployments where the user space is tightly controlled (e.g. Android with most apps starting from zygote) or we allow it to become more of a general hint of what's the fastest check on a CPU? If the former, I'm fine with forcing without any additional bit, though I'd still prefer the opt-in. For the latter, I'd like some wider discussion with non-Android folk on what they expect from the TCF setting. Otherwise simply using PROT_MTE would may lead to tag check faults.
On Mon, Jun 21, 2021 at 04:18:59PM +0100, Catalin Marinas wrote: > On Mon, Jun 21, 2021 at 01:39:37PM +0100, Will Deacon wrote: > > On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > > > +Upgrading to stricter tag checking modes > > > > > +---------------------------------------- > > > > > + > > > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > > > +is similar to that of less strict tag checking modes. This makes it > > > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > > > +checking mode is requested, in order to gain the error detection > > > > > +benefits of the stricter checks without the performance downsides. To > > > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > > > + > > > > > +This feature is currently only supported for upgrading from > > > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > > > +upgrading they may write the value ``0``. By default the feature is > > > > > +disabled on all CPUs. > > > > > + > > > > > Initial process state > > > > > --------------------- > > > > > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > > > - ``PSTATE.TCO`` set to 0 > > > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > > > interface with a per-cpu interface for selecting async/sync MTE and the > > > > priorities are somewhat confusing. > > > > > > > > I think a better interface would be for the sysfs entry for each CPU to > > > > allow selection between: > > > > > > > > task : Honour the prctl() (current behaviour) > > > > async : Force async for tasks using MTE > > > > sync : Force sync for tasks using MTE > > > > none : MTE disabled > > > > > > > > i.e. the per-cpu setting is an override. > > > > > > As Peter mentioned, forcing it is a potential ABI break, so such feature > > > would need backporting to 5.10. There's also a minor use-case that came > > > up in the early discussions - an app may want to use async mode only for > > > reporting but forcing it to sync would break such application (since > > > sync mode prevents the faulty access from taking place). > > > > Why is it an ABI break? The default will be "task" which behaves exactly as > > things do today. If the policy is explicitly changed by userspace, then you > > get new behaviour. I don't really see why this is different to e.g. writing > > /proc/sys/vm/mmap_min_addr and having some applications fail because they > > rely on the default setting. > > That's slightly different from mmap_min_addr, it depends on the user > expectations. Most applications have no interest in a MAP_FIXED of > address 0, so they wouldn't notice a non-zero setting. Not sure; if I set mmap_min_addr to 1GB then applications start to break (32-bit applications SEGV instantly). So I think it's pretty similar. > The semantics of MTE TCF none/sync/async are different and an > application may have different expectations. For example, it may not > want any tag checking, just being able to read/write tags. Or it may > want just some lightweight monitoring with a simple restart after an > async signal (sync requires either emulating the access or setting the > PSTATE.TCO bit). I think this is an argument against doing this at all. Realistically, if PR_MTE_DYNAMIC_TCF is the "go fast" mode, then people will set it blindly and applications will just be expected to deal with a combination of sync and async faults; I really don't think the flag changes anything in that regards. It's not a buy-in from the application, given that the visible behaviour is unknown and dynamic. > Forcing the TCF via sysfs may be seen as a user problem but that's > pretty much rendering the application choice of the tag check mode > useless since an admin could override it. I think the application choice _is_ useless if we decide to offer a mechanism where it is set on a per-cpu basis instead. > > > So I'd rather leave it up to the user task to decide whether its choice > > > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > > > purpose (or a different name if you have a better suggestion). > > > > I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an > > extra bit of logic to go and set it, but then what? It doesn't know which > > behaviour it's getting, so it just feels like an extra hoop to jump through > > without actually adding anything useful. > > Well, without this additional bit, an application can't rely on the mode > it requested. With an additional forced tagged address enable, we might > as well remove the prctl() altogether (well, that wouldn't be a bad > thing). I think the prctl() is still useful to say whether an application wants MTE or not, but I'm inclined to agree that sync vs async shouldn't be part of the prctl() call. > Given that there are no real users of MTE yet, we have some choice of > tweaking the ABI, backporting to 5.10. The question: is the expectation > that the sysfs forcing of TCF is limited to deployments where the user > space is tightly controlled (e.g. Android with most apps starting from > zygote) or we allow it to become more of a general hint of what's the > fastest check on a CPU? If the former, I'm fine with forcing without any > additional bit, though I'd still prefer the opt-in. For the latter, I'd > like some wider discussion with non-Android folk on what they expect > from the TCF setting. Otherwise simply using PROT_MTE would may lead to > tag check faults. I don't think there's anything Android-specific here. The problem being solved concerns big/little SoCs with MTE, and I think it's up to the distribution how the sysfs stuff is used. Will
On Mon, Jun 21, 2021 at 06:39:02PM +0100, Will Deacon wrote: > On Mon, Jun 21, 2021 at 04:18:59PM +0100, Catalin Marinas wrote: > > On Mon, Jun 21, 2021 at 01:39:37PM +0100, Will Deacon wrote: > > > On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > > > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > > > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > > > > +Upgrading to stricter tag checking modes > > > > > > +---------------------------------------- > > > > > > + > > > > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > > > > +is similar to that of less strict tag checking modes. This makes it > > > > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > > > > +checking mode is requested, in order to gain the error detection > > > > > > +benefits of the stricter checks without the performance downsides. To > > > > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > > > > + > > > > > > +This feature is currently only supported for upgrading from > > > > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > > > > +upgrading they may write the value ``0``. By default the feature is > > > > > > +disabled on all CPUs. > > > > > > + > > > > > > Initial process state > > > > > > --------------------- > > > > > > > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > > > > - ``PSTATE.TCO`` set to 0 > > > > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > > > > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > > > > interface with a per-cpu interface for selecting async/sync MTE and the > > > > > priorities are somewhat confusing. > > > > > > > > > > I think a better interface would be for the sysfs entry for each CPU to > > > > > allow selection between: > > > > > > > > > > task : Honour the prctl() (current behaviour) > > > > > async : Force async for tasks using MTE > > > > > sync : Force sync for tasks using MTE > > > > > none : MTE disabled > > > > > > > > > > i.e. the per-cpu setting is an override. > > > > > > > > As Peter mentioned, forcing it is a potential ABI break, so such feature > > > > would need backporting to 5.10. There's also a minor use-case that came > > > > up in the early discussions - an app may want to use async mode only for > > > > reporting but forcing it to sync would break such application (since > > > > sync mode prevents the faulty access from taking place). > > > > > > Why is it an ABI break? The default will be "task" which behaves exactly as > > > things do today. If the policy is explicitly changed by userspace, then you > > > get new behaviour. I don't really see why this is different to e.g. writing > > > /proc/sys/vm/mmap_min_addr and having some applications fail because they > > > rely on the default setting. > > > > That's slightly different from mmap_min_addr, it depends on the user > > expectations. Most applications have no interest in a MAP_FIXED of > > address 0, so they wouldn't notice a non-zero setting. > > Not sure; if I set mmap_min_addr to 1GB then applications start to break > (32-bit applications SEGV instantly). So I think it's pretty similar. You don't expect applications to be tolerant to randomly high mmap_min_addr, hence admins don't set this to more than a page. However, for the proposed TCF sysfs interface, we do expect applications to cope with any forced value, either by documenting that the PR_* settings are useless or allowing an application to opt in to being forced. Either way, it's a significant difference between mmap_min_addr and the proposed MTE control. If you imply that people shouldn't use the new TCF sysfs controls because it may break applications, that's fine as well, we document it as a dangerous feature, only to be used if you have control of the user-space deployment (Android is an ideal target here). > > The semantics of MTE TCF none/sync/async are different and an > > application may have different expectations. For example, it may not > > want any tag checking, just being able to read/write tags. Or it may > > want just some lightweight monitoring with a simple restart after an > > async signal (sync requires either emulating the access or setting the > > PSTATE.TCO bit). > > I think this is an argument against doing this at all. Realistically, > if PR_MTE_DYNAMIC_TCF is the "go fast" mode, then people will set it > blindly and applications will just be expected to deal with a combination > of sync and async faults; I really don't think the flag changes anything in > that regards. It's not a buy-in from the application, given that the visible > behaviour is unknown and dynamic. Most applications won't care about the mode, they just want to crash when some incorrect access happens. But not having an opt-in (or opt-out, it works either way) a minority of other use-cases are no longer possible. Apps can't even rely on having the same TCF for the lifetime of a thread. > > Forcing the TCF via sysfs may be seen as a user problem but that's > > pretty much rendering the application choice of the tag check mode > > useless since an admin could override it. > > I think the application choice _is_ useless if we decide to offer a > mechanism where it is set on a per-cpu basis instead. It depends on how we implement this mechanism. Do we want to preserve the application choice via an opt-in/out or we remove such option entirely? That's a significant ABI change IMO. Just stating that setting the sysfs to anything other than "task" breaks applications looks like discouraging people from using it. Maybe we can even add an EXPERT dependency. > > > > So I'd rather leave it up to the user task to decide whether its choice > > > > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > > > > purpose (or a different name if you have a better suggestion). > > > > > > I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an > > > extra bit of logic to go and set it, but then what? It doesn't know which > > > behaviour it's getting, so it just feels like an extra hoop to jump through > > > without actually adding anything useful. > > > > Well, without this additional bit, an application can't rely on the mode > > it requested. With an additional forced tagged address enable, we might > > as well remove the prctl() altogether (well, that wouldn't be a bad > > thing). > > I think the prctl() is still useful to say whether an application wants MTE > or not, but I'm inclined to agree that sync vs async shouldn't be part of > the prctl() call. That means that if the user choice was PR_MTE_TCF_NONE, it won't be forced to any of the sync vs async, which makes sense if we go this route. The async/sync (and a new asymmetric mode in v8.7) could become just a PR_MTE_TCF_CHECKED. However, if glibc upstreamed the MTE support already, it will be harder to remove the PR_* controls already defined (well, we can still keep the sync/async as a hint) > > Given that there are no real users of MTE yet, we have some choice of > > tweaking the ABI, backporting to 5.10. The question: is the expectation > > that the sysfs forcing of TCF is limited to deployments where the user > > space is tightly controlled (e.g. Android with most apps starting from > > zygote) or we allow it to become more of a general hint of what's the > > fastest check on a CPU? If the former, I'm fine with forcing without any > > additional bit, though I'd still prefer the opt-in. For the latter, I'd > > like some wider discussion with non-Android folk on what they expect > > from the TCF setting. Otherwise simply using PROT_MTE would may lead to > > tag check faults. > > I don't think there's anything Android-specific here. The problem being > solved concerns big/little SoCs with MTE, and I think it's up to the > distribution how the sysfs stuff is used. But distros don't control what applications are running, so most likely they would disable the sysfs control entirely. At that point, the feature becomes primarily an Android play. Anyway, I'm not dead against forcing of the TCF mode regardless of the user choice but I'd like to ensure that we don't miss other use-cases or that we don't make the sysfs control an expert-only feature. Adding Szabolcs to get a view from the glibc perspective.
On Mon, Jun 21, 2021 at 11:50 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Mon, Jun 21, 2021 at 06:39:02PM +0100, Will Deacon wrote: > > On Mon, Jun 21, 2021 at 04:18:59PM +0100, Catalin Marinas wrote: > > > On Mon, Jun 21, 2021 at 01:39:37PM +0100, Will Deacon wrote: > > > > On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > > > > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > > > > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > > > > > +Upgrading to stricter tag checking modes > > > > > > > +---------------------------------------- > > > > > > > + > > > > > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > > > > > +is similar to that of less strict tag checking modes. This makes it > > > > > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > > > > > +checking mode is requested, in order to gain the error detection > > > > > > > +benefits of the stricter checks without the performance downsides. To > > > > > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > > > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > > > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > > > > > + > > > > > > > +This feature is currently only supported for upgrading from > > > > > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > > > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > > > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > > > > > +upgrading they may write the value ``0``. By default the feature is > > > > > > > +disabled on all CPUs. > > > > > > > + > > > > > > > Initial process state > > > > > > > --------------------- > > > > > > > > > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > > > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > > > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > > > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > > > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > > > > > - ``PSTATE.TCO`` set to 0 > > > > > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > > > > > > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > > > > > interface with a per-cpu interface for selecting async/sync MTE and the > > > > > > priorities are somewhat confusing. > > > > > > > > > > > > I think a better interface would be for the sysfs entry for each CPU to > > > > > > allow selection between: > > > > > > > > > > > > task : Honour the prctl() (current behaviour) > > > > > > async : Force async for tasks using MTE > > > > > > sync : Force sync for tasks using MTE > > > > > > none : MTE disabled > > > > > > > > > > > > i.e. the per-cpu setting is an override. > > > > > > > > > > As Peter mentioned, forcing it is a potential ABI break, so such feature > > > > > would need backporting to 5.10. There's also a minor use-case that came > > > > > up in the early discussions - an app may want to use async mode only for > > > > > reporting but forcing it to sync would break such application (since > > > > > sync mode prevents the faulty access from taking place). > > > > > > > > Why is it an ABI break? The default will be "task" which behaves exactly as > > > > things do today. If the policy is explicitly changed by userspace, then you > > > > get new behaviour. I don't really see why this is different to e.g. writing > > > > /proc/sys/vm/mmap_min_addr and having some applications fail because they > > > > rely on the default setting. > > > > > > That's slightly different from mmap_min_addr, it depends on the user > > > expectations. Most applications have no interest in a MAP_FIXED of > > > address 0, so they wouldn't notice a non-zero setting. > > > > Not sure; if I set mmap_min_addr to 1GB then applications start to break > > (32-bit applications SEGV instantly). So I think it's pretty similar. > > You don't expect applications to be tolerant to randomly high > mmap_min_addr, hence admins don't set this to more than a page. However, > for the proposed TCF sysfs interface, we do expect applications to cope > with any forced value, either by documenting that the PR_* settings are > useless or allowing an application to opt in to being forced. Either > way, it's a significant difference between mmap_min_addr and the > proposed MTE control. > > If you imply that people shouldn't use the new TCF sysfs controls > because it may break applications, that's fine as well, we document it > as a dangerous feature, only to be used if you have control of the > user-space deployment (Android is an ideal target here). > > > > The semantics of MTE TCF none/sync/async are different and an > > > application may have different expectations. For example, it may not > > > want any tag checking, just being able to read/write tags. Or it may > > > want just some lightweight monitoring with a simple restart after an > > > async signal (sync requires either emulating the access or setting the > > > PSTATE.TCO bit). > > > > I think this is an argument against doing this at all. Realistically, > > if PR_MTE_DYNAMIC_TCF is the "go fast" mode, then people will set it > > blindly and applications will just be expected to deal with a combination > > of sync and async faults; I really don't think the flag changes anything in > > that regards. It's not a buy-in from the application, given that the visible > > behaviour is unknown and dynamic. > > Most applications won't care about the mode, they just want to crash > when some incorrect access happens. But not having an opt-in (or > opt-out, it works either way) a minority of other use-cases are no > longer possible. Apps can't even rely on having the same TCF for the > lifetime of a thread. > > > > Forcing the TCF via sysfs may be seen as a user problem but that's > > > pretty much rendering the application choice of the tag check mode > > > useless since an admin could override it. > > > > I think the application choice _is_ useless if we decide to offer a > > mechanism where it is set on a per-cpu basis instead. > > It depends on how we implement this mechanism. Do we want to preserve > the application choice via an opt-in/out or we remove such option > entirely? That's a significant ABI change IMO. > > Just stating that setting the sysfs to anything other than "task" breaks > applications looks like discouraging people from using it. Maybe we can > even add an EXPERT dependency. > > > > > > So I'd rather leave it up to the user task to decide whether its choice > > > > > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > > > > > purpose (or a different name if you have a better suggestion). > > > > > > > > I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an > > > > extra bit of logic to go and set it, but then what? It doesn't know which > > > > behaviour it's getting, so it just feels like an extra hoop to jump through > > > > without actually adding anything useful. > > > > > > Well, without this additional bit, an application can't rely on the mode > > > it requested. With an additional forced tagged address enable, we might > > > as well remove the prctl() altogether (well, that wouldn't be a bad > > > thing). > > > > I think the prctl() is still useful to say whether an application wants MTE > > or not, but I'm inclined to agree that sync vs async shouldn't be part of > > the prctl() call. > > That means that if the user choice was PR_MTE_TCF_NONE, it won't be > forced to any of the sync vs async, which makes sense if we go this > route. The async/sync (and a new asymmetric mode in v8.7) could become > just a PR_MTE_TCF_CHECKED. However, if glibc upstreamed the MTE support > already, it will be harder to remove the PR_* controls already defined > (well, we can still keep the sync/async as a hint) For Android I think we will always want there to be a way to select the "minimum" mode. This is because sync mode provides slightly more security than async, as well as being relied on for deterministic error reporting. If we just have PR_MTE_TCF_CHECKED and sync/async is taken as a hint, it means that we can't rely on the processor to be in sync mode when it takes a tag check fault, so we won't necessarily see the error report or detect the fault early enough. > > > Given that there are no real users of MTE yet, we have some choice of > > > tweaking the ABI, backporting to 5.10. The question: is the expectation > > > that the sysfs forcing of TCF is limited to deployments where the user > > > space is tightly controlled (e.g. Android with most apps starting from > > > zygote) or we allow it to become more of a general hint of what's the > > > fastest check on a CPU? If the former, I'm fine with forcing without any > > > additional bit, though I'd still prefer the opt-in. For the latter, I'd > > > like some wider discussion with non-Android folk on what they expect > > > from the TCF setting. Otherwise simply using PROT_MTE would may lead to > > > tag check faults. > > > > I don't think there's anything Android-specific here. The problem being > > solved concerns big/little SoCs with MTE, and I think it's up to the > > distribution how the sysfs stuff is used. > > But distros don't control what applications are running, so most likely > they would disable the sysfs control entirely. At that point, the > feature becomes primarily an Android play. > > Anyway, I'm not dead against forcing of the TCF mode regardless of the > user choice but I'd like to ensure that we don't miss other use-cases or > that we don't make the sysfs control an expert-only feature. > > Adding Szabolcs to get a view from the glibc perspective. Given these diverging opinions my view is that we should choose whichever option leaves our options open for the future. For example, imagine that we make the ABI change now such that upgrades may happen for all applications and we don't have PR_MTE_DYNAMIC_TCF. This means that applications no longer have a guarantee of their TCF mode which may preclude some use cases (if we add an opt out later, applications will be affected when running on the kernel versions between when we changed the ABI and when we added the opt out). On the other hand, if we introduce PR_MTE_DYNAMIC_TCF, we can always make the ABI change later and start ignoring the PR_MTE_DYNAMIC_TCF flag at that point. Maybe the best compromise would be to change the ABI and at the same time add the opt out, but I don't have a strong opinion. Peter
The 06/22/2021 11:37, Peter Collingbourne wrote: > On Mon, Jun 21, 2021 at 11:50 AM Catalin Marinas > <catalin.marinas@arm.com> wrote: > > On Mon, Jun 21, 2021 at 06:39:02PM +0100, Will Deacon wrote: > > > On Mon, Jun 21, 2021 at 04:18:59PM +0100, Catalin Marinas wrote: > > > > On Mon, Jun 21, 2021 at 01:39:37PM +0100, Will Deacon wrote: > > > > > On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > > > > > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > > > > > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > > > > > > +Upgrading to stricter tag checking modes > > > > > > > > +---------------------------------------- > > > > > > > > + > > > > > > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > > > > > > +is similar to that of less strict tag checking modes. This makes it > > > > > > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > > > > > > +checking mode is requested, in order to gain the error detection > > > > > > > > +benefits of the stricter checks without the performance downsides. To > > > > > > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > > > > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > > > > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > > > > > > + > > > > > > > > +This feature is currently only supported for upgrading from > > > > > > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > > > > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > > > > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > > > > > > +upgrading they may write the value ``0``. By default the feature is > > > > > > > > +disabled on all CPUs. > > > > > > > > + > > > > > > > > Initial process state > > > > > > > > --------------------- > > > > > > > > > > > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > > > > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > > > > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > > > > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > > > > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > > > > > > - ``PSTATE.TCO`` set to 0 > > > > > > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > > > > > > > > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > > > > > > interface with a per-cpu interface for selecting async/sync MTE and the > > > > > > > priorities are somewhat confusing. > > > > > > > > > > > > > > I think a better interface would be for the sysfs entry for each CPU to > > > > > > > allow selection between: > > > > > > > > > > > > > > task : Honour the prctl() (current behaviour) > > > > > > > async : Force async for tasks using MTE > > > > > > > sync : Force sync for tasks using MTE > > > > > > > none : MTE disabled > > > > > > > > > > > > > > i.e. the per-cpu setting is an override. > > > > > > > > > > > > As Peter mentioned, forcing it is a potential ABI break, so such feature > > > > > > would need backporting to 5.10. There's also a minor use-case that came > > > > > > up in the early discussions - an app may want to use async mode only for > > > > > > reporting but forcing it to sync would break such application (since > > > > > > sync mode prevents the faulty access from taking place). > > > > > > > > > > Why is it an ABI break? The default will be "task" which behaves exactly as > > > > > things do today. If the policy is explicitly changed by userspace, then you > > > > > get new behaviour. I don't really see why this is different to e.g. writing > > > > > /proc/sys/vm/mmap_min_addr and having some applications fail because they > > > > > rely on the default setting. > > > > > > > > That's slightly different from mmap_min_addr, it depends on the user > > > > expectations. Most applications have no interest in a MAP_FIXED of > > > > address 0, so they wouldn't notice a non-zero setting. > > > > > > Not sure; if I set mmap_min_addr to 1GB then applications start to break > > > (32-bit applications SEGV instantly). So I think it's pretty similar. > > > > You don't expect applications to be tolerant to randomly high > > mmap_min_addr, hence admins don't set this to more than a page. However, > > for the proposed TCF sysfs interface, we do expect applications to cope > > with any forced value, either by documenting that the PR_* settings are > > useless or allowing an application to opt in to being forced. Either > > way, it's a significant difference between mmap_min_addr and the > > proposed MTE control. > > > > If you imply that people shouldn't use the new TCF sysfs controls > > because it may break applications, that's fine as well, we document it > > as a dangerous feature, only to be used if you have control of the > > user-space deployment (Android is an ideal target here). > > > > > > The semantics of MTE TCF none/sync/async are different and an > > > > application may have different expectations. For example, it may not > > > > want any tag checking, just being able to read/write tags. Or it may > > > > want just some lightweight monitoring with a simple restart after an > > > > async signal (sync requires either emulating the access or setting the > > > > PSTATE.TCO bit). > > > > > > I think this is an argument against doing this at all. Realistically, > > > if PR_MTE_DYNAMIC_TCF is the "go fast" mode, then people will set it > > > blindly and applications will just be expected to deal with a combination > > > of sync and async faults; I really don't think the flag changes anything in > > > that regards. It's not a buy-in from the application, given that the visible > > > behaviour is unknown and dynamic. > > > > Most applications won't care about the mode, they just want to crash > > when some incorrect access happens. But not having an opt-in (or > > opt-out, it works either way) a minority of other use-cases are no > > longer possible. Apps can't even rely on having the same TCF for the > > lifetime of a thread. > > > > > > Forcing the TCF via sysfs may be seen as a user problem but that's > > > > pretty much rendering the application choice of the tag check mode > > > > useless since an admin could override it. > > > > > > I think the application choice _is_ useless if we decide to offer a > > > mechanism where it is set on a per-cpu basis instead. > > > > It depends on how we implement this mechanism. Do we want to preserve > > the application choice via an opt-in/out or we remove such option > > entirely? That's a significant ABI change IMO. > > > > Just stating that setting the sysfs to anything other than "task" breaks > > applications looks like discouraging people from using it. Maybe we can > > even add an EXPERT dependency. > > > > > > > > So I'd rather leave it up to the user task to decide whether its choice > > > > > > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > > > > > > purpose (or a different name if you have a better suggestion). > > > > > > > > > > I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an > > > > > extra bit of logic to go and set it, but then what? It doesn't know which > > > > > behaviour it's getting, so it just feels like an extra hoop to jump through > > > > > without actually adding anything useful. > > > > > > > > Well, without this additional bit, an application can't rely on the mode > > > > it requested. With an additional forced tagged address enable, we might > > > > as well remove the prctl() altogether (well, that wouldn't be a bad > > > > thing). > > > > > > I think the prctl() is still useful to say whether an application wants MTE > > > or not, but I'm inclined to agree that sync vs async shouldn't be part of > > > the prctl() call. > > > > That means that if the user choice was PR_MTE_TCF_NONE, it won't be > > forced to any of the sync vs async, which makes sense if we go this > > route. The async/sync (and a new asymmetric mode in v8.7) could become > > just a PR_MTE_TCF_CHECKED. However, if glibc upstreamed the MTE support > > already, it will be harder to remove the PR_* controls already defined > > (well, we can still keep the sync/async as a hint) > > For Android I think we will always want there to be a way to select > the "minimum" mode. This is because sync mode provides slightly more > security than async, as well as being relied on for deterministic > error reporting. If we just have PR_MTE_TCF_CHECKED and sync/async is > taken as a hint, it means that we can't rely on the processor to be in > sync mode when it takes a tag check fault, so we won't necessarily see > the error report or detect the fault early enough. > > > > > Given that there are no real users of MTE yet, we have some choice of > > > > tweaking the ABI, backporting to 5.10. The question: is the expectation > > > > that the sysfs forcing of TCF is limited to deployments where the user > > > > space is tightly controlled (e.g. Android with most apps starting from > > > > zygote) or we allow it to become more of a general hint of what's the > > > > fastest check on a CPU? If the former, I'm fine with forcing without any > > > > additional bit, though I'd still prefer the opt-in. For the latter, I'd > > > > like some wider discussion with non-Android folk on what they expect > > > > from the TCF setting. Otherwise simply using PROT_MTE would may lead to > > > > tag check faults. > > > > > > I don't think there's anything Android-specific here. The problem being > > > solved concerns big/little SoCs with MTE, and I think it's up to the > > > distribution how the sysfs stuff is used. > > > > But distros don't control what applications are running, so most likely > > they would disable the sysfs control entirely. At that point, the > > feature becomes primarily an Android play. > > > > Anyway, I'm not dead against forcing of the TCF mode regardless of the > > user choice but I'd like to ensure that we don't miss other use-cases or > > that we don't make the sysfs control an expert-only feature. > > > > Adding Szabolcs to get a view from the glibc perspective. Adding Tejas as he will look at memory tagging in glibc. > Given these diverging opinions my view is that we should choose > whichever option leaves our options open for the future. For example, > imagine that we make the ABI change now such that upgrades may happen > for all applications and we don't have PR_MTE_DYNAMIC_TCF. This means > that applications no longer have a guarantee of their TCF mode which > may preclude some use cases (if we add an opt out later, applications > will be affected when running on the kernel versions between when we > changed the ABI and when we added the opt out). On the other hand, if > we introduce PR_MTE_DYNAMIC_TCF, we can always make the ABI change > later and start ignoring the PR_MTE_DYNAMIC_TCF flag at that point. > > Maybe the best compromise would be to change the ABI and at the same > time add the opt out, but I don't have a strong opinion. so the observable difference between async mode and async mode upgraded to sync mode is that async mode allows to ignoring the fault and things can continue, while in sync mode the program cannot move forward in case of a fault since the pc is still at the faulting instruction? may be we can have a mode where the cpu is in sync mode checks but the kernel steps over the faulting instruction before reporting it? so emulating async semantics (in a slow and complicated way..), but guaranteeing (almost) immediate faults for better debugging/security. personally i don't see a big issue with a mode that says "either sync or async check behaviour" and user code has to deal with it, however.. the per cpu setting is a bit nasty: can the kernel decide which cpu should use sync and which async? or a privileged user will have to fiddle with sysfs settings on every system to make this useful?
On Wed, Jun 23, 2021 at 1:56 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: > > The 06/22/2021 11:37, Peter Collingbourne wrote: > > On Mon, Jun 21, 2021 at 11:50 AM Catalin Marinas > > <catalin.marinas@arm.com> wrote: > > > On Mon, Jun 21, 2021 at 06:39:02PM +0100, Will Deacon wrote: > > > > On Mon, Jun 21, 2021 at 04:18:59PM +0100, Catalin Marinas wrote: > > > > > On Mon, Jun 21, 2021 at 01:39:37PM +0100, Will Deacon wrote: > > > > > > On Fri, Jun 18, 2021 at 04:09:55PM +0100, Catalin Marinas wrote: > > > > > > > On Thu, Jun 17, 2021 at 10:58:30PM +0100, Will Deacon wrote: > > > > > > > > On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote: > > > > > > > > > +Upgrading to stricter tag checking modes > > > > > > > > > +---------------------------------------- > > > > > > > > > + > > > > > > > > > +On some CPUs the performance of MTE in stricter tag checking modes > > > > > > > > > +is similar to that of less strict tag checking modes. This makes it > > > > > > > > > +worthwhile to enable stricter checks on those CPUs when a less strict > > > > > > > > > +checking mode is requested, in order to gain the error detection > > > > > > > > > +benefits of the stricter checks without the performance downsides. To > > > > > > > > > +opt into upgrading to a stricter checking mode on those CPUs, the user > > > > > > > > > +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument > > > > > > > > > +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. > > > > > > > > > + > > > > > > > > > +This feature is currently only supported for upgrading from > > > > > > > > > +asynchronous mode. To configure a CPU to upgrade from asynchronous mode > > > > > > > > > +to synchronous mode, a privileged user may write the value ``1`` to > > > > > > > > > +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable > > > > > > > > > +upgrading they may write the value ``0``. By default the feature is > > > > > > > > > +disabled on all CPUs. > > > > > > > > > + > > > > > > > > > Initial process state > > > > > > > > > --------------------- > > > > > > > > > > > > > > > > > > @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: > > > > > > > > > - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) > > > > > > > > > - Tag checking mode set to ``PR_MTE_TCF_NONE`` > > > > > > > > > - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) > > > > > > > > > +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) > > > > > > > > > - ``PSTATE.TCO`` set to 0 > > > > > > > > > - ``PROT_MTE`` not set on any of the initial memory maps > > > > > > > > > > > > > > > > Something about this doesn't sit right with me, as we're mixing a per-task > > > > > > > > interface with a per-cpu interface for selecting async/sync MTE and the > > > > > > > > priorities are somewhat confusing. > > > > > > > > > > > > > > > > I think a better interface would be for the sysfs entry for each CPU to > > > > > > > > allow selection between: > > > > > > > > > > > > > > > > task : Honour the prctl() (current behaviour) > > > > > > > > async : Force async for tasks using MTE > > > > > > > > sync : Force sync for tasks using MTE > > > > > > > > none : MTE disabled > > > > > > > > > > > > > > > > i.e. the per-cpu setting is an override. > > > > > > > > > > > > > > As Peter mentioned, forcing it is a potential ABI break, so such feature > > > > > > > would need backporting to 5.10. There's also a minor use-case that came > > > > > > > up in the early discussions - an app may want to use async mode only for > > > > > > > reporting but forcing it to sync would break such application (since > > > > > > > sync mode prevents the faulty access from taking place). > > > > > > > > > > > > Why is it an ABI break? The default will be "task" which behaves exactly as > > > > > > things do today. If the policy is explicitly changed by userspace, then you > > > > > > get new behaviour. I don't really see why this is different to e.g. writing > > > > > > /proc/sys/vm/mmap_min_addr and having some applications fail because they > > > > > > rely on the default setting. > > > > > > > > > > That's slightly different from mmap_min_addr, it depends on the user > > > > > expectations. Most applications have no interest in a MAP_FIXED of > > > > > address 0, so they wouldn't notice a non-zero setting. > > > > > > > > Not sure; if I set mmap_min_addr to 1GB then applications start to break > > > > (32-bit applications SEGV instantly). So I think it's pretty similar. > > > > > > You don't expect applications to be tolerant to randomly high > > > mmap_min_addr, hence admins don't set this to more than a page. However, > > > for the proposed TCF sysfs interface, we do expect applications to cope > > > with any forced value, either by documenting that the PR_* settings are > > > useless or allowing an application to opt in to being forced. Either > > > way, it's a significant difference between mmap_min_addr and the > > > proposed MTE control. > > > > > > If you imply that people shouldn't use the new TCF sysfs controls > > > because it may break applications, that's fine as well, we document it > > > as a dangerous feature, only to be used if you have control of the > > > user-space deployment (Android is an ideal target here). > > > > > > > > The semantics of MTE TCF none/sync/async are different and an > > > > > application may have different expectations. For example, it may not > > > > > want any tag checking, just being able to read/write tags. Or it may > > > > > want just some lightweight monitoring with a simple restart after an > > > > > async signal (sync requires either emulating the access or setting the > > > > > PSTATE.TCO bit). > > > > > > > > I think this is an argument against doing this at all. Realistically, > > > > if PR_MTE_DYNAMIC_TCF is the "go fast" mode, then people will set it > > > > blindly and applications will just be expected to deal with a combination > > > > of sync and async faults; I really don't think the flag changes anything in > > > > that regards. It's not a buy-in from the application, given that the visible > > > > behaviour is unknown and dynamic. > > > > > > Most applications won't care about the mode, they just want to crash > > > when some incorrect access happens. But not having an opt-in (or > > > opt-out, it works either way) a minority of other use-cases are no > > > longer possible. Apps can't even rely on having the same TCF for the > > > lifetime of a thread. > > > > > > > > Forcing the TCF via sysfs may be seen as a user problem but that's > > > > > pretty much rendering the application choice of the tag check mode > > > > > useless since an admin could override it. > > > > > > > > I think the application choice _is_ useless if we decide to offer a > > > > mechanism where it is set on a per-cpu basis instead. > > > > > > It depends on how we implement this mechanism. Do we want to preserve > > > the application choice via an opt-in/out or we remove such option > > > entirely? That's a significant ABI change IMO. > > > > > > Just stating that setting the sysfs to anything other than "task" breaks > > > applications looks like discouraging people from using it. Maybe we can > > > even add an EXPERT dependency. > > > > > > > > > > So I'd rather leave it up to the user task to decide whether its choice > > > > > > > can be changed. Peter introduced a new PR_MTE_DYNAMIC_TCF for this > > > > > > > purpose (or a different name if you have a better suggestion). > > > > > > > > > > > > I don't see how PR_MTE_DYNAMIC_TCF is useful to userspace, really. It's an > > > > > > extra bit of logic to go and set it, but then what? It doesn't know which > > > > > > behaviour it's getting, so it just feels like an extra hoop to jump through > > > > > > without actually adding anything useful. > > > > > > > > > > Well, without this additional bit, an application can't rely on the mode > > > > > it requested. With an additional forced tagged address enable, we might > > > > > as well remove the prctl() altogether (well, that wouldn't be a bad > > > > > thing). > > > > > > > > I think the prctl() is still useful to say whether an application wants MTE > > > > or not, but I'm inclined to agree that sync vs async shouldn't be part of > > > > the prctl() call. > > > > > > That means that if the user choice was PR_MTE_TCF_NONE, it won't be > > > forced to any of the sync vs async, which makes sense if we go this > > > route. The async/sync (and a new asymmetric mode in v8.7) could become > > > just a PR_MTE_TCF_CHECKED. However, if glibc upstreamed the MTE support > > > already, it will be harder to remove the PR_* controls already defined > > > (well, we can still keep the sync/async as a hint) > > > > For Android I think we will always want there to be a way to select > > the "minimum" mode. This is because sync mode provides slightly more > > security than async, as well as being relied on for deterministic > > error reporting. If we just have PR_MTE_TCF_CHECKED and sync/async is > > taken as a hint, it means that we can't rely on the processor to be in > > sync mode when it takes a tag check fault, so we won't necessarily see > > the error report or detect the fault early enough. > > > > > > > Given that there are no real users of MTE yet, we have some choice of > > > > > tweaking the ABI, backporting to 5.10. The question: is the expectation > > > > > that the sysfs forcing of TCF is limited to deployments where the user > > > > > space is tightly controlled (e.g. Android with most apps starting from > > > > > zygote) or we allow it to become more of a general hint of what's the > > > > > fastest check on a CPU? If the former, I'm fine with forcing without any > > > > > additional bit, though I'd still prefer the opt-in. For the latter, I'd > > > > > like some wider discussion with non-Android folk on what they expect > > > > > from the TCF setting. Otherwise simply using PROT_MTE would may lead to > > > > > tag check faults. > > > > > > > > I don't think there's anything Android-specific here. The problem being > > > > solved concerns big/little SoCs with MTE, and I think it's up to the > > > > distribution how the sysfs stuff is used. > > > > > > But distros don't control what applications are running, so most likely > > > they would disable the sysfs control entirely. At that point, the > > > feature becomes primarily an Android play. > > > > > > Anyway, I'm not dead against forcing of the TCF mode regardless of the > > > user choice but I'd like to ensure that we don't miss other use-cases or > > > that we don't make the sysfs control an expert-only feature. > > > > > > Adding Szabolcs to get a view from the glibc perspective. > > Adding Tejas as he will look at memory tagging in glibc. > > > Given these diverging opinions my view is that we should choose > > whichever option leaves our options open for the future. For example, > > imagine that we make the ABI change now such that upgrades may happen > > for all applications and we don't have PR_MTE_DYNAMIC_TCF. This means > > that applications no longer have a guarantee of their TCF mode which > > may preclude some use cases (if we add an opt out later, applications > > will be affected when running on the kernel versions between when we > > changed the ABI and when we added the opt out). On the other hand, if > > we introduce PR_MTE_DYNAMIC_TCF, we can always make the ABI change > > later and start ignoring the PR_MTE_DYNAMIC_TCF flag at that point. > > > > Maybe the best compromise would be to change the ABI and at the same > > time add the opt out, but I don't have a strong opinion. > > so the observable difference between async mode and > async mode upgraded to sync mode is that async mode > allows to ignoring the fault and things can continue, > while in sync mode the program cannot move forward > in case of a fault since the pc is still at the > faulting instruction? Correct. > may be we can have a mode where the cpu is in sync > mode checks but the kernel steps over the faulting > instruction before reporting it? so emulating async > semantics (in a slow and complicated way..), but > guaranteeing (almost) immediate faults for better > debugging/security. > > personally i don't see a big issue with a mode that > says "either sync or async check behaviour" and user > code has to deal with it, however.. Agreed. I don't think that emulating async would be particularly useful and at least for Android an emulated async fault would cause us to drop a limited amount of diagnostic information about the allocation (size and offset for OOB accesses, based on the fault address which is unavailable for async faults) in the error report. > the per cpu setting is a bit nasty: can the kernel > decide which cpu should use sync and which async? > or a privileged user will have to fiddle with sysfs > settings on every system to make this useful? Unfortunately not as it depends to some extent on how the system was designed (e.g. memory system and other aspects of the SOC). The initial proposal was to have this information in DT/ACPI but this was still considered too inflexible (e.g. performance of sync may be slightly slower than async but still tolerable depending on the use case) so the choice was left to the privileged user. Peter
On Wed, Jun 23, 2021 at 09:55:30AM +0100, Szabolcs Nagy wrote: > The 06/22/2021 11:37, Peter Collingbourne wrote: > > On Mon, Jun 21, 2021 at 11:50 AM Catalin Marinas > > <catalin.marinas@arm.com> wrote: > > > On Mon, Jun 21, 2021 at 06:39:02PM +0100, Will Deacon wrote: > > > > On Mon, Jun 21, 2021 at 04:18:59PM +0100, Catalin Marinas wrote: > > > > > Given that there are no real users of MTE yet, we have some choice of > > > > > tweaking the ABI, backporting to 5.10. The question: is the expectation > > > > > that the sysfs forcing of TCF is limited to deployments where the user > > > > > space is tightly controlled (e.g. Android with most apps starting from > > > > > zygote) or we allow it to become more of a general hint of what's the > > > > > fastest check on a CPU? If the former, I'm fine with forcing without any > > > > > additional bit, though I'd still prefer the opt-in. For the latter, I'd > > > > > like some wider discussion with non-Android folk on what they expect > > > > > from the TCF setting. Otherwise simply using PROT_MTE would may lead to > > > > > tag check faults. > > > > > > > > I don't think there's anything Android-specific here. The problem being > > > > solved concerns big/little SoCs with MTE, and I think it's up to the > > > > distribution how the sysfs stuff is used. > > > > > > But distros don't control what applications are running, so most likely > > > they would disable the sysfs control entirely. At that point, the > > > feature becomes primarily an Android play. > > > > > > Anyway, I'm not dead against forcing of the TCF mode regardless of the > > > user choice but I'd like to ensure that we don't miss other use-cases or > > > that we don't make the sysfs control an expert-only feature. > > > > > > Adding Szabolcs to get a view from the glibc perspective. > > Adding Tejas as he will look at memory tagging in glibc. Thanks. Is there any MTE support in mainline glibc? If not, we may have another chance of adjusting the ABI. > > Given these diverging opinions my view is that we should choose > > whichever option leaves our options open for the future. For example, > > imagine that we make the ABI change now such that upgrades may happen > > for all applications and we don't have PR_MTE_DYNAMIC_TCF. This means > > that applications no longer have a guarantee of their TCF mode which > > may preclude some use cases (if we add an opt out later, applications > > will be affected when running on the kernel versions between when we > > changed the ABI and when we added the opt out). On the other hand, if > > we introduce PR_MTE_DYNAMIC_TCF, we can always make the ABI change > > later and start ignoring the PR_MTE_DYNAMIC_TCF flag at that point. > > > > Maybe the best compromise would be to change the ABI and at the same > > time add the opt out, but I don't have a strong opinion. > > so the observable difference between async mode and async mode > upgraded to sync mode is that async mode allows to ignoring the fault > and things can continue, while in sync mode the program cannot move > forward in case of a fault since the pc is still at the faulting > instruction? Yes. Would an application find the async mode useful or it can be freely overridden by the kernel (well, a user via sysfs)? > may be we can have a mode where the cpu is in sync mode checks but the > kernel steps over the faulting instruction before reporting it? so > emulating async semantics (in a slow and complicated way..), but > guaranteeing (almost) immediate faults for better debugging/security. It's a pretty complex step over (or emulate), so I wouldn't go there. The user signal handler could disable tag checking altogether (PSTATE.TCO) and continue. > personally i don't see a big issue with a mode that says "either sync > or async check behaviour" and user code has to deal with it, however.. The question is more about whether we still want to keep the current user program choice of sync vs async (vs the new asymmetric mode in 8.7). If the user wouldn't care, we just override it from the kernel without any additional PR_ flag for opting in (or out). > the per cpu setting is a bit nasty: can the kernel decide which cpu > should use sync and which async? or a privileged user will have to > fiddle with sysfs settings on every system to make this useful? The proposed interface is sysfs. I think that's not relevant to the user application since it wouldn't have control over it anyway. What's visible is that it cannot rely on the mode it requested, not even for the lifetime of a thread (as it may migrate between CPUs). Do you see any issues with this? For Android, it's probably fine but if other programs cannot cope (or need the specific mode requested), we'd need a new control (for opt-in or opt-out).
The 06/24/2021 17:52, Catalin Marinas wrote: > Thanks. Is there any MTE support in mainline glibc? If not, we may have > another chance of adjusting the ABI. glibc exposed heap tagging via an env var mechanism that can change between glibc releases, i.e. not abi stable, and we have no real contract about how the prctl can be used on top of glibc (see e.g. the multi-thread issue). we don't expect the async mode to be very useful on glibc based linux systems. changing async mode is unlikely to affect anything in userspace at this point. > It's a pretty complex step over (or emulate), so I wouldn't go there. > The user signal handler could disable tag checking altogether > (PSTATE.TCO) and continue. ah yes, disabling checks makes more sense if user wants to continue. > The question is more about whether we still want to keep the current > user program choice of sync vs async (vs the new asymmetric mode in > 8.7). If the user wouldn't care, we just override it from the kernel > without any additional PR_ flag for opting in (or out). i think the usefulness of pure async mode is very limited, and we haven't seen valid use-cases for it. > > the per cpu setting is a bit nasty: can the kernel decide which cpu > > should use sync and which async? or a privileged user will have to > > fiddle with sysfs settings on every system to make this useful? > > The proposed interface is sysfs. I think that's not relevant to the user > application since it wouldn't have control over it anyway. What's > visible is that it cannot rely on the mode it requested, not even for > the lifetime of a thread (as it may migrate between CPUs). Do you see > any issues with this? For Android, it's probably fine but if other > programs cannot cope (or need the specific mode requested), we'd need a > new control (for opt-in or opt-out). i don't see any issues with changing async mode. i assume the prctl get operation would return whatever was the prctl setting for the thread and not try to expose the per cpu architectural state. i assume async vs sync fault can be distinguished via the SEGV_MTE{A,S}ERR si_code.
On Fri, Jun 25, 2021 at 10:22:53AM +0100, Szabolcs Nagy wrote: > The 06/24/2021 17:52, Catalin Marinas wrote: > > Thanks. Is there any MTE support in mainline glibc? If not, we may have > > another chance of adjusting the ABI. > > glibc exposed heap tagging via an env var mechanism that can change > between glibc releases, i.e. not abi stable, and we have no real > contract about how the prctl can be used on top of glibc (see e.g. the > multi-thread issue). > > we don't expect the async mode to be very useful on glibc based linux > systems. > > changing async mode is unlikely to affect anything in userspace at > this point. Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not ABI, the user app can't rely on what the glibc has configured. Arguably, since it's driven from outside the application (env), we could say the same for sysfs, though for the glibc case, the user app is still be able to override it before the first thread is created (or per-thread). I assume glibc only issues the prctl() once, not for every new thread. > > The proposed interface is sysfs. I think that's not relevant to the user > > application since it wouldn't have control over it anyway. What's > > visible is that it cannot rely on the mode it requested, not even for > > the lifetime of a thread (as it may migrate between CPUs). Do you see > > any issues with this? For Android, it's probably fine but if other > > programs cannot cope (or need the specific mode requested), we'd need a > > new control (for opt-in or opt-out). > > i don't see any issues with changing async mode. > > i assume the prctl get operation would return whatever was the prctl > setting for the thread and not try to expose the per cpu architectural > state. Yes. > i assume async vs sync fault can be distinguished via the > SEGV_MTE{A,S}ERR si_code. Indeed. So we can document that the mode requested by the app is an indication, the system may change it to another value (and back-port documentation to 5.10). If we get a request from developers to honour a specific mode, we can add a new PR_MTE_TCF_EXACT bit or something but it's not essential we do it now. So if we allow the kernel to change the user requested mode (via sysfs), I think we still have two more issues to clarify: 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can downgrade to a less strict mode. I'd go for upgrade here to a stricter check as in Peter's patch. 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, so I'd say yes. Any other thoughts are welcome.
On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > On Fri, Jun 25, 2021 at 10:22:53AM +0100, Szabolcs Nagy wrote: > > The 06/24/2021 17:52, Catalin Marinas wrote: > > > Thanks. Is there any MTE support in mainline glibc? If not, we may have > > > another chance of adjusting the ABI. > > > > glibc exposed heap tagging via an env var mechanism that can change > > between glibc releases, i.e. not abi stable, and we have no real > > contract about how the prctl can be used on top of glibc (see e.g. the > > multi-thread issue). > > > > we don't expect the async mode to be very useful on glibc based linux > > systems. > > > > changing async mode is unlikely to affect anything in userspace at > > this point. > > Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not > ABI, the user app can't rely on what the glibc has configured. Arguably, > since it's driven from outside the application (env), we could say the > same for sysfs, though for the glibc case, the user app is still be able > to override it before the first thread is created (or per-thread). I > assume glibc only issues the prctl() once, not for every new thread. > > > > The proposed interface is sysfs. I think that's not relevant to the user > > > application since it wouldn't have control over it anyway. What's > > > visible is that it cannot rely on the mode it requested, not even for > > > the lifetime of a thread (as it may migrate between CPUs). Do you see > > > any issues with this? For Android, it's probably fine but if other > > > programs cannot cope (or need the specific mode requested), we'd need a > > > new control (for opt-in or opt-out). > > > > i don't see any issues with changing async mode. > > > > i assume the prctl get operation would return whatever was the prctl > > setting for the thread and not try to expose the per cpu architectural > > state. > > Yes. > > > i assume async vs sync fault can be distinguished via the > > SEGV_MTE{A,S}ERR si_code. > > Indeed. > > So we can document that the mode requested by the app is an indication, > the system may change it to another value (and back-port documentation > to 5.10). If we get a request from developers to honour a specific mode, > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > essential we do it now. > > So if we allow the kernel to change the user requested mode (via sysfs), > I think we still have two more issues to clarify: > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > downgrade to a less strict mode. I'd go for upgrade here to a > stricter check as in Peter's patch. > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > so I'd say yes. > > Any other thoughts are welcome. As I mentioned before, I think the sysfs interface should offer: "task" : Honour whatever the task has asked for (default) "async" : Force async on this CPU "sync" : Force sync on this CPU I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none" option in here. I originally suggested that, but in hindsight it feels like a bad idea because a task could SIGILL on migration. So what we're saying is that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always enable MTE on success, but the reporting mode is a hint. I don't think upgrade/downgrade makes a lot of sense given that the sysfs controls can be changed at any point in time. It should just be an override. This means that we can force async for CPUs where sync mode is horribly slow, whilst honouring the task's request on CPUs which are better implemented. Will
On Fri, Jun 25, 2021 at 01:39:59PM +0100, Will Deacon wrote: > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > > So we can document that the mode requested by the app is an indication, > > the system may change it to another value (and back-port documentation > > to 5.10). If we get a request from developers to honour a specific mode, > > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > > essential we do it now. > > > > So if we allow the kernel to change the user requested mode (via sysfs), > > I think we still have two more issues to clarify: > > > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > > downgrade to a less strict mode. I'd go for upgrade here to a > > stricter check as in Peter's patch. > > > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > > so I'd say yes. > > > > Any other thoughts are welcome. > > As I mentioned before, I think the sysfs interface should offer: > > "task" : Honour whatever the task has asked for (default) > "async" : Force async on this CPU > "sync" : Force sync on this CPU > > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none" > option in here. I originally suggested that, but in hindsight it feels like > a bad idea because a task could SIGILL on migration. So what we're saying is > that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always enable MTE on success, > but the reporting mode is a hint. > > I don't think upgrade/downgrade makes a lot of sense given that the sysfs > controls can be changed at any point in time. It should just be an override. The problem with sysfs is that it's global, so it assumes that any process has the same needs. The _MTAG_ENABLE glibc tunable at least can be set per process. > This means that we can force async for CPUs where sync mode is horribly > slow, whilst honouring the task's request on CPUs which are better > implemented. This may hamper debugging on, for example, a system where the root configured the modes for CPUs and a normal user wants to use MTE to identify access bugs. Another case is some service that wants tightened security from MTE irrespective of the performance. The slight downside of the "upgrade" mode assumes that the user is aware that async is the fastest and asks for this unless it has specific needs. Of course, we can also extend the interface to "sync-force" or "sync-upgrade" etc. but I think it's over-engineered.
The 06/25/2021 13:39, Will Deacon wrote: > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > > Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not > > ABI, the user app can't rely on what the glibc has configured. Arguably, > > since it's driven from outside the application (env), we could say the > > same for sysfs, though for the glibc case, the user app is still be able > > to override it before the first thread is created (or per-thread). I > > assume glibc only issues the prctl() once, not for every new thread. note: in the end the tunable is like GLIBC_TUNABLES=glibc.mem.tagging=3 ./exe not _MTAG_ENABLE. and yes the setting comes from outside and glibc calls prctl once. > > So we can document that the mode requested by the app is an indication, > > the system may change it to another value (and back-port documentation > > to 5.10). If we get a request from developers to honour a specific mode, > > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > > essential we do it now. > > > > So if we allow the kernel to change the user requested mode (via sysfs), > > I think we still have two more issues to clarify: > > > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > > downgrade to a less strict mode. I'd go for upgrade here to a > > stricter check as in Peter's patch. > > > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > > so I'd say yes. > > > > Any other thoughts are welcome. > > As I mentioned before, I think the sysfs interface should offer: > > "task" : Honour whatever the task has asked for (default) > "async" : Force async on this CPU > "sync" : Force sync on this CPU > > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none" > option in here. I originally suggested that, but in hindsight it feels like > a bad idea because a task could SIGILL on migration. So what we're saying is > that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always enable MTE on success, > but the reporting mode is a hint. > > I don't think upgrade/downgrade makes a lot of sense given that the sysfs > controls can be changed at any point in time. It should just be an override. > > This means that we can force async for CPUs where sync mode is horribly > slow, whilst honouring the task's request on CPUs which are better > implemented. i think a user should be able to ask for sync check mode for a process and get an error if that's not possible. at least this is the semantics that makes sense in glibc. i think it's very confusing if somebody explicitly asks for sync checks to debug something but then gets useless diagnostics because somebody else tried to second guess their performance tradeoff preferences. (if sync check is too slow on a cpu then the user can taskset to a cpu that's not slow or just use other debugging method, silent override sounds bad.)
> -----Original Message----- > From: Szabolcs Nagy <Szabolcs.Nagy@arm.com> > Sent: Friday, June 25, 2021 3:15 PM > To: Will Deacon <will@kernel.org> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Peter Collingbourne > <pcc@google.com>; Vincenzo Frascino <Vincenzo.Frascino@arm.com>; Evgenii > Stepanov <eugenis@google.com>; Linux ARM <linux-arm- > kernel@lists.infradead.org>; Tejas Belagod <Tejas.Belagod@arm.com> > Subject: Re: [PATCH v5] arm64: mte: allow async MTE to be upgraded to sync on > a per-CPU basis > > The 06/25/2021 13:39, Will Deacon wrote: > > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > > > Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not > > > ABI, the user app can't rely on what the glibc has configured. > > > Arguably, since it's driven from outside the application (env), we > > > could say the same for sysfs, though for the glibc case, the user > > > app is still be able to override it before the first thread is > > > created (or per-thread). I assume glibc only issues the prctl() once, not for > every new thread. > > note: in the end the tunable is like > > GLIBC_TUNABLES=glibc.mem.tagging=3 ./exe > > not _MTAG_ENABLE. > > and yes the setting comes from outside and glibc calls prctl once. > > > > So we can document that the mode requested by the app is an > > > indication, the system may change it to another value (and back-port > > > documentation to 5.10). If we get a request from developers to > > > honour a specific mode, we can add a new PR_MTE_TCF_EXACT bit or > > > something but it's not essential we do it now. > > > > > > So if we allow the kernel to change the user requested mode (via > > > sysfs), I think we still have two more issues to clarify: > > > > > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > > > downgrade to a less strict mode. I'd go for upgrade here to a > > > stricter check as in Peter's patch. > > > > > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does > that, > > > so I'd say yes. > > > > > > Any other thoughts are welcome. > > > > As I mentioned before, I think the sysfs interface should offer: > > > > "task" : Honour whatever the task has asked for (default) > > "async" : Force async on this CPU > > "sync" : Force sync on this CPU > > > > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a > "none" > > option in here. I originally suggested that, but in hindsight it feels > > like a bad idea because a task could SIGILL on migration. So what > > we're saying is that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always > > enable MTE on success, but the reporting mode is a hint. > > > > I don't think upgrade/downgrade makes a lot of sense given that the > > sysfs controls can be changed at any point in time. It should just be an > override. > > > > This means that we can force async for CPUs where sync mode is > > horribly slow, whilst honouring the task's request on CPUs which are > > better implemented. > > i think a user should be able to ask for sync check mode for a process and get an > error if that's not possible. > > at least this is the semantics that makes sense in glibc. i think it's very confusing > if somebody explicitly asks for sync checks to debug something but then gets > useless diagnostics because somebody else tried to second guess their > performance tradeoff preferences. (if sync check is too slow on a cpu then the > user can taskset to a cpu that's not slow or just use other debugging method, > silent override sounds bad.) Sorry I'm late to the party - just catching up with this thread. I'm not a kernel/glibc expert so please correct me if I'm wrong here - I see two themes in this discussion - the usage model between user/system of the TCF mode and its implementation. From a user's perspective, they should be able to get the TCF mode they asked for and get atleast a warning if that's not possible for whatever reason(performance et al). From a system/kernel's perspective, if the system wants to use MTE, it should be able to provide the most performant(or most strict) TCF mode/per CPU as it sees fit - user's 'task' setting in sysfs/GLIBC's env will override it. Can a user-setting downgrade a system-setting TCF level is another question - probably not, so the user will need to be warned against it. So what happens if a process with TCF_NONE migrates from a CPU with TCF_NONE to TCF_ASYNC - should the kernel control process/CPU migrations according to the TCF setting? Won't this affect performance anyway by increasing contention? Implementation-wise, the sysfs interface of 'task', 'async', 'sync' seems to make sense to me as it fits in well if we use the above as a guiding principle. Thanks, Tejas.
On Fri, Jun 25, 2021 at 5:01 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Fri, Jun 25, 2021 at 10:22:53AM +0100, Szabolcs Nagy wrote: > > The 06/24/2021 17:52, Catalin Marinas wrote: > > > Thanks. Is there any MTE support in mainline glibc? If not, we may have > > > another chance of adjusting the ABI. > > > > glibc exposed heap tagging via an env var mechanism that can change > > between glibc releases, i.e. not abi stable, and we have no real > > contract about how the prctl can be used on top of glibc (see e.g. the > > multi-thread issue). > > > > we don't expect the async mode to be very useful on glibc based linux > > systems. > > > > changing async mode is unlikely to affect anything in userspace at > > this point. > > Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not > ABI, the user app can't rely on what the glibc has configured. Arguably, > since it's driven from outside the application (env), we could say the > same for sysfs, though for the glibc case, the user app is still be able > to override it before the first thread is created (or per-thread). I > assume glibc only issues the prctl() once, not for every new thread. > > > > The proposed interface is sysfs. I think that's not relevant to the user > > > application since it wouldn't have control over it anyway. What's > > > visible is that it cannot rely on the mode it requested, not even for > > > the lifetime of a thread (as it may migrate between CPUs). Do you see > > > any issues with this? For Android, it's probably fine but if other > > > programs cannot cope (or need the specific mode requested), we'd need a > > > new control (for opt-in or opt-out). > > > > i don't see any issues with changing async mode. > > > > i assume the prctl get operation would return whatever was the prctl > > setting for the thread and not try to expose the per cpu architectural > > state. > > Yes. > > > i assume async vs sync fault can be distinguished via the > > SEGV_MTE{A,S}ERR si_code. > > Indeed. > > So we can document that the mode requested by the app is an indication, > the system may change it to another value (and back-port documentation > to 5.10). If we get a request from developers to honour a specific mode, > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > essential we do it now. > > So if we allow the kernel to change the user requested mode (via sysfs), > I think we still have two more issues to clarify: > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > downgrade to a less strict mode. I'd go for upgrade here to a > stricter check as in Peter's patch. Agreed, for the reasons that I and Szabolcs have mentioned. > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > so I'd say yes. This would be a problem for Android. Currently when disabling MTE in a process which has previously had MTE enabled we set TCF to NONE via prctl and then proceed mostly as if MTE was never enabled. This means e.g. that tags in existing PROT_MTE pages are not updated. If TCF is set to something other than NONE as a result of the prctl we would randomly hit tag check faults as a result of accessing allocations with non-updated tags. It would not be sufficient to disable tag checks via TCO because TCO is disabled in signal handlers. Peter
On Fri, Jun 25, 2021 at 02:53:50PM +0100, Catalin Marinas wrote: > On Fri, Jun 25, 2021 at 01:39:59PM +0100, Will Deacon wrote: > > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > > > So we can document that the mode requested by the app is an indication, > > > the system may change it to another value (and back-port documentation > > > to 5.10). If we get a request from developers to honour a specific mode, > > > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > > > essential we do it now. > > > > > > So if we allow the kernel to change the user requested mode (via sysfs), > > > I think we still have two more issues to clarify: > > > > > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > > > downgrade to a less strict mode. I'd go for upgrade here to a > > > stricter check as in Peter's patch. > > > > > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > > > so I'd say yes. > > > > > > Any other thoughts are welcome. > > > > As I mentioned before, I think the sysfs interface should offer: > > > > "task" : Honour whatever the task has asked for (default) > > "async" : Force async on this CPU > > "sync" : Force sync on this CPU > > > > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none" > > option in here. I originally suggested that, but in hindsight it feels like > > a bad idea because a task could SIGILL on migration. So what we're saying is > > that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always enable MTE on success, > > but the reporting mode is a hint. > > > > I don't think upgrade/downgrade makes a lot of sense given that the sysfs > > controls can be changed at any point in time. It should just be an override. > > The problem with sysfs is that it's global, so it assumes that any > process has the same needs. The _MTAG_ENABLE glibc tunable at least can > be set per process. Again, this seems to be an argument against doing this at all. We already have a per-task interface to change the checking mode, but there is a need to override this on a per-cpu basis to achieve acceptable performance. Applications can't possibly be aware of that and so their "needs" cannot be taken into account here. > > This means that we can force async for CPUs where sync mode is horribly > > slow, whilst honouring the task's request on CPUs which are better > > implemented. > > This may hamper debugging on, for example, a system where the root > configured the modes for CPUs and a normal user wants to use MTE to > identify access bugs. Another case is some service that wants tightened > security from MTE irrespective of the performance. I suppose the way I see this is similar to how I see CPU errata: essentially sync mode is unusably slow on some CPUs, so we disable it (drop to async) on a per-cpu basis. The only difference is that we provide the switch to userspace, since there isn't a functional problem. However, when we inevitably hit real errata, we could even force the mode in the kernel rather than disable the feature entirely. > The slight downside of the "upgrade" mode assumes that the user is aware > that async is the fastest and asks for this unless it has specific > needs. Of course, we can also extend the interface to "sync-force" or > "sync-upgrade" etc. but I think it's over-engineered. Another reason I dislike "upgrade" is because it means that the kernel embeds an ordering of which mode upgrades to another mode and, as new modes get added by the architecture in future, this feels more like policy to me and would be better off handled in userspace. Will
On Fri, Jun 25, 2021 at 03:14:40PM +0100, Szabolcs Nagy wrote: > The 06/25/2021 13:39, Will Deacon wrote: > > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > > > Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not > > > ABI, the user app can't rely on what the glibc has configured. Arguably, > > > since it's driven from outside the application (env), we could say the > > > same for sysfs, though for the glibc case, the user app is still be able > > > to override it before the first thread is created (or per-thread). I > > > assume glibc only issues the prctl() once, not for every new thread. > > note: in the end the tunable is like > > GLIBC_TUNABLES=glibc.mem.tagging=3 ./exe > > not _MTAG_ENABLE. > > and yes the setting comes from outside and glibc > calls prctl once. > > > > So we can document that the mode requested by the app is an indication, > > > the system may change it to another value (and back-port documentation > > > to 5.10). If we get a request from developers to honour a specific mode, > > > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > > > essential we do it now. > > > > > > So if we allow the kernel to change the user requested mode (via sysfs), > > > I think we still have two more issues to clarify: > > > > > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > > > downgrade to a less strict mode. I'd go for upgrade here to a > > > stricter check as in Peter's patch. > > > > > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > > > so I'd say yes. > > > > > > Any other thoughts are welcome. > > > > As I mentioned before, I think the sysfs interface should offer: > > > > "task" : Honour whatever the task has asked for (default) > > "async" : Force async on this CPU > > "sync" : Force sync on this CPU > > > > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none" > > option in here. I originally suggested that, but in hindsight it feels like > > a bad idea because a task could SIGILL on migration. So what we're saying is > > that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always enable MTE on success, > > but the reporting mode is a hint. > > > > I don't think upgrade/downgrade makes a lot of sense given that the sysfs > > controls can be changed at any point in time. It should just be an override. > > > > This means that we can force async for CPUs where sync mode is horribly > > slow, whilst honouring the task's request on CPUs which are better > > implemented. > > i think a user should be able to ask for sync check > mode for a process and get an error if that's not > possible. Hmm. The question then is what do we do if the sysfs override is changed after the application has started running? Will
On Mon, Jun 28, 2021 at 11:14:49AM +0100, Will Deacon wrote: > On Fri, Jun 25, 2021 at 02:53:50PM +0100, Catalin Marinas wrote: > > On Fri, Jun 25, 2021 at 01:39:59PM +0100, Will Deacon wrote: > > > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote: > > > > So we can document that the mode requested by the app is an indication, > > > > the system may change it to another value (and back-port documentation > > > > to 5.10). If we get a request from developers to honour a specific mode, > > > > we can add a new PR_MTE_TCF_EXACT bit or something but it's not > > > > essential we do it now. > > > > > > > > So if we allow the kernel to change the user requested mode (via sysfs), > > > > I think we still have two more issues to clarify: > > > > > > > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can > > > > downgrade to a less strict mode. I'd go for upgrade here to a > > > > stricter check as in Peter's patch. > > > > > > > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that, > > > > so I'd say yes. > > > > > > > > Any other thoughts are welcome. > > > > > > As I mentioned before, I think the sysfs interface should offer: > > > > > > "task" : Honour whatever the task has asked for (default) > > > "async" : Force async on this CPU > > > "sync" : Force sync on this CPU > > > > > > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none" > > > option in here. I originally suggested that, but in hindsight it feels like > > > a bad idea because a task could SIGILL on migration. So what we're saying is > > > that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always enable MTE on success, > > > but the reporting mode is a hint. > > > > > > I don't think upgrade/downgrade makes a lot of sense given that the sysfs > > > controls can be changed at any point in time. It should just be an override. > > > > The problem with sysfs is that it's global, so it assumes that any > > process has the same needs. The _MTAG_ENABLE glibc tunable at least can > > be set per process. > > Again, this seems to be an argument against doing this at all. We already > have a per-task interface to change the checking mode, but there is a need > to override this on a per-cpu basis to achieve acceptable performance. > Applications can't possibly be aware of that and so their "needs" cannot be > taken into account here. That's because we have two conflicting needs: (a) more precise checking for debugging or increased security and (b) better performance. The sysfs interface is primarily meant for (b) but that overrides any application choice for (a). > > > This means that we can force async for CPUs where sync mode is horribly > > > slow, whilst honouring the task's request on CPUs which are better > > > implemented. > > > > This may hamper debugging on, for example, a system where the root > > configured the modes for CPUs and a normal user wants to use MTE to > > identify access bugs. Another case is some service that wants tightened > > security from MTE irrespective of the performance. > > I suppose the way I see this is similar to how I see CPU errata: essentially > sync mode is unusably slow on some CPUs, so we disable it (drop to async) on > a per-cpu basis. The only difference is that we provide the switch to > userspace, since there isn't a functional problem. However, when we > inevitably hit real errata, we could even force the mode in the kernel > rather than disable the feature entirely. I think everyone assumes that sync is slow, so they'd only set it for debugging or some increased security. It's quite hard to quantify what a "too slow" sync is. > > The slight downside of the "upgrade" mode assumes that the user is aware > > that async is the fastest and asks for this unless it has specific > > needs. Of course, we can also extend the interface to "sync-force" or > > "sync-upgrade" etc. but I think it's over-engineered. > > Another reason I dislike "upgrade" is because it means that the kernel embeds > an ordering of which mode upgrades to another mode and, as new modes get > added by the architecture in future, this feels more like policy to me and > would be better off handled in userspace. Yeah, I don't like this aspect of the "upgrade" either but I'd accept it as a compromise. Another option is a mapping table where async can be remapped to sync and sync to async (or even to "none" for both). That's not far from one of Peter's mte-upgrade-async proposal, we just add mte-map-async and mte-map-sync options. Most likely we'll just use mte-map-async for now to map it to sync on some CPUs but it wouldn't exclude other forced settings.
The 06/28/2021 11:17, Will Deacon wrote: > On Fri, Jun 25, 2021 at 03:14:40PM +0100, Szabolcs Nagy wrote: > > i think a user should be able to ask for sync check > > mode for a process and get an error if that's not > > possible. > > Hmm. The question then is what do we do if the sysfs override is changed > after the application has started running? if the failure cannot be reported then i think it is not ok to override. at least running a process in sync mode sounds useful to me. with the per cpu override this may need privilege and even then it may not be possible without significantly affecting other processes. and sysfs is not a reliable api to figure out what's going on. if we need the ability to completely disable sync mode on a cpu (as opposed to just doing it for performance tuning) then it can be a boot option, or processes that requested guaranteed sync mode can be killed. i'm not against per cpu setting since most code should work with whatever tag check mode, but if we make that the abi (that code should work in whatever mode) then i think we weaken the usability of mte.
On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > Another option is a mapping table where async can be remapped to sync > and sync to async (or even to "none" for both). That's not far from one > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > mte-map-sync options. Most likely we'll just use mte-map-async for now > to map it to sync on some CPUs but it wouldn't exclude other forced > settings. Catalin and I discussed this offline and ended up with another option: retrospectively change the prctl() ABI so that the 'flags' argument accepts a bitmask of modes that the application is willing to accept. This doesn't break any existing users, as we currently enforce that only one mode is specified, but it would allow things like: prctl(PR_SET_TAGGED_ADDR_CTRL, PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, 0, 0, 0); which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with the difference that I think this extends more naturally as new PR_MTR_TCF_* flags are introduced. Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") which initially reads as "async". If the root user does, e.g. # echo "sync" > cpu1/mte_tcf_preferred then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on CPU1, but async mode on other CPUs (assuming they retain the default value). We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The only values which the sysfs files would accept today are "sync" and "async". When faced with a situation where the prctl() flags for a task do not intersect with the preferred mode for a CPU on which the task is going to run, the lowest bit number flag is chosen from the mask set by the prctl(). Thoughts? Will
The 06/29/2021 11:46, Will Deacon wrote: > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > Another option is a mapping table where async can be remapped to sync > > and sync to async (or even to "none" for both). That's not far from one > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > to map it to sync on some CPUs but it wouldn't exclude other forced > > settings. > > Catalin and I discussed this offline and ended up with another option: > retrospectively change the prctl() ABI so that the 'flags' argument > accepts a bitmask of modes that the application is willing to accept. This > doesn't break any existing users, as we currently enforce that only one > mode is specified, but it would allow things like: > > prctl(PR_SET_TAGGED_ADDR_CTRL, > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > 0, 0, 0); > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > the difference that I think this extends more naturally as new PR_MTR_TCF_* > flags are introduced. > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > which initially reads as "async". If the root user does, e.g. > > # echo "sync" > cpu1/mte_tcf_preferred > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > CPU1, but async mode on other CPUs (assuming they retain the default value). > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > only values which the sysfs files would accept today are "sync" and "async". > > When faced with a situation where the prctl() flags for a task do not > intersect with the preferred mode for a CPU on which the task is going > to run, the lowest bit number flag is chosen from the mask set by the > prctl(). > > Thoughts? i'm happy with this. "lowest bit number" flag may not be optimal if there are many flags, but i don't expect many more tag check modes. no separate TCF_NONE bit means if we later want to turn tag check off per cpu there is no opt-out. but i guess this is fine. will armv8.7-a style asymmetric check use separate flag or TCF_SYNC | TCF_ASYNC may enable it? i see arguments either way.
> -----Original Message----- > From: Szabolcs Nagy <Szabolcs.Nagy@arm.com> > Sent: Tuesday, June 29, 2021 2:59 PM > To: Will Deacon <will@kernel.org> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Peter Collingbourne > <pcc@google.com>; Vincenzo Frascino <Vincenzo.Frascino@arm.com>; Evgenii > Stepanov <eugenis@google.com>; Linux ARM <linux-arm- > kernel@lists.infradead.org>; Tejas Belagod <Tejas.Belagod@arm.com> > Subject: Re: [PATCH v5] arm64: mte: allow async MTE to be upgraded to sync on > a per-CPU basis > > The 06/29/2021 11:46, Will Deacon wrote: > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > Another option is a mapping table where async can be remapped to > > > sync and sync to async (or even to "none" for both). That's not far > > > from one of Peter's mte-upgrade-async proposal, we just add > > > mte-map-async and mte-map-sync options. Most likely we'll just use > > > mte-map-async for now to map it to sync on some CPUs but it wouldn't > > > exclude other forced settings. > > > > Catalin and I discussed this offline and ended up with another option: > > retrospectively change the prctl() ABI so that the 'flags' argument > > accepts a bitmask of modes that the application is willing to accept. > > This doesn't break any existing users, as we currently enforce that > > only one mode is specified, but it would allow things like: > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > 0, 0, 0); > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, > > with the difference that I think this extends more naturally as new > > PR_MTR_TCF_* flags are introduced. > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > which initially reads as "async". If the root user does, e.g. > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL > > prctl() request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in > > sync mode on CPU1, but async mode on other CPUs (assuming they retain the > default value). > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand > > for "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the > > same as doing PR_MTE_TCF_SYNC (which I think is already the behaviour > > today). The only values which the sysfs files would accept today are "sync" and > "async". > > > > When faced with a situation where the prctl() flags for a task do not > > intersect with the preferred mode for a CPU on which the task is going > > to run, the lowest bit number flag is chosen from the mask set by the > > prctl(). > > > > Thoughts? > > i'm happy with this. > > "lowest bit number" flag may not be optimal if there are many flags, but i don't > expect many more tag check modes. > > no separate TCF_NONE bit means if we later want to turn tag check off per cpu > there is no opt-out. > but i guess this is fine. > > will armv8.7-a style asymmetric check use separate flag or TCF_SYNC | > TCF_ASYNC may enable it? > i see arguments either way. If app wants to say 'I want either TCF_SYNC | TCF_ASYNC, but not TCF_ASYM' it will have to be a separate bit, right? Thanks, Tejas.
On Tue, Jun 29, 2021 at 03:31:26PM +0100, Tejas Belagod wrote: > > The 06/29/2021 11:46, Will Deacon wrote: > > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > > Another option is a mapping table where async can be remapped to > > > > sync and sync to async (or even to "none" for both). That's not far > > > > from one of Peter's mte-upgrade-async proposal, we just add > > > > mte-map-async and mte-map-sync options. Most likely we'll just use > > > > mte-map-async for now to map it to sync on some CPUs but it wouldn't > > > > exclude other forced settings. > > > > > > Catalin and I discussed this offline and ended up with another option: > > > retrospectively change the prctl() ABI so that the 'flags' argument > > > accepts a bitmask of modes that the application is willing to accept. > > > This doesn't break any existing users, as we currently enforce that > > > only one mode is specified, but it would allow things like: > > > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > > 0, 0, 0); > > > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, > > > with the difference that I think this extends more naturally as new > > > PR_MTR_TCF_* flags are introduced. > > > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > > which initially reads as "async". If the root user does, e.g. > > > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL > > > prctl() request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in > > > sync mode on CPU1, but async mode on other CPUs (assuming they > > > retain the default value). > > > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand > > > for "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the > > > same as doing PR_MTE_TCF_SYNC (which I think is already the behaviour > > > today). The only values which the sysfs files would accept today > > > are "sync" and "async". > > > > > > When faced with a situation where the prctl() flags for a task do not > > > intersect with the preferred mode for a CPU on which the task is going > > > to run, the lowest bit number flag is chosen from the mask set by the > > > prctl(). > > > > > > Thoughts? > > > > i'm happy with this. > > > > "lowest bit number" flag may not be optimal if there are many flags, > > but i don't expect many more tag check modes. > > > > no separate TCF_NONE bit means if we later want to turn tag check > > off per cpu there is no opt-out. but i guess this is fine. If some CPU in the system has an erratum that requires TCF_NONE, I'd rather disable MTE altogether (via a kernel patch). > > will armv8.7-a style asymmetric check use separate flag or TCF_SYNC > > | TCF_ASYNC may enable it? i see arguments either way. > > If app wants to say 'I want either TCF_SYNC | TCF_ASYNC, but not > TCF_ASYM' it will have to be a separate bit, right? Will and I talked about this as well and we decided that it's better to have a separate TCF_ASYM bit. So SYNC|ASYNC won't be "upgradable" to ASYM.
On Tue, Jun 29, 2021 at 3:46 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > Another option is a mapping table where async can be remapped to sync > > and sync to async (or even to "none" for both). That's not far from one > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > to map it to sync on some CPUs but it wouldn't exclude other forced > > settings. > > Catalin and I discussed this offline and ended up with another option: > retrospectively change the prctl() ABI so that the 'flags' argument > accepts a bitmask of modes that the application is willing to accept. This > doesn't break any existing users, as we currently enforce that only one > mode is specified, but it would allow things like: > > prctl(PR_SET_TAGGED_ADDR_CTRL, > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > 0, 0, 0); > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > the difference that I think this extends more naturally as new PR_MTR_TCF_* > flags are introduced. > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > which initially reads as "async". If the root user does, e.g. > > # echo "sync" > cpu1/mte_tcf_preferred > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > CPU1, but async mode on other CPUs (assuming they retain the default value). > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > only values which the sysfs files would accept today are "sync" and "async". > > When faced with a situation where the prctl() flags for a task do not > intersect with the preferred mode for a CPU on which the task is going > to run, the lowest bit number flag is chosen from the mask set by the > prctl(). > > Thoughts? This all sounds great and I'm glad you were able to come to an agreement on this. I'll get started on implementing it. Once we have ASYM support I'm not sure if we can rely on bit numbering for ordering, as we will want the ordering to be ASYNC < ASYM < SYNC and ASYNC is bit-adjacent to SYNC. So I think we will need to make ASYM a special case. This would also allow NONE to be upgraded by allocating a bit position for NONE, but if we change the value of NONE it may break applications built against new headers running on old kernels, so maybe it should be made a separate constant. This doesn't need to be done immediately, though. Peter
On Tue, Jun 29, 2021 at 12:11:17PM -0700, Peter Collingbourne wrote: > On Tue, Jun 29, 2021 at 3:46 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > Another option is a mapping table where async can be remapped to sync > > > and sync to async (or even to "none" for both). That's not far from one > > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > > to map it to sync on some CPUs but it wouldn't exclude other forced > > > settings. > > > > Catalin and I discussed this offline and ended up with another option: > > retrospectively change the prctl() ABI so that the 'flags' argument > > accepts a bitmask of modes that the application is willing to accept. This > > doesn't break any existing users, as we currently enforce that only one > > mode is specified, but it would allow things like: > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > 0, 0, 0); > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > > the difference that I think this extends more naturally as new PR_MTR_TCF_* > > flags are introduced. > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > which initially reads as "async". If the root user does, e.g. > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > > CPU1, but async mode on other CPUs (assuming they retain the default value). > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > > only values which the sysfs files would accept today are "sync" and "async". > > > > When faced with a situation where the prctl() flags for a task do not > > intersect with the preferred mode for a CPU on which the task is going > > to run, the lowest bit number flag is chosen from the mask set by the > > prctl(). > > > > Thoughts? > > This all sounds great and I'm glad you were able to come to an > agreement on this. I'll get started on implementing it. > > Once we have ASYM support I'm not sure if we can rely on bit numbering > for ordering, as we will want the ordering to be ASYNC < ASYM < SYNC > and ASYNC is bit-adjacent to SYNC. So I think we will need to make > ASYM a special case. The bit position based order - SYNC < ASYNC < ASYM - is indeed arbitrary but I think it's easier to follow. When we add ASYM, it will be the last one rather than squeezing it in the middle of the current order. Any other order somehow implies that one is better than the other but we don't have a clear definition for "better". > This would also allow NONE to be upgraded by allocating a bit position > for NONE, but if we change the value of NONE it may break applications > built against new headers running on old kernels, so maybe it should > be made a separate constant. This doesn't need to be done immediately, > though. I think we should leave NONE as non-upgradable, the software doesn't expect any fault when accessing with the wrong tag. We also can't change the definition now as it's already in upstream glibc.
On Wed, Jun 30, 2021 at 8:19 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Tue, Jun 29, 2021 at 12:11:17PM -0700, Peter Collingbourne wrote: > > On Tue, Jun 29, 2021 at 3:46 AM Will Deacon <will@kernel.org> wrote: > > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > > Another option is a mapping table where async can be remapped to sync > > > > and sync to async (or even to "none" for both). That's not far from one > > > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > > > to map it to sync on some CPUs but it wouldn't exclude other forced > > > > settings. > > > > > > Catalin and I discussed this offline and ended up with another option: > > > retrospectively change the prctl() ABI so that the 'flags' argument > > > accepts a bitmask of modes that the application is willing to accept. This > > > doesn't break any existing users, as we currently enforce that only one > > > mode is specified, but it would allow things like: > > > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > > 0, 0, 0); > > > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > > > the difference that I think this extends more naturally as new PR_MTR_TCF_* > > > flags are introduced. > > > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > > which initially reads as "async". If the root user does, e.g. > > > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > > > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > > > CPU1, but async mode on other CPUs (assuming they retain the default value). > > > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > > > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > > > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > > > only values which the sysfs files would accept today are "sync" and "async". > > > > > > When faced with a situation where the prctl() flags for a task do not > > > intersect with the preferred mode for a CPU on which the task is going > > > to run, the lowest bit number flag is chosen from the mask set by the > > > prctl(). > > > > > > Thoughts? > > > > This all sounds great and I'm glad you were able to come to an > > agreement on this. I'll get started on implementing it. > > > > Once we have ASYM support I'm not sure if we can rely on bit numbering > > for ordering, as we will want the ordering to be ASYNC < ASYM < SYNC > > and ASYNC is bit-adjacent to SYNC. So I think we will need to make > > ASYM a special case. > > The bit position based order - SYNC < ASYNC < ASYM - is indeed arbitrary > but I think it's easier to follow. When we add ASYM, it will be the last > one rather than squeezing it in the middle of the current order. Any > other order somehow implies that one is better than the other but we > don't have a clear definition for "better". At least from my perspective "more strict" is a reasonable enough definition for "better", at least by default, since it seems like what most users would want. If users really want a different ordering, perhaps it can be an opt-in behavior. > > This would also allow NONE to be upgraded by allocating a bit position > > for NONE, but if we change the value of NONE it may break applications > > built against new headers running on old kernels, so maybe it should > > be made a separate constant. This doesn't need to be done immediately, > > though. > > I think we should leave NONE as non-upgradable, the software doesn't > expect any fault when accessing with the wrong tag. We also can't change > the definition now as it's already in upstream glibc. Right, I was thinking that if we made this upgradeable we would introduce a constant with a different name, like NONE2 or NONE_UPGRADEABLE or something. Peter
On Wed, Jun 30, 2021 at 04:39:02PM -0700, Peter Collingbourne wrote: > On Wed, Jun 30, 2021 at 8:19 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > On Tue, Jun 29, 2021 at 12:11:17PM -0700, Peter Collingbourne wrote: > > > On Tue, Jun 29, 2021 at 3:46 AM Will Deacon <will@kernel.org> wrote: > > > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > > > Another option is a mapping table where async can be remapped to sync > > > > > and sync to async (or even to "none" for both). That's not far from one > > > > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > > > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > > > > to map it to sync on some CPUs but it wouldn't exclude other forced > > > > > settings. > > > > > > > > Catalin and I discussed this offline and ended up with another option: > > > > retrospectively change the prctl() ABI so that the 'flags' argument > > > > accepts a bitmask of modes that the application is willing to accept. This > > > > doesn't break any existing users, as we currently enforce that only one > > > > mode is specified, but it would allow things like: > > > > > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > > > 0, 0, 0); > > > > > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > > > > the difference that I think this extends more naturally as new PR_MTR_TCF_* > > > > flags are introduced. > > > > > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > > > which initially reads as "async". If the root user does, e.g. > > > > > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > > > > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > > > > CPU1, but async mode on other CPUs (assuming they retain the default value). > > > > > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > > > > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > > > > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > > > > only values which the sysfs files would accept today are "sync" and "async". > > > > > > > > When faced with a situation where the prctl() flags for a task do not > > > > intersect with the preferred mode for a CPU on which the task is going > > > > to run, the lowest bit number flag is chosen from the mask set by the > > > > prctl(). > > > > > > > > Thoughts? > > > > > > This all sounds great and I'm glad you were able to come to an > > > agreement on this. I'll get started on implementing it. > > > > > > Once we have ASYM support I'm not sure if we can rely on bit numbering > > > for ordering, as we will want the ordering to be ASYNC < ASYM < SYNC > > > and ASYNC is bit-adjacent to SYNC. So I think we will need to make > > > ASYM a special case. > > > > The bit position based order - SYNC < ASYNC < ASYM - is indeed arbitrary > > but I think it's easier to follow. When we add ASYM, it will be the last > > one rather than squeezing it in the middle of the current order. Any > > other order somehow implies that one is better than the other but we > > don't have a clear definition for "better". > > At least from my perspective "more strict" is a reasonable enough > definition for "better", at least by default, since it seems like what > most users would want. If users really want a different ordering, > perhaps it can be an opt-in behavior. I think some of this hinges on how we want to handle something like a request for PR_MTE_TCF_ASYM | PR_MTE_TCF_ASYNC on a system which doesn't have hardware support for PR_MTE_TCF_ASYM. If this returns an error from the prctl(), then I'm not strongly opposed to having a "more strict" priority ordering for the options, but if this was to succeed, then I think we need something like the bit-position ordering so that the user-visible behaviour doesn't change in a non-discoverable manner based on what the CPU supports. But yes, for now this is all moot with only two options. Will
On Wed, Jul 07, 2021 at 11:30:04AM +0100, Will Deacon wrote: > On Wed, Jun 30, 2021 at 04:39:02PM -0700, Peter Collingbourne wrote: > > On Wed, Jun 30, 2021 at 8:19 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > > On Tue, Jun 29, 2021 at 12:11:17PM -0700, Peter Collingbourne wrote: > > > > On Tue, Jun 29, 2021 at 3:46 AM Will Deacon <will@kernel.org> wrote: > > > > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > > > > Another option is a mapping table where async can be remapped to sync > > > > > > and sync to async (or even to "none" for both). That's not far from one > > > > > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > > > > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > > > > > to map it to sync on some CPUs but it wouldn't exclude other forced > > > > > > settings. > > > > > > > > > > Catalin and I discussed this offline and ended up with another option: > > > > > retrospectively change the prctl() ABI so that the 'flags' argument > > > > > accepts a bitmask of modes that the application is willing to accept. This > > > > > doesn't break any existing users, as we currently enforce that only one > > > > > mode is specified, but it would allow things like: > > > > > > > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > > > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > > > > 0, 0, 0); > > > > > > > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > > > > > the difference that I think this extends more naturally as new PR_MTR_TCF_* > > > > > flags are introduced. > > > > > > > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > > > > which initially reads as "async". If the root user does, e.g. > > > > > > > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > > > > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > > > > > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > > > > > CPU1, but async mode on other CPUs (assuming they retain the default value). > > > > > > > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > > > > > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > > > > > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > > > > > only values which the sysfs files would accept today are "sync" and "async". > > > > > > > > > > When faced with a situation where the prctl() flags for a task do not > > > > > intersect with the preferred mode for a CPU on which the task is going > > > > > to run, the lowest bit number flag is chosen from the mask set by the > > > > > prctl(). > > > > > > > > > > Thoughts? > > > > > > > > This all sounds great and I'm glad you were able to come to an > > > > agreement on this. I'll get started on implementing it. > > > > > > > > Once we have ASYM support I'm not sure if we can rely on bit numbering > > > > for ordering, as we will want the ordering to be ASYNC < ASYM < SYNC > > > > and ASYNC is bit-adjacent to SYNC. So I think we will need to make > > > > ASYM a special case. > > > > > > The bit position based order - SYNC < ASYNC < ASYM - is indeed arbitrary > > > but I think it's easier to follow. When we add ASYM, it will be the last > > > one rather than squeezing it in the middle of the current order. Any > > > other order somehow implies that one is better than the other but we > > > don't have a clear definition for "better". > > > > At least from my perspective "more strict" is a reasonable enough > > definition for "better", at least by default, since it seems like what > > most users would want. If users really want a different ordering, > > perhaps it can be an opt-in behavior. > > I think some of this hinges on how we want to handle something like a > request for PR_MTE_TCF_ASYM | PR_MTE_TCF_ASYNC on a system which doesn't > have hardware support for PR_MTE_TCF_ASYM. If this returns an error from > the prctl(), then I'm not strongly opposed to having a "more strict" > priority ordering for the options, but if this was to succeed, then I think > we need something like the bit-position ordering so that the user-visible > behaviour doesn't change in a non-discoverable manner based on what the CPU > supports. I replied on a subsequent update to this series and I think leaving it as imp def is best ;) (as with other ARM architecture behaviours). https://lore.kernel.org/r/20210701170450.GC12484@arm.com
On Wed, Jul 07, 2021 at 01:55:45PM +0100, Catalin Marinas wrote: > On Wed, Jul 07, 2021 at 11:30:04AM +0100, Will Deacon wrote: > > On Wed, Jun 30, 2021 at 04:39:02PM -0700, Peter Collingbourne wrote: > > > On Wed, Jun 30, 2021 at 8:19 AM Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > On Tue, Jun 29, 2021 at 12:11:17PM -0700, Peter Collingbourne wrote: > > > > > On Tue, Jun 29, 2021 at 3:46 AM Will Deacon <will@kernel.org> wrote: > > > > > > On Mon, Jun 28, 2021 at 04:20:24PM +0100, Catalin Marinas wrote: > > > > > > > Another option is a mapping table where async can be remapped to sync > > > > > > > and sync to async (or even to "none" for both). That's not far from one > > > > > > > of Peter's mte-upgrade-async proposal, we just add mte-map-async and > > > > > > > mte-map-sync options. Most likely we'll just use mte-map-async for now > > > > > > > to map it to sync on some CPUs but it wouldn't exclude other forced > > > > > > > settings. > > > > > > > > > > > > Catalin and I discussed this offline and ended up with another option: > > > > > > retrospectively change the prctl() ABI so that the 'flags' argument > > > > > > accepts a bitmask of modes that the application is willing to accept. This > > > > > > doesn't break any existing users, as we currently enforce that only one > > > > > > mode is specified, but it would allow things like: > > > > > > > > > > > > prctl(PR_SET_TAGGED_ADDR_CTRL, > > > > > > PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC, > > > > > > 0, 0, 0); > > > > > > > > > > > > which is actually very similar to Peter's PR_MTE_DYNAMIC_TCF proposal, with > > > > > > the difference that I think this extends more naturally as new PR_MTR_TCF_* > > > > > > flags are introduced. > > > > > > > > > > > > Then we expose a per-cpu file in sysfs (say "cpuX/mte_tcf_preferred") > > > > > > which initially reads as "async". If the root user does, e.g. > > > > > > > > > > > > # echo "sync" > cpu1/mte_tcf_preferred > > > > > > > > > > > > then a task which has successfully issued a PR_SET_TAGGED_ADDR_CTRL prctl() > > > > > > request for PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC will run in sync mode on > > > > > > CPU1, but async mode on other CPUs (assuming they retain the default value). > > > > > > > > > > > > We'll need to special-case PR_MTE_TCF_NONE, as that's just a shorthand for > > > > > > "no flags" so doing PR_MTE_TCF_NONE | PR_MTE_TCF_SYNC is just the same as > > > > > > doing PR_MTE_TCF_SYNC (which I think is already the behaviour today). The > > > > > > only values which the sysfs files would accept today are "sync" and "async". > > > > > > > > > > > > When faced with a situation where the prctl() flags for a task do not > > > > > > intersect with the preferred mode for a CPU on which the task is going > > > > > > to run, the lowest bit number flag is chosen from the mask set by the > > > > > > prctl(). > > > > > > > > > > > > Thoughts? > > > > > > > > > > This all sounds great and I'm glad you were able to come to an > > > > > agreement on this. I'll get started on implementing it. > > > > > > > > > > Once we have ASYM support I'm not sure if we can rely on bit numbering > > > > > for ordering, as we will want the ordering to be ASYNC < ASYM < SYNC > > > > > and ASYNC is bit-adjacent to SYNC. So I think we will need to make > > > > > ASYM a special case. > > > > > > > > The bit position based order - SYNC < ASYNC < ASYM - is indeed arbitrary > > > > but I think it's easier to follow. When we add ASYM, it will be the last > > > > one rather than squeezing it in the middle of the current order. Any > > > > other order somehow implies that one is better than the other but we > > > > don't have a clear definition for "better". > > > > > > At least from my perspective "more strict" is a reasonable enough > > > definition for "better", at least by default, since it seems like what > > > most users would want. If users really want a different ordering, > > > perhaps it can be an opt-in behavior. > > > > I think some of this hinges on how we want to handle something like a > > request for PR_MTE_TCF_ASYM | PR_MTE_TCF_ASYNC on a system which doesn't > > have hardware support for PR_MTE_TCF_ASYM. If this returns an error from > > the prctl(), then I'm not strongly opposed to having a "more strict" > > priority ordering for the options, but if this was to succeed, then I think > > we need something like the bit-position ordering so that the user-visible > > behaviour doesn't change in a non-discoverable manner based on what the CPU > > supports. > > I replied on a subsequent update to this series and I think leaving it > as imp def is best ;) (as with other ARM architecture behaviours). > > https://lore.kernel.org/r/20210701170450.GC12484@arm.com I agree for now as there are only two flags, but I think we'll need to define the behaviour once we add a third flag because it's a bit rubbish from a userspace perspective otherwise. Will
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst index b540178a93f8..5375d5efd18c 100644 --- a/Documentation/arm64/memory-tagging-extension.rst +++ b/Documentation/arm64/memory-tagging-extension.rst @@ -120,6 +120,25 @@ in the ``PR_MTE_TAG_MASK`` bit-field. interface provides an include mask. An include mask of ``0`` (exclusion mask ``0xffff``) results in the CPU always generating tag ``0``. +Upgrading to stricter tag checking modes +---------------------------------------- + +On some CPUs the performance of MTE in stricter tag checking modes +is similar to that of less strict tag checking modes. This makes it +worthwhile to enable stricter checks on those CPUs when a less strict +checking mode is requested, in order to gain the error detection +benefits of the stricter checks without the performance downsides. To +opt into upgrading to a stricter checking mode on those CPUs, the user +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. + +This feature is currently only supported for upgrading from +asynchronous mode. To configure a CPU to upgrade from asynchronous mode +to synchronous mode, a privileged user may write the value ``1`` to +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable +upgrading they may write the value ``0``. By default the feature is +disabled on all CPUs. + Initial process state --------------------- @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration: - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) - Tag checking mode set to ``PR_MTE_TCF_NONE`` - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled) - ``PSTATE.TCO`` set to 0 - ``PROT_MTE`` not set on any of the initial memory maps diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h index bc88a1ced0d7..719687412798 100644 --- a/arch/arm64/include/asm/mte.h +++ b/arch/arm64/include/asm/mte.h @@ -40,6 +40,7 @@ void mte_free_tag_storage(char *storage); void mte_sync_tags(pte_t *ptep, pte_t pte); void mte_copy_page_tags(void *kto, const void *kfrom); void mte_thread_init_user(void); +void mte_update_sctlr_user(struct task_struct *task); void mte_thread_switch(struct task_struct *next); void mte_suspend_enter(void); void mte_suspend_exit(void); @@ -62,6 +63,9 @@ static inline void mte_copy_page_tags(void *kto, const void *kfrom) static inline void mte_thread_init_user(void) { } +static inline void mte_update_sctlr_user(struct task_struct *task) +{ +} static inline void mte_thread_switch(struct task_struct *next) { } diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 9df3feeee890..f8607c3a5706 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -16,6 +16,18 @@ */ #define NET_IP_ALIGN 0 +#define MTE_CTRL_GCR_USER_EXCL_SHIFT 0 +#define MTE_CTRL_GCR_USER_EXCL_MASK 0xffff + +#define MTE_CTRL_TCF_SHIFT 16 +#define MTE_CTRL_TCF_NONE (0UL << MTE_CTRL_TCF_SHIFT) +#define MTE_CTRL_TCF_SYNC (1UL << MTE_CTRL_TCF_SHIFT) +#define MTE_CTRL_TCF_ASYNC (2UL << MTE_CTRL_TCF_SHIFT) +#define MTE_CTRL_TCF_MASK (3UL << MTE_CTRL_TCF_SHIFT) + +#define MTE_CTRL_DYNAMIC_TCF_SHIFT 18 +#define MTE_CTRL_DYNAMIC_TCF (1UL << MTE_CTRL_DYNAMIC_TCF_SHIFT) + #ifndef __ASSEMBLY__ #include <linux/build_bug.h> @@ -151,7 +163,7 @@ struct thread_struct { struct ptrauth_keys_kernel keys_kernel; #endif #ifdef CONFIG_ARM64_MTE - u64 gcr_user_excl; + u64 mte_ctrl; #endif u64 sctlr_user; }; diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 0cb34ccb6e73..63d02cd67b44 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -49,7 +49,7 @@ int main(void) DEFINE(THREAD_KEYS_KERNEL, offsetof(struct task_struct, thread.keys_kernel)); #endif #ifdef CONFIG_ARM64_MTE - DEFINE(THREAD_GCR_EL1_USER, offsetof(struct task_struct, thread.gcr_user_excl)); + DEFINE(THREAD_MTE_CTRL, offsetof(struct task_struct, thread.mte_ctrl)); #endif BLANK(); DEFINE(S_X0, offsetof(struct pt_regs, regs[0])); diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 3513984a88bd..ce59280355c5 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -182,7 +182,7 @@ alternative_else_nop_endif * the RRND (bit[16]) setting. */ mrs_s \tmp2, SYS_GCR_EL1 - bfi \tmp2, \tmp, #0, #16 + bfxil \tmp2, \tmp, #MTE_CTRL_GCR_USER_EXCL_SHIFT, #16 msr_s SYS_GCR_EL1, \tmp2 #endif .endm @@ -205,7 +205,7 @@ alternative_else_nop_endif alternative_if_not ARM64_MTE b 1f alternative_else_nop_endif - ldr \tmp, [\tsk, #THREAD_GCR_EL1_USER] + ldr \tmp, [\tsk, #THREAD_MTE_CTRL] mte_set_gcr \tmp, \tmp2 1: diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c index 125a10e413e9..0862f3155d0a 100644 --- a/arch/arm64/kernel/mte.c +++ b/arch/arm64/kernel/mte.c @@ -4,6 +4,7 @@ */ #include <linux/bitops.h> +#include <linux/cpu.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/prctl.h> @@ -26,6 +27,8 @@ u64 gcr_kernel_excl __ro_after_init; static bool report_fault_once = true; +static DEFINE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async); + #ifdef CONFIG_KASAN_HW_TAGS /* Whether the MTE asynchronous mode is enabled. */ DEFINE_STATIC_KEY_FALSE(mte_async_mode); @@ -197,16 +200,6 @@ static void update_gcr_el1_excl(u64 excl) sysreg_clear_set_s(SYS_GCR_EL1, SYS_GCR_EL1_EXCL_MASK, excl); } -static void set_gcr_el1_excl(u64 excl) -{ - current->thread.gcr_user_excl = excl; - - /* - * SYS_GCR_EL1 will be set to current->thread.gcr_user_excl value - * by mte_set_user_gcr() in kernel_exit, - */ -} - void mte_thread_init_user(void) { if (!system_supports_mte()) @@ -216,15 +209,32 @@ void mte_thread_init_user(void) dsb(ish); write_sysreg_s(0, SYS_TFSRE0_EL1); clear_thread_flag(TIF_MTE_ASYNC_FAULT); - /* disable tag checking */ - set_task_sctlr_el1((current->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK) | - SCTLR_EL1_TCF0_NONE); - /* reset tag generation mask */ - set_gcr_el1_excl(SYS_GCR_EL1_EXCL_MASK); + /* disable tag checking and reset tag generation mask */ + current->thread.mte_ctrl = + MTE_CTRL_GCR_USER_EXCL_MASK | MTE_CTRL_TCF_NONE; + mte_update_sctlr_user(current); + set_task_sctlr_el1(current->thread.sctlr_user); +} + +void mte_update_sctlr_user(struct task_struct *task) +{ + unsigned long sctlr = task->thread.sctlr_user; + + sctlr &= ~SCTLR_EL1_TCF0_MASK; + if ((task->thread.mte_ctrl & MTE_CTRL_DYNAMIC_TCF) && + (task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) == MTE_CTRL_TCF_ASYNC) { + sctlr |= __this_cpu_read(mte_upgrade_async); + } else { + sctlr |= ((task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) >> + MTE_CTRL_TCF_SHIFT) << SCTLR_EL1_TCF0_SHIFT; + } + task->thread.sctlr_user = sctlr; } void mte_thread_switch(struct task_struct *next) { + mte_update_sctlr_user(next); + /* * Check if an async tag exception occurred at EL1. * @@ -262,33 +272,34 @@ void mte_suspend_exit(void) long set_mte_ctrl(struct task_struct *task, unsigned long arg) { - u64 sctlr = task->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK; - u64 gcr_excl = ~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) & - SYS_GCR_EL1_EXCL_MASK; + u64 mte_ctrl = (~((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT) & + SYS_GCR_EL1_EXCL_MASK) + << MTE_CTRL_GCR_USER_EXCL_SHIFT; if (!system_supports_mte()) return 0; switch (arg & PR_MTE_TCF_MASK) { case PR_MTE_TCF_NONE: - sctlr |= SCTLR_EL1_TCF0_NONE; + mte_ctrl |= MTE_CTRL_TCF_NONE; break; case PR_MTE_TCF_SYNC: - sctlr |= SCTLR_EL1_TCF0_SYNC; + mte_ctrl |= MTE_CTRL_TCF_SYNC; break; case PR_MTE_TCF_ASYNC: - sctlr |= SCTLR_EL1_TCF0_ASYNC; + mte_ctrl |= MTE_CTRL_TCF_ASYNC; break; default: return -EINVAL; } - if (task != current) { - task->thread.sctlr_user = sctlr; - task->thread.gcr_user_excl = gcr_excl; - } else { - set_task_sctlr_el1(sctlr); - set_gcr_el1_excl(gcr_excl); + if (arg & PR_MTE_DYNAMIC_TCF) + mte_ctrl |= MTE_CTRL_DYNAMIC_TCF; + + task->thread.mte_ctrl = mte_ctrl; + if (task == current) { + mte_update_sctlr_user(task); + set_task_sctlr_el1(task->thread.sctlr_user); } return 0; @@ -297,25 +308,29 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg) long get_mte_ctrl(struct task_struct *task) { unsigned long ret; - u64 incl = ~task->thread.gcr_user_excl & SYS_GCR_EL1_EXCL_MASK; + u64 incl = (~task->thread.mte_ctrl >> MTE_CTRL_GCR_USER_EXCL_SHIFT) & + SYS_GCR_EL1_EXCL_MASK; if (!system_supports_mte()) return 0; ret = incl << PR_MTE_TAG_SHIFT; - switch (task->thread.sctlr_user & SCTLR_EL1_TCF0_MASK) { - case SCTLR_EL1_TCF0_NONE: + switch (task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) { + case MTE_CTRL_TCF_NONE: ret |= PR_MTE_TCF_NONE; break; - case SCTLR_EL1_TCF0_SYNC: + case MTE_CTRL_TCF_SYNC: ret |= PR_MTE_TCF_SYNC; break; - case SCTLR_EL1_TCF0_ASYNC: + case MTE_CTRL_TCF_ASYNC: ret |= PR_MTE_TCF_ASYNC; break; } + if (task->thread.mte_ctrl & MTE_CTRL_DYNAMIC_TCF) + ret |= PR_MTE_DYNAMIC_TCF; + return ret; } @@ -453,3 +468,75 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request, return ret; } + +static ssize_t mte_upgrade_async_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + switch (per_cpu(mte_upgrade_async, dev->id)) { + case SCTLR_EL1_TCF0_ASYNC: + return sysfs_emit(buf, "0\n"); + case SCTLR_EL1_TCF0_SYNC: + return sysfs_emit(buf, "1\n"); + default: + return sysfs_emit(buf, "???\n"); + } +} + +static void sync_sctlr(void *arg) +{ + mte_update_sctlr_user(current); + set_task_sctlr_el1(current->thread.sctlr_user); +} + +static ssize_t mte_upgrade_async_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + ssize_t ret; + u32 val; + u64 tcf; + + ret = kstrtou32(buf, 0, &val); + if (ret < 0) + return ret; + + switch (val) { + case 0: + tcf = SCTLR_EL1_TCF0_ASYNC; + break; + case 1: + tcf = SCTLR_EL1_TCF0_SYNC; + break; + default: + return -EINVAL; + } + + device_lock(dev); + per_cpu(mte_upgrade_async, dev->id) = tcf; + + if (cpu_online(dev->id)) + ret = smp_call_function_single(dev->id, sync_sctlr, NULL, 0); + if (ret == 0) + ret = count; + device_unlock(dev); + + return ret; +} +static DEVICE_ATTR_RW(mte_upgrade_async); + +static int register_mte_upgrade_async_sysctl(void) +{ + unsigned int cpu; + + if (!system_supports_mte()) + return 0; + + for_each_possible_cpu(cpu) { + per_cpu(mte_upgrade_async, cpu) = SCTLR_EL1_TCF0_ASYNC; + device_create_file(get_cpu_device(cpu), + &dev_attr_mte_upgrade_async); + } + + return 0; +} +subsys_initcall(register_mte_upgrade_async_sysctl); diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index b4bb67f17a2c..09bd9c378678 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -659,7 +659,7 @@ long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg) return -EINVAL; if (system_supports_mte()) - valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK; + valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK | PR_MTE_DYNAMIC_TCF; if (arg & ~valid_mask) return -EINVAL; diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 18a9f59dc067..4dab44732814 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -242,6 +242,8 @@ struct prctl_mm_map { /* MTE tag inclusion mask */ # define PR_MTE_TAG_SHIFT 3 # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) +/* Enable dynamic upgrading of MTE tag check fault mode */ +# define PR_MTE_DYNAMIC_TCF (1UL << 19) /* Control reclaim behavior when allocating memory */ #define PR_SET_IO_FLUSHER 57
On some CPUs the performance of MTE in synchronous mode is similar to that of asynchronous mode. This makes it worthwhile to enable synchronous mode on those CPUs when asynchronous mode is requested, in order to gain the error detection benefits of synchronous mode without the performance downsides. Therefore, make it possible for user programs to opt into upgrading to synchronous mode on those CPUs via a new prctl flag. The flag is orthogonal to the existing TCF modes in order to accommodate upgrading from other TCF modes in the future. The feature is controlled on a per-CPU basis via sysfs. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8 --- v5: - updated documentation - address some nits in mte.c v4: - switch to new mte_ctrl field - make register_mte_upgrade_async_sysctl return an int - change the sysctl to take 0 or 1 instead of raw TCF values - "same as" -> "similar to" v3: - drop the device tree support - add documentation - add static_assert to ensure no overlap with real HW bits - move per-CPU variable initialization to mte.c - use smp_call_function_single instead of stop_machine v2: - make it an opt-in behavior - change the format of the device tree node - also allow controlling the feature via sysfs .../arm64/memory-tagging-extension.rst | 20 +++ arch/arm64/include/asm/mte.h | 4 + arch/arm64/include/asm/processor.h | 14 +- arch/arm64/kernel/asm-offsets.c | 2 +- arch/arm64/kernel/entry.S | 4 +- arch/arm64/kernel/mte.c | 151 ++++++++++++++---- arch/arm64/kernel/process.c | 2 +- include/uapi/linux/prctl.h | 2 + 8 files changed, 162 insertions(+), 37 deletions(-)