mbox series

[RFC,v0,0/6] x86/AMD: Userspace address tagging

Message ID 20220310111545.10852-1-bharata@amd.com (mailing list archive)
Headers show
Series x86/AMD: Userspace address tagging | expand

Message

Bharata B Rao March 10, 2022, 11:15 a.m. UTC
Hi,

This patchset makes use of Upper Address Ignore (UAI) feature available
on upcoming AMD processors to provide user address tagging support for x86/AMD.

UAI allows software to store a tag in the upper 7 bits of a logical
address [63:57]. When enabled, the processor will suppress the
traditional canonical address checks on the addresses. More information
about UAI can be found in section 5.10 of 'AMD64 Architecture
Programmer's Manual, Vol 2: System Programming' which is available from

https://bugzilla.kernel.org/attachment.cgi?id=300549

Currently ARM64 provides a way for processes to opt-in for
relaxed tagged ABI via prctl() options PR_SET/GET_TAGGED_ADDR_CTRL.
The prctl() API was found to be a bit restrictive for x86 use and
Kirill had posted an extension to it as part of Intel LAM patchset.
(https://lore.kernel.org/linux-mm/20210205151631.43511-1-kirill.shutemov@linux.intel.com/)

This patchset builds on that prctl() extension and adds support
for AMD UAI. AMD implementation is kept separate as equivalent
Intel LAM implementation is likely to be different due to different
bit positions and tag width.

This is an early implementation which has been only lightly tested.
I have used the tags_test.c from selftests/vm/tags/ to test this.
For ARM64 changes, I have only ensured that the changes compile.

Regards,
Bharata.

Bharata B Rao (5):
  x86/cpufeatures: Add Upper Address Ignore(UAI) as CPU feature
  x86: Enable Upper Address Ignore(UAI) feature
  x86: Provide an implementation of untagged_addr()
  x86: Untag user pointers in access_ok()
  x86: Add prctl() options to control tagged user addresses ABI

Kirill A. Shutemov (1):
  mm, arm64: Update PR_SET/GET_TAGGED_ADDR_CTRL interface

 arch/arm64/include/asm/processor.h            |  12 +-
 arch/arm64/kernel/process.c                   |  45 +++++-
 arch/arm64/kernel/ptrace.c                    |   4 +-
 arch/x86/Kconfig                              |   9 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +-
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/include/asm/page_32.h                |   3 +
 arch/x86/include/asm/page_64.h                |  26 ++++
 arch/x86/include/asm/processor.h              |  12 ++
 arch/x86/include/asm/thread_info.h            |   2 +
 arch/x86/include/asm/uaccess.h                |  29 +++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 arch/x86/kernel/process.c                     | 134 ++++++++++++++++++
 arch/x86/kernel/setup.c                       |   8 ++
 kernel/sys.c                                  |  14 +-
 .../testing/selftests/arm64/tags/tags_test.c  |  31 ----
 .../selftests/{arm64 => vm}/tags/.gitignore   |   0
 .../selftests/{arm64 => vm}/tags/Makefile     |   0
 .../{arm64 => vm}/tags/run_tags_test.sh       |   0
 tools/testing/selftests/vm/tags/tags_test.c   |  59 ++++++++
 20 files changed, 335 insertions(+), 58 deletions(-)
 delete mode 100644 tools/testing/selftests/arm64/tags/tags_test.c
 rename tools/testing/selftests/{arm64 => vm}/tags/.gitignore (100%)
 rename tools/testing/selftests/{arm64 => vm}/tags/Makefile (100%)
 rename tools/testing/selftests/{arm64 => vm}/tags/run_tags_test.sh (100%)
 create mode 100644 tools/testing/selftests/vm/tags/tags_test.c

Comments

David Laight March 10, 2022, 2:32 p.m. UTC | #1
From: Bharata B Rao <bharata@amd.com>
> Sent: 10 March 2022 11:16
> 
> This patchset makes use of Upper Address Ignore (UAI) feature available
> on upcoming AMD processors to provide user address tagging support for x86/AMD.
> 
> UAI allows software to store a tag in the upper 7 bits of a logical
> address [63:57]. When enabled, the processor will suppress the
> traditional canonical address checks on the addresses. More information
> about UAI can be found in section 5.10 of 'AMD64 Architecture
> Programmer's Manual, Vol 2: System Programming' which is available from
> 
> https://bugzilla.kernel.org/attachment.cgi?id=300549

Is that really allowing bit 63 to be used?
That is normally the user-kernel bit.
I can't help feeling that will just badly break things.

Otherwise the best thing is just to change access_ok()
to only reject addresses with the top bit set.
Then you shouldn't need any extra tests in the fast-path
of access_ok().

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Dave Hansen March 10, 2022, 3:16 p.m. UTC | #2
On 3/10/22 03:15, Bharata B Rao wrote:>
> This patchset builds on that prctl() extension and adds support
> for AMD UAI. AMD implementation is kept separate as equivalent
> Intel LAM implementation is likely to be different due to different
> bit positions and tag width.

Please don't keep the implementations separate.

We'll have one x86 implementation of address bit masking.  Both the
Intel and AMD implementations will feed into a shared implementation.
Something _like_ the cc_set_mask() interface where both implementations
do their detection and then call into common code to say how many bits
are being ignored.

A good litmus test for this is how many vendor-specific checks there are
in common code.  If there are a lot of them, it's not a good sign for
the design.

I'd also highly suggest going over Kirill's patch set in detail.  There
are things like this:

> https://lore.kernel.org/linux-mm/20210205151631.43511-10-kirill.shutemov@linux.intel.com/

which seem pretty sane to me but which are (I think) missing in this set.

I don't know if we can get there but, in an ideal world, this would be
series with, say 7 patches.  Patches 1-5 are generic enabling.  Patch 6
is tiny and does detection and enabling for UAI.  Patch 7 does the same
for LAM.  All the patches in the series are acked from LAM and UAI folks.
Dave Hansen March 10, 2022, 3:22 p.m. UTC | #3
On 3/10/22 07:16, Dave Hansen wrote:
> I'd also highly suggest going over Kirill's patch set in detail.  There
> are things like this:
> 
>> https://lore.kernel.org/linux-mm/20210205151631.43511-10-kirill.shutemov@linux.intel.com/
> which seem pretty sane to me but which are (I think) missing in this set.

Oh silly me, that's just for LAM_U48 which ignores more bits than
LAM_U57 or UAI.

Either way, I know you've looked at Kirill's set, but please do go over
it in detail to make sure there's nothing else you need to lift out of
there.
Dave Hansen March 10, 2022, 4:45 p.m. UTC | #4
On 3/10/22 06:32, David Laight wrote:
>> UAI allows software to store a tag in the upper 7 bits of a logical
>> address [63:57]. When enabled, the processor will suppress the
>> traditional canonical address checks on the addresses. More information
>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>> Programmer's Manual, Vol 2: System Programming' which is available from
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=300549
> Is that really allowing bit 63 to be used?
> That is normally the user-kernel bit.
> I can't help feeling that will just badly break things.

Yeah, this does seem worrisome.  The LAM approach[1] retains
canonicality checking for bit 63.


1.
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
David Laight March 10, 2022, 5:19 p.m. UTC | #5
From: Dave Hansen <dave.hansen@intel.com>
> Sent: 10 March 2022 16:46
> 
> On 3/10/22 06:32, David Laight wrote:
> >> UAI allows software to store a tag in the upper 7 bits of a logical
> >> address [63:57]. When enabled, the processor will suppress the
> >> traditional canonical address checks on the addresses. More information
> >> about UAI can be found in section 5.10 of 'AMD64 Architecture
> >> Programmer's Manual, Vol 2: System Programming' which is available from
> >>
> >> https://bugzilla.kernel.org/attachment.cgi?id=300549
> > Is that really allowing bit 63 to be used?
> > That is normally the user-kernel bit.
> > I can't help feeling that will just badly break things.
> 
> Yeah, this does seem worrisome.  The LAM approach[1] retains
> canonicality checking for bit 63.

Actually it is rather worse than 'worrisome'.
Allowing the user all address upto the base of the valid
kernel addresses (probably tags to 3e, but not 3f)
means that you can't use a fast address check in access_ok().
You are forced to use the strict test that 32bit kernels use.

Otherwise for 64bit access_ok() need only test address < 0
and rely on kernel code reading something below the (big)
offset to valid kernel addresses.
No real need to include the length at all.

If the hardware is just ignoring the high address bits
then the should be no need to mask them in kernel.
The required kernel accesses to user memory should 'just work'.

Of course, the bit to enable this (wherever it is) needs
to be restored on every process switch.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Bharata B Rao March 11, 2022, 5:42 a.m. UTC | #6
On 3/10/2022 10:49 PM, David Laight wrote:
> From: Dave Hansen <dave.hansen@intel.com>
>> Sent: 10 March 2022 16:46
>>
>> On 3/10/22 06:32, David Laight wrote:
>>>> UAI allows software to store a tag in the upper 7 bits of a logical
>>>> address [63:57]. When enabled, the processor will suppress the
>>>> traditional canonical address checks on the addresses. More information
>>>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>>>> Programmer's Manual, Vol 2: System Programming' which is available from
>>>>
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fattachment.cgi%3Fid%3D300549&amp;data=04%7C01%7Cbharata%40amd.com%7Ca1de24223931481b3fcb08da02ba2e6f%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637825295938946622%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=HijEAUq172r8YwkcCuhvl99Vk5BwE6iSROXcSQXmJHk%3D&amp;reserved=0
>>> Is that really allowing bit 63 to be used?
>>> That is normally the user-kernel bit.
>>> I can't help feeling that will just badly break things.
>>
>> Yeah, this does seem worrisome.  The LAM approach[1] retains
>> canonicality checking for bit 63.
> 
> Actually it is rather worse than 'worrisome'.
> Allowing the user all address upto the base of the valid
> kernel addresses (probably tags to 3e, but not 3f)
> means that you can't use a fast address check in access_ok().
> You are forced to use the strict test that 32bit kernels use.

From what I see, there is a single implementation of access_ok()
in arch/x86/asm/include/uaccess.h that does check if the user
address+size exceeds the limit.

Guess I am missing something, but can you please point me to the fast
implementation(that benefits from bit 63 being user/kernel address
disambiguation bit) and the strict checking in 32bit kernels that
are you are referring to?

Also I wonder here why ARM64 TBI which also uses the full upper byte
(including bit 63) for storing tag/metadata doesn't suffer from
this same problem?

Regards,
Bharata.
David Laight March 11, 2022, 8:15 a.m. UTC | #7
From: Bharata B Rao
> Sent: 11 March 2022 05:43
> On 3/10/2022 10:49 PM, David Laight wrote:
> > From: Dave Hansen <dave.hansen@intel.com>
> >> Sent: 10 March 2022 16:46
> >>
> >> On 3/10/22 06:32, David Laight wrote:
> >>>> UAI allows software to store a tag in the upper 7 bits of a logical
> >>>> address [63:57]. When enabled, the processor will suppress the
> >>>> traditional canonical address checks on the addresses. More information
> >>>> about UAI can be found in section 5.10 of 'AMD64 Architecture
> >>>> Programmer's Manual, Vol 2: System Programming' which is available from
> >>>>
,,,
> >>> Is that really allowing bit 63 to be used?
> >>> That is normally the user-kernel bit.
> >>> I can't help feeling that will just badly break things.
> >>
> >> Yeah, this does seem worrisome.  The LAM approach[1] retains
> >> canonicality checking for bit 63.
> >
> > Actually it is rather worse than 'worrisome'.
> > Allowing the user all address upto the base of the valid
> > kernel addresses (probably tags to 3e, but not 3f)
> > means that you can't use a fast address check in access_ok().
> > You are forced to use the strict test that 32bit kernels use.
> 
> From what I see, there is a single implementation of access_ok()
> in arch/x86/asm/include/uaccess.h that does check if the user
> address+size exceeds the limit.
> 
> Guess I am missing something, but can you please point me to the fast
> implementation(that benefits from bit 63 being user/kernel address
> disambiguation bit) and the strict checking in 32bit kernels that
> are you are referring to?

You can just check ((address | size) >> 62) on 64bit arch that
use bit 63 to select user/kernel and have a massive address
hole near the boundary.
The compiler optimises out constant size from that calculation.
On x86-64 non-canonical addresses give a different fault
to 'page not present' - but that can be handled.

In practice you can (probably) even completely ignore the 'size'
and just error if the top bit of the address is set.
While accesses might not be completely sequential they aren't
going to jump over the non-canonical addresses.

> Also I wonder here why ARM64 TBI which also uses the full upper byte
> (including bit 63) for storing tag/metadata doesn't suffer from
> this same problem?

The user-kernel bit is lower down (something like bit 53) so the
tags are using separate bits.
Although that choice of user/kernel bit makes life hard elsewhere.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Bharata B Rao March 11, 2022, 9:11 a.m. UTC | #8
On 3/11/2022 1:45 PM, David Laight wrote:
> From: Bharata B Rao
>> Sent: 11 March 2022 05:43
>> On 3/10/2022 10:49 PM, David Laight wrote:
>>> From: Dave Hansen <dave.hansen@intel.com>
>>>> Sent: 10 March 2022 16:46
>>>>
>>>> On 3/10/22 06:32, David Laight wrote:
>>>>>> UAI allows software to store a tag in the upper 7 bits of a logical
>>>>>> address [63:57]. When enabled, the processor will suppress the
>>>>>> traditional canonical address checks on the addresses. More information
>>>>>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>>>>>> Programmer's Manual, Vol 2: System Programming' which is available from
>>>>>>
> ,,,
>>>>> Is that really allowing bit 63 to be used?
>>>>> That is normally the user-kernel bit.
>>>>> I can't help feeling that will just badly break things.
>>>>
>>>> Yeah, this does seem worrisome.  The LAM approach[1] retains
>>>> canonicality checking for bit 63.
>>>
>>> Actually it is rather worse than 'worrisome'.
>>> Allowing the user all address upto the base of the valid
>>> kernel addresses (probably tags to 3e, but not 3f)
>>> means that you can't use a fast address check in access_ok().
>>> You are forced to use the strict test that 32bit kernels use.
>>
>> From what I see, there is a single implementation of access_ok()
>> in arch/x86/asm/include/uaccess.h that does check if the user
>> address+size exceeds the limit.
>>
>> Guess I am missing something, but can you please point me to the fast
>> implementation(that benefits from bit 63 being user/kernel address
>> disambiguation bit) and the strict checking in 32bit kernels that
>> are you are referring to?
> 
> You can just check ((address | size) >> 62) on 64bit arch that
> use bit 63 to select user/kernel and have a massive address
> hole near the boundary.
> The compiler optimises out constant size from that calculation.
> On x86-64 non-canonical addresses give a different fault
> to 'page not present' - but that can be handled.

Ok, so are you mentioning about a future optimization to access_ok()
that could benefit by retaining bit 63 as kernel/user bit?

Since you said using bit 63 to store metadata will break things,
I was trying to understand how and where does it break.

Regards,
Bharata.
David Laight March 11, 2022, 9:36 a.m. UTC | #9
From: Bharata B Rao
> Sent: 11 March 2022 09:11
> 
> On 3/11/2022 1:45 PM, David Laight wrote:
> > From: Bharata B Rao
> >> Sent: 11 March 2022 05:43
> >> On 3/10/2022 10:49 PM, David Laight wrote:
> >>> From: Dave Hansen <dave.hansen@intel.com>
> >>>> Sent: 10 March 2022 16:46
> >>>>
> >>>> On 3/10/22 06:32, David Laight wrote:
> >>>>>> UAI allows software to store a tag in the upper 7 bits of a logical
> >>>>>> address [63:57]. When enabled, the processor will suppress the
> >>>>>> traditional canonical address checks on the addresses. More information
> >>>>>> about UAI can be found in section 5.10 of 'AMD64 Architecture
> >>>>>> Programmer's Manual, Vol 2: System Programming' which is available from
> >>>>>>
> > ,,,
> >>>>> Is that really allowing bit 63 to be used?
> >>>>> That is normally the user-kernel bit.
> >>>>> I can't help feeling that will just badly break things.
> >>>>
> >>>> Yeah, this does seem worrisome.  The LAM approach[1] retains
> >>>> canonicality checking for bit 63.
> >>>
> >>> Actually it is rather worse than 'worrisome'.
> >>> Allowing the user all address upto the base of the valid
> >>> kernel addresses (probably tags to 3e, but not 3f)
> >>> means that you can't use a fast address check in access_ok().
> >>> You are forced to use the strict test that 32bit kernels use.
> >>
> >> From what I see, there is a single implementation of access_ok()
> >> in arch/x86/asm/include/uaccess.h that does check if the user
> >> address+size exceeds the limit.
> >>
> >> Guess I am missing something, but can you please point me to the fast
> >> implementation(that benefits from bit 63 being user/kernel address
> >> disambiguation bit) and the strict checking in 32bit kernels that
> >> are you are referring to?
> >
> > You can just check ((address | size) >> 62) on 64bit arch that
> > use bit 63 to select user/kernel and have a massive address
> > hole near the boundary.
> > The compiler optimises out constant size from that calculation.
> > On x86-64 non-canonical addresses give a different fault
> > to 'page not present' - but that can be handled.
> 
> Ok, so are you mentioning about a future optimization to access_ok()
> that could benefit by retaining bit 63 as kernel/user bit?
> 
> Since you said using bit 63 to store metadata will break things,
> I was trying to understand how and where does it break.

Kernel addresses start at 0xffxxx (for 57bit, 5 level page tables).
(Maybe the valid ones are still 0xffff8xxx.)
so you really don't want userspace using those to alias valid user
addresses.

I'm not entirely sure what enabling UAI does.
But the user page tables have to contain mappings for some kernel
addresses (even with page table separation).
Also you really don't want to have to mask off the high address
bits before a kernel access to the use buffer.
So it isn't really obvious how addresses that start 0xff can be used.
and that rather implies you can use bit 63 at all (without horrid
logic in some (probably) very critical hardware timing paths.

Wikipedia also notes:
    Intel has implemented a scheme with a 5-level page table, which allows
    Intel 64 processors to support a 57-bit virtual address space.
    Further extensions may allow full 64-bit virtual address space and
    physical memory by expanding the page table entry size to 128-bit,
    and reduce page walks in the 5-level hierarchy by using a larger 64 KiB
    page allocation size that still supports 4 KiB page operations for
    backward compatibility.
If they implement 64K pages then you lose the extra bits.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Dave Hansen March 11, 2022, 4:51 p.m. UTC | #10
On 3/11/22 01:36, David Laight wrote:
> Wikipedia also notes:
>     Intel has implemented a scheme with a 5-level page table, which allows
>     Intel 64 processors to support a 57-bit virtual address space.
>     Further extensions may allow full 64-bit virtual address space and
>     physical memory by expanding the page table entry size to 128-bit,
>     and reduce page walks in the 5-level hierarchy by using a larger 64 KiB
>     page allocation size that still supports 4 KiB page operations for
>     backward compatibility.
> If they implement 64K pages then you lose the extra bits.

I can't believe I need to say this:  Wikipedia is not an authoritative
source about what anyone is going to do with their CPUs in the future.
Please don't base any Linux decisions off this information.
Bharata B Rao March 14, 2022, 5 a.m. UTC | #11
On 3/10/2022 8:46 PM, Dave Hansen wrote:
> On 3/10/22 03:15, Bharata B Rao wrote:>
>> This patchset builds on that prctl() extension and adds support
>> for AMD UAI. AMD implementation is kept separate as equivalent
>> Intel LAM implementation is likely to be different due to different
>> bit positions and tag width.
> 
> Please don't keep the implementations separate.
> 
> We'll have one x86 implementation of address bit masking.  Both the
> Intel and AMD implementations will feed into a shared implementation.
> Something _like_ the cc_set_mask() interface where both implementations
> do their detection and then call into common code to say how many bits
> are being ignored.

The difference isn't limited to the number of ignored bits, see below...

> 
> A good litmus test for this is how many vendor-specific checks there are
> in common code.  If there are a lot of them, it's not a good sign for
> the design.

That is generally true but considering the below differences, I felt it
was much cleaner to go for separate implementations for enablement and
untagging while still building on the common prctl() extension in the
RFC version.

1. While Intel LAM provides two LAM widths (15 and 6 bits wide), AMD UAI
has a fixed tag width of 7 bits. This results in following differences
in the implementation:
   - Two threadinfo flags (TIF_LAM_57 and TIF_LAM_48) in Intel LAM vs
     single flag TIF_TAGGED_ADDR(like in ARM64) in AMD UAI.
   - Untagging needs to be aware of 2 widths in Intel LAM as against
     a single width in AMD UAI.
   - Requirement of making LAM_U48 and mappings above 47bits mutually
     exclusive is required for Intel LAM only.

2. The enablement bit is part of CR3 in Intel LAM which brings in
additional complexity of updating the CR3 with right LAM mode on every
MM switch and associated tlbstate changes. In AMD UAI, enablement bit
is part of EFER MSR and it is a single time enablement with no MM switch
time changes.

However, let me take a detailed look once again and explore further
code sharing possibilities.

Regards,
Bharata.
Dave Hansen March 14, 2022, 7:03 a.m. UTC | #12
On 3/13/22 22:00, Bharata B Rao wrote:
> On 3/10/2022 8:46 PM, Dave Hansen wrote:> 1. While Intel LAM provides two LAM widths (15 and 6 bits wide), AMD UAI
> has a fixed tag width of 7 bits. This results in following differences
> in the implementation:
>    - Two threadinfo flags (TIF_LAM_57 and TIF_LAM_48) in Intel LAM vs
>      single flag TIF_TAGGED_ADDR(like in ARM64) in AMD UAI.
>    - Untagging needs to be aware of 2 widths in Intel LAM as against
>      a single width in AMD UAI.

I'd be perfectly happy with an initial version of this stuff that only
comprehends UAI and LAM_57.  As long as the ABI can be used for all
three cases, adding the two most similar ones initially would make a lot
of sense.

>    - Requirement of making LAM_U48 and mappings above 47bits mutually
>      exclusive is required for Intel LAM only.
> 
> 2. The enablement bit is part of CR3 in Intel LAM which brings in
> additional complexity of updating the CR3 with right LAM mode on every
> MM switch and associated tlbstate changes. In AMD UAI, enablement bit
> is part of EFER MSR and it is a single time enablement with no MM switch
> time changes.

Hold on a sec.  The context-switching is a *policy*.  A _kernel_ policy.
If we wanted the LAM settings to be static and system wide, we'd just
have a single:

	if (cpu_feature_enabled(LAM))
		cr3 |= LAM_MASK;

in build_cr3().  That's not exactly complicated.

You know what's a heck of a lot more complicated than that?  Adding
context-switching for EFER.

I can't imagine a world where we want this tagging to be system-wide.
There *ARE* going to be apps that break with this pointer tagging stuff.
 Normal, user apps.  Apps in containers.  With a system-wide setting,
they have zero recourse when things break.  All a user can do is reboot
the kernel.

As-is, it seems like it would be awful to even develop apps that use
tagging.  You always want to test them with tagging on and off.  With
this, you need to reboot to test either way.

Also, the prctl() in Kirill's version actually enables and disables the
hardware feature in addition to the in-kernel tag/untag operations.
This series takes the same ABI and doesn't change the hardware feature
state.  That will need to be rectified at some point.

Speaking of which, it's also really worrying that kernel behavior is
affected by _EFER_UAI.  When tagging is "disabled" in the prctl(),
_EFER_UAI is still set.  The kernel can go merrily dereferencing both
kernel and userspace pointers with no canonical checks.  That seems
scary.  LAM's supervisor separation makes things a lot less scary.
Andy Lutomirski March 21, 2022, 10:29 p.m. UTC | #13
On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
> Hi,
>
> This patchset makes use of Upper Address Ignore (UAI) feature available
> on upcoming AMD processors to provide user address tagging support for x86/AMD.
>
> UAI allows software to store a tag in the upper 7 bits of a logical
> address [63:57]. When enabled, the processor will suppress the
> traditional canonical address checks on the addresses. More information
> about UAI can be found in section 5.10 of 'AMD64 Architecture
> Programmer's Manual, Vol 2: System Programming' which is available from
>
> https://bugzilla.kernel.org/attachment.cgi?id=300549

I hate to be a pain, but I'm really not convinced that this feature is suitable for Linux.  There are a few reasons:

Right now, the concept that the high bit of an address determines whether it's a user or a kernel address is fairly fundamental to the x86_64 (and x86_32!) code.  It may not be strictly necessary to preserve this, but violating it would require substantial thought.  With UAI enabled, kernel and user addresses are, functionally, interleaved.  This makes things like access_ok checks, and more generally anything that operates on a range of addresses, behave potentially quite differently.  A lot of auditing of existing code would be needed to make it safe.

UAI looks like it wasn't intended to be context switched and, indeed, your series doesn't context switch it.  As far as I'm concerned, this is an error, and if we support UAI at all, we should context switch it.  Yes, this will be slow, perhaps painfully slow.  AMD knows how to fix it by, for example, reading the Intel SDM.  By *not* context switching UAI, we force it on for all user code, including unsuspecting user code, as well as for kernel code.  Do we actually want it on for kernel code?  With LAM, in contrast, the semantics for kernel pointers vs user pointers actually make sense and can be set per mm, which will make things like io_uring (in theory) do the right thing.

UAI and LAM are incompatible from a userspace perspective.  Since LAM is pretty clearly superior [0], it seems like a better long term outcome would be for programs that want tag bits to target LAM and for AMD to support LAM if there is demand.  For that matter, do we actually expect any userspace to want to support UAI?  (Are there existing too-clever sandboxes that would be broken by enabling UAI?)

Given that UAI is not efficiently context switched, the implementation of uaccess is rather bizarre.  With the implementation in this series in particular, if the access_ok checks ever get out of sync with actual user access, a user access could be emitted with the high bits not masked despite the range check succeeding due to masking, which would, unless great care is taken, result in a "user" access hitting the kernel range.  That's no good.

I believe it's possible for a high-quality kernel UAI implementation to exist, but, as above, I think it would be slow, and it might be quite complex and fragile.  Are we sure that it's worth supporting it?

[0] I hope I don't have to argue this point.



>
> Currently ARM64 provides a way for processes to opt-in for
> relaxed tagged ABI via prctl() options PR_SET/GET_TAGGED_ADDR_CTRL.
> The prctl() API was found to be a bit restrictive for x86 use and
> Kirill had posted an extension to it as part of Intel LAM patchset.
> (https://lore.kernel.org/linux-mm/20210205151631.43511-1-kirill.shutemov@linux.intel.com/)
>
> This patchset builds on that prctl() extension and adds support
> for AMD UAI. AMD implementation is kept separate as equivalent
> Intel LAM implementation is likely to be different due to different
> bit positions and tag width.
>
> This is an early implementation which has been only lightly tested.
> I have used the tags_test.c from selftests/vm/tags/ to test this.
> For ARM64 changes, I have only ensured that the changes compile.
>
> Regards,
> Bharata.
>
> Bharata B Rao (5):
>   x86/cpufeatures: Add Upper Address Ignore(UAI) as CPU feature
>   x86: Enable Upper Address Ignore(UAI) feature
>   x86: Provide an implementation of untagged_addr()
>   x86: Untag user pointers in access_ok()
>   x86: Add prctl() options to control tagged user addresses ABI
>
> Kirill A. Shutemov (1):
>   mm, arm64: Update PR_SET/GET_TAGGED_ADDR_CTRL interface
>
>  arch/arm64/include/asm/processor.h            |  12 +-
>  arch/arm64/kernel/process.c                   |  45 +++++-
>  arch/arm64/kernel/ptrace.c                    |   4 +-
>  arch/x86/Kconfig                              |   9 ++
>  arch/x86/include/asm/cpufeatures.h            |   2 +-
>  arch/x86/include/asm/msr-index.h              |   2 +
>  arch/x86/include/asm/page_32.h                |   3 +
>  arch/x86/include/asm/page_64.h                |  26 ++++
>  arch/x86/include/asm/processor.h              |  12 ++
>  arch/x86/include/asm/thread_info.h            |   2 +
>  arch/x86/include/asm/uaccess.h                |  29 +++-
>  arch/x86/kernel/cpu/scattered.c               |   1 +
>  arch/x86/kernel/process.c                     | 134 ++++++++++++++++++
>  arch/x86/kernel/setup.c                       |   8 ++
>  kernel/sys.c                                  |  14 +-
>  .../testing/selftests/arm64/tags/tags_test.c  |  31 ----
>  .../selftests/{arm64 => vm}/tags/.gitignore   |   0
>  .../selftests/{arm64 => vm}/tags/Makefile     |   0
>  .../{arm64 => vm}/tags/run_tags_test.sh       |   0
>  tools/testing/selftests/vm/tags/tags_test.c   |  59 ++++++++
>  20 files changed, 335 insertions(+), 58 deletions(-)
>  delete mode 100644 tools/testing/selftests/arm64/tags/tags_test.c
>  rename tools/testing/selftests/{arm64 => vm}/tags/.gitignore (100%)
>  rename tools/testing/selftests/{arm64 => vm}/tags/Makefile (100%)
>  rename tools/testing/selftests/{arm64 => vm}/tags/run_tags_test.sh (100%)
>  create mode 100644 tools/testing/selftests/vm/tags/tags_test.c
>
> -- 
> 2.25.1
Thomas Gleixner March 21, 2022, 10:59 p.m. UTC | #14
On Mon, Mar 21 2022 at 15:29, Andy Lutomirski wrote:
> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
>> This patchset makes use of Upper Address Ignore (UAI) feature available
>> on upcoming AMD processors to provide user address tagging support for x86/AMD.
>>
>> UAI allows software to store a tag in the upper 7 bits of a logical
>> address [63:57]. When enabled, the processor will suppress the
>> traditional canonical address checks on the addresses. More information
>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>> Programmer's Manual, Vol 2: System Programming' which is available from
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=300549
>
> I hate to be a pain, but I'm really not convinced that this feature is
> suitable for Linux.  There are a few reasons:

Abusing bit 63 is not suitable for any OS in my opinion.

> Right now, the concept that the high bit of an address determines
> whether it's a user or a kernel address is fairly fundamental to the
> x86_64 (and x86_32!) code.  It may not be strictly necessary to
> preserve this, but violating it would require substantial thought.
> With UAI enabled, kernel and user addresses are, functionally,
> interleaved.  This makes things like access_ok checks, and more
> generally anything that operates on a range of addresses, behave
> potentially quite differently.  A lot of auditing of existing code
> would be needed to make it safe.

Which might be finished ten years from now....

Seriously there is no justification for the bit 63 abuse. This has been
pointed out by various people to AMD before this saw the public. Other
vendors seem to have gotten the memo.

The proper solution here is to issue an erratum and fix this nonsense in
microcode for the already taped out silicon and get rid of it in the
design of future ones completely. Anything else is just wishful
thinking.

Thanks,

        tglx
David Laight March 22, 2022, 5:31 a.m. UTC | #15
From: Andy Lutomirski
> Sent: 21 March 2022 22:30
> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
> > Hi,
> >
> > This patchset makes use of Upper Address Ignore (UAI) feature available
> > on upcoming AMD processors to provide user address tagging support for x86/AMD.
> >
> > UAI allows software to store a tag in the upper 7 bits of a logical
> > address [63:57]. When enabled, the processor will suppress the
> > traditional canonical address checks on the addresses. More information
> > about UAI can be found in section 5.10 of 'AMD64 Architecture
> > Programmer's Manual, Vol 2: System Programming' which is available from
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=300549
> 
> I hate to be a pain, but I'm really not convinced that this feature is suitable for Linux.  There are
> a few reasons:
> 
> Right now, the concept that the high bit of an address determines whether it's a user or a kernel
> address is fairly fundamental to the x86_64 (and x86_32!) code.  It may not be strictly necessary to
> preserve this, but violating it would require substantial thought.  With UAI enabled, kernel and user
> addresses are, functionally, interleaved.  This makes things like access_ok checks, and more generally
> anything that operates on a range of addresses, behave potentially quite differently.  A lot of
> auditing of existing code would be needed to make it safe.
> 
> UAI looks like it wasn't intended to be context switched and, indeed, your series doesn't context
> switch it.  As far as I'm concerned, this is an error, and if we support UAI at all, we should context
> switch it.  Yes, this will be slow, perhaps painfully slow.  AMD knows how to fix it by, for example,
> reading the Intel SDM.  By *not* context switching UAI, we force it on for all user code, including
> unsuspecting user code, as well as for kernel code.  Do we actually want it on for kernel code?  With
> LAM, in contrast, the semantics for kernel pointers vs user pointers actually make sense and can be
> set per mm, which will make things like io_uring (in theory) do the right thing.

You also need the kernel to be able to use the 'tagged'
addresses to access userspace - ie without having to
mask off any tag.

This is really for the same reason they want hardware
support for tagged addresses to avoid having to 'untag'
before calling library routines.

I'm not even sure which address the page fault handler
should see - it will need to the difference between a
normal process using a non-canonical address and a PF
on a tagged address.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Bharata B Rao March 23, 2022, 7:48 a.m. UTC | #16
On 3/22/2022 3:59 AM, Andy Lutomirski wrote:
> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
>> Hi,
>>
>> This patchset makes use of Upper Address Ignore (UAI) feature available
>> on upcoming AMD processors to provide user address tagging support for x86/AMD.
>>
>> UAI allows software to store a tag in the upper 7 bits of a logical
>> address [63:57]. When enabled, the processor will suppress the
>> traditional canonical address checks on the addresses. More information
>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>> Programmer's Manual, Vol 2: System Programming' which is available from
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=300549
> 
> I hate to be a pain, but I'm really not convinced that this feature is suitable for Linux.  There are a few reasons:
> 
> Right now, the concept that the high bit of an address determines whether it's a user or a kernel address is fairly fundamental to the x86_64 (and x86_32!) code.  It may not be strictly necessary to preserve this, but violating it would require substantial thought.  With UAI enabled, kernel and user addresses are, functionally, interleaved.  This makes things like access_ok checks, and more generally anything that operates on a range of addresses, behave potentially quite differently.  A lot of auditing of existing code would be needed to make it safe.

Ok got that. However can you point to me a few instances in the current
kernel code where such assumption of high bit being user/kernel address
differentiator exists so that I get some idea of what it takes to
audit all such cases?

Also wouldn't the problem of high bit be solved by using only the
6 out of 7 available bits in UAI and leaving the 63rd bit alone?
The hardware will still ignore the top bit, but this should take
care of the requirement of high bit being 0/1 for user/kernel in the
x86_64 kernel. Wouldn't that work?

> 
> UAI looks like it wasn't intended to be context switched and, indeed, your series doesn't context switch it.  As far as I'm concerned, this is an error, and if we support UAI at all, we should context switch it.  Yes, this will be slow, perhaps painfully slow.  AMD knows how to fix it by, for example, reading the Intel SDM.  By *not* context switching UAI, we force it on for all user code, including unsuspecting user code, as well as for kernel code.  Do we actually want it on for kernel code?  With LAM, in contrast, the semantics for kernel pointers vs user pointers actually make sense and can be set per mm, which will make things like io_uring (in theory) do the right thing.

I plan to enable/disable UAI based on the next task's settings by
doing MSR write to EFER during context switch. I will have to measure
how much additional cost an MSR write in context switch path brings in.
However given that without a hardware feature like ARM64 MTE, this would
primarily be used in non-production environments. Hence I wonder if MSR
write cost could be tolerated?

Regarding enabling UAI for kernel, I will have to check how clean and
efficient it would be to disable/enable UAI on user/kernel entry/exit
points.

> 
> UAI and LAM are incompatible from a userspace perspective.  Since LAM is pretty clearly superior [0], it seems like a better long term outcome would be for programs that want tag bits to target LAM and for AMD to support LAM if there is demand.  For that matter, do we actually expect any userspace to want to support UAI?  (Are there existing too-clever sandboxes that would be broken by enabling UAI?)
> 
> Given that UAI is not efficiently context switched, the implementation of uaccess is rather bizarre.  With the implementation in this series in particular, if the access_ok checks ever get out of sync with actual user access, a user access could be emitted with the high bits not masked despite the range check succeeding due to masking, which would, unless great care is taken, result in a "user" access hitting the kernel range.  That's no good.

Okay, I guess if context switching and sticking to 6 bits as mentioned
earlier is feasible, this concern too goes away unless I am missing something.

Thanks for your feedback.

Regards,
Bharata.
Dave Hansen April 1, 2022, 7:25 p.m. UTC | #17
On 3/23/22 00:48, Bharata B Rao wrote:
> Ok got that. However can you point to me a few instances in the current
> kernel code where such assumption of high bit being user/kernel address
> differentiator exists so that I get some idea of what it takes to
> audit all such cases?

Look around for comparisons against TASK_SIZE_MAX.
arch/x86/lib/putuser.S or something like arch_check_bp_in_kernelspace()
come to mind.

> Also wouldn't the problem of high bit be solved by using only the
> 6 out of 7 available bits in UAI and leaving the 63rd bit alone?
> The hardware will still ignore the top bit, but this should take
> care of the requirement of high bit being 0/1 for user/kernel in the
> x86_64 kernel. Wouldn't that work?

I don't think so.

The kernel knows that a dereference of a pointer that looks like a
kernel address that get kernel data.  Userspace must be kept away from
things that look like kernel addresses.

Let's say some app does:

	void *ptr = (void *)0xffffffffc038d130;
	read(fd, ptr, 1);

and inside the kernel, that boils down to:

	put_user('x', 0xffffffffc038d130);

Today the kernels knows that 0xffffffffc038d130 is >=TASK_SIZE_MAX, so
this is obviously naughty userspace trying to write to the kernel.  But,
it's not obviously wrong if the high bits are ignored.

Like you said, we could, as a convention, check for the highest bit
being set and use *that* to indicate a kernel address.  But, the sneaky
old userspace would just do:

	put_user('x', 0x7fffffffc038d130);

It would pass the "high bit" check since that bit is clear, but it still
accesses kernel memory because UAI ignores the bit userspace just cleared.

I think the only way to get around this is to go find every single place
in the kernel that does a userspace address check and ensure that it
fully untags the pointer first.

...
> However given that without a hardware feature like ARM64 MTE, this would
> primarily be used in non-production environments. Hence I wonder if MSR
> write cost could be tolerated?

It would be great of the mysterious folks who asked both Intel and AMD
for this feature could weigh in on this thread.  But, I've been assuming
that these features will be used to accelerate address sanitizers which
used heavily today in non-production environments but are (generally)
too slow for production.

I'd be very surprised if folks wanted this snazzy new hardware feature
from every CPU vendor on the planet just to speed up their
non-production environments.

I'd be less surprised if they wanted to expand the use of pointer
tagging into more production environments.
Andy Lutomirski April 1, 2022, 7:41 p.m. UTC | #18
On Wed, Mar 23, 2022, at 12:48 AM, Bharata B Rao wrote:
> On 3/22/2022 3:59 AM, Andy Lutomirski wrote:
>> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
>>> Hi,
>>>
>>> This patchset makes use of Upper Address Ignore (UAI) feature available
>>> on upcoming AMD processors to provide user address tagging support for x86/AMD.
>>>
>>> UAI allows software to store a tag in the upper 7 bits of a logical
>>> address [63:57]. When enabled, the processor will suppress the
>>> traditional canonical address checks on the addresses. More information
>>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>>> Programmer's Manual, Vol 2: System Programming' which is available from
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=300549
>> 
>> I hate to be a pain, but I'm really not convinced that this feature is suitable for Linux.  There are a few reasons:
>> 
>> Right now, the concept that the high bit of an address determines whether it's a user or a kernel address is fairly fundamental to the x86_64 (and x86_32!) code.  It may not be strictly necessary to preserve this, but violating it would require substantial thought.  With UAI enabled, kernel and user addresses are, functionally, interleaved.  This makes things like access_ok checks, and more generally anything that operates on a range of addresses, behave potentially quite differently.  A lot of auditing of existing code would be needed to make it safe.
>
> Ok got that. However can you point to me a few instances in the current
> kernel code where such assumption of high bit being user/kernel address
> differentiator exists so that I get some idea of what it takes to
> audit all such cases?

Anything that thinks that an address >= some value means kernel.

>
> Also wouldn't the problem of high bit be solved by using only the
> 6 out of 7 available bits in UAI and leaving the 63rd bit alone?
> The hardware will still ignore the top bit, but this should take
> care of the requirement of high bit being 0/1 for user/kernel in the
> x86_64 kernel. Wouldn't that work?

Maybe, but that seems quite ugly.  This will make the userspace and kernel semantics diverge.

>
>> 
>> UAI looks like it wasn't intended to be context switched and, indeed, your series doesn't context switch it.  As far as I'm concerned, this is an error, and if we support UAI at all, we should context switch it.  Yes, this will be slow, perhaps painfully slow.  AMD knows how to fix it by, for example, reading the Intel SDM.  By *not* context switching UAI, we force it on for all user code, including unsuspecting user code, as well as for kernel code.  Do we actually want it on for kernel code?  With LAM, in contrast, the semantics for kernel pointers vs user pointers actually make sense and can be set per mm, which will make things like io_uring (in theory) do the right thing.
>
> I plan to enable/disable UAI based on the next task's settings by
> doing MSR write to EFER during context switch. I will have to measure
> how much additional cost an MSR write in context switch path brings in.
> However given that without a hardware feature like ARM64 MTE, this would
> primarily be used in non-production environments. Hence I wonder if MSR
> write cost could be tolerated?

I'm not sure what you mean by a feature like ARM64 MTE.

>
> Regarding enabling UAI for kernel, I will have to check how clean and
> efficient it would be to disable/enable UAI on user/kernel entry/exit
> points.
>
>> 
>> UAI and LAM are incompatible from a userspace perspective.  Since LAM is pretty clearly superior [0], it seems like a better long term outcome would be for programs that want tag bits to target LAM and for AMD to support LAM if there is demand.  For that matter, do we actually expect any userspace to want to support UAI?  (Are there existing too-clever sandboxes that would be broken by enabling UAI?)
>> 
>> Given that UAI is not efficiently context switched, the implementation of uaccess is rather bizarre.  With the implementation in this series in particular, if the access_ok checks ever get out of sync with actual user access, a user access could be emitted with the high bits not masked despite the range check succeeding due to masking, which would, unless great care is taken, result in a "user" access hitting the kernel range.  That's no good.
>
> Okay, I guess if context switching and sticking to 6 bits as mentioned
> earlier is feasible, this concern too goes away unless I am missing something.

I think it does go away.
Bharata B Rao April 5, 2022, 5:58 a.m. UTC | #19
On 4/2/2022 12:55 AM, Dave Hansen wrote:
> On 3/23/22 00:48, Bharata B Rao wrote:
>> Ok got that. However can you point to me a few instances in the current
>> kernel code where such assumption of high bit being user/kernel address
>> differentiator exists so that I get some idea of what it takes to
>> audit all such cases?
> 
> Look around for comparisons against TASK_SIZE_MAX.
> arch/x86/lib/putuser.S or something like arch_check_bp_in_kernelspace()
> come to mind.
> 
>> Also wouldn't the problem of high bit be solved by using only the
>> 6 out of 7 available bits in UAI and leaving the 63rd bit alone?
>> The hardware will still ignore the top bit, but this should take
>> care of the requirement of high bit being 0/1 for user/kernel in the
>> x86_64 kernel. Wouldn't that work?
> 
> I don't think so.
> 
> The kernel knows that a dereference of a pointer that looks like a
> kernel address that get kernel data.  Userspace must be kept away from
> things that look like kernel addresses.
> 
> Let's say some app does:
> 
> 	void *ptr = (void *)0xffffffffc038d130;
> 	read(fd, ptr, 1);
> 
> and inside the kernel, that boils down to:
> 
> 	put_user('x', 0xffffffffc038d130);
> 
> Today the kernels knows that 0xffffffffc038d130 is >=TASK_SIZE_MAX, so
> this is obviously naughty userspace trying to write to the kernel.  But,
> it's not obviously wrong if the high bits are ignored.
> 
> Like you said, we could, as a convention, check for the highest bit
> being set and use *that* to indicate a kernel address.  But, the sneaky
> old userspace would just do:
> 
> 	put_user('x', 0x7fffffffc038d130);
> 
> It would pass the "high bit" check since that bit is clear, but it still
> accesses kernel memory because UAI ignores the bit userspace just cleared.
> 
> I think the only way to get around this is to go find every single place
> in the kernel that does a userspace address check and ensure that it
> fully untags the pointer first.

Thanks Dave for the detailed explanation.

We are discussing these aspects with the hardware team to check the best
possible path forward.

Regards,
Bharata.
Peter Zijlstra April 5, 2022, 8:14 a.m. UTC | #20
On Wed, Mar 23, 2022 at 01:18:41PM +0530, Bharata B Rao wrote:
> On 3/22/2022 3:59 AM, Andy Lutomirski wrote:

> > I hate to be a pain, but I'm really not convinced that this feature
> > is suitable for Linux.  There are a few reasons:
> > 
> > Right now, the concept that the high bit of an address determines
> > whether it's a user or a kernel address is fairly fundamental to the
> > x86_64 (and x86_32!) code.  It may not be strictly necessary to
> > preserve this, but violating it would require substantial thought.
> > With UAI enabled, kernel and user addresses are, functionally,
> > interleaved.  This makes things like access_ok checks, and more
> > generally anything that operates on a range of addresses, behave
> > potentially quite differently.  A lot of auditing of existing code
> > would be needed to make it safe.
> 
> Ok got that. However can you point to me a few instances in the current
> kernel code where such assumption of high bit being user/kernel address
> differentiator exists so that I get some idea of what it takes to
> audit all such cases?

The fact that you have to ask and can't readily find them should be a
big honking clue on its own, no?

Anyway, see here:

arch/x86/events/perf_event.h:static inline bool kernel_ip(unsigned long ip)
arch/x86/events/perf_event.h:{
arch/x86/events/perf_event.h:#ifdef CONFIG_X86_32
arch/x86/events/perf_event.h:   return ip > PAGE_OFFSET;
arch/x86/events/perf_event.h:#else
arch/x86/events/perf_event.h:   return (long)ip < 0;
arch/x86/events/perf_event.h:#endif
arch/x86/events/perf_event.h:}
Bharata B Rao April 5, 2022, 8:40 a.m. UTC | #21
On 4/5/2022 1:44 PM, Peter Zijlstra wrote:
> On Wed, Mar 23, 2022 at 01:18:41PM +0530, Bharata B Rao wrote:
>> On 3/22/2022 3:59 AM, Andy Lutomirski wrote:
> 
>>> I hate to be a pain, but I'm really not convinced that this feature
>>> is suitable for Linux.  There are a few reasons:
>>>
>>> Right now, the concept that the high bit of an address determines
>>> whether it's a user or a kernel address is fairly fundamental to the
>>> x86_64 (and x86_32!) code.  It may not be strictly necessary to
>>> preserve this, but violating it would require substantial thought.
>>> With UAI enabled, kernel and user addresses are, functionally,
>>> interleaved.  This makes things like access_ok checks, and more
>>> generally anything that operates on a range of addresses, behave
>>> potentially quite differently.  A lot of auditing of existing code
>>> would be needed to make it safe.
>>
>> Ok got that. However can you point to me a few instances in the current
>> kernel code where such assumption of high bit being user/kernel address
>> differentiator exists so that I get some idea of what it takes to
>> audit all such cases?
> 
> The fact that you have to ask and can't readily find them should be a
> big honking clue on its own, no?
> 
> Anyway, see here:
> 
> arch/x86/events/perf_event.h:static inline bool kernel_ip(unsigned long ip)
> arch/x86/events/perf_event.h:{
> arch/x86/events/perf_event.h:#ifdef CONFIG_X86_32
> arch/x86/events/perf_event.h:   return ip > PAGE_OFFSET;
> arch/x86/events/perf_event.h:#else
> arch/x86/events/perf_event.h:   return (long)ip < 0;
> arch/x86/events/perf_event.h:#endif
> arch/x86/events/perf_event.h:}

That's a pretty good and clear example.

Thanks Peter. I do now see that auditing all such instances would be
an uphill task.

Regards,
Bharata.
Catalin Marinas April 8, 2022, 5:41 p.m. UTC | #22
On Mon, Mar 21, 2022 at 03:29:34PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
> > This patchset makes use of Upper Address Ignore (UAI) feature available
> > on upcoming AMD processors to provide user address tagging support for x86/AMD.
> >
> > UAI allows software to store a tag in the upper 7 bits of a logical
> > address [63:57]. When enabled, the processor will suppress the
> > traditional canonical address checks on the addresses. More information
> > about UAI can be found in section 5.10 of 'AMD64 Architecture
> > Programmer's Manual, Vol 2: System Programming' which is available from
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=300549
> 
> I hate to be a pain, but I'm really not convinced that this feature is
> suitable for Linux.  There are a few reasons:
> 
> Right now, the concept that the high bit of an address determines
> whether it's a user or a kernel address is fairly fundamental to the
> x86_64 (and x86_32!) code.  It may not be strictly necessary to
> preserve this, but violating it would require substantial thought.
> With UAI enabled, kernel and user addresses are, functionally,
> interleaved.  This makes things like access_ok checks, and more
> generally anything that operates on a range of addresses, behave
> potentially quite differently.  A lot of auditing of existing code
> would be needed to make it safe.

Just catching up with this thread. I'm not entirely familiar with the
x86 codebase but some points from the arm64 TBI (top-byte ignore)
feature that may be useful:

In the 52-bit VA configuration (maximum) the kernel addresses on arm64
start at 0xfff00000_00000000 and the user ones go up to
0x000fffff_ffffffff. Anything in between these addresses would trigger a
fault on access. So a non-zero top-byte, even with bit 63 set, would not
access any kernel address unless bits 52 to 63 are all 1 (and this would
fail the access_ok() check, see below).

On arm64 we had TBI from day 0 but the syscall ABI did not allow user
tagged pointers into the kernel. An access_ok() checking addr < TASK_SIZE
was sufficient. With the tagged address ABI, we wanted to allow user
addresses with a non-zero top byte into the kernel. The access_ok() was
changed to sign-extend from bit 55 before comparing with TASK_SIZE. The
hardware also uses bit 55 to select the user or the kernel page tables
(TTBR0/TTBR1_EL1 regs or current->mm->pgd vs swapper_pg_dir in Linux
terms).

I haven't looked at the AMD UAI feature but if it still selects the user
vs kernel page tables based on bit 63, there may be a potential problem.
However, if access_ok() ensures that bit 56 is 0 for valid user
addresses, such access would fault as it's below the kernel's
0xff000000_00000000 limit (if I got it correctly for x86).

Since the UAI goes from bit 57 and up, I have a suspicion that it keeps
bit 56 for user vs kernel address selection. An access_ok()
sign-extending from this bit should be sufficient. As I said above,
there's no risk if such addresses get past access_ok(). With bit 56
cleared they'd not be able to access any kernel data.

(that's unless I missed something in the x86 kernel address layout)

> UAI looks like it wasn't intended to be context switched and, indeed,
> your series doesn't context switch it.  As far as I'm concerned, this
> is an error, and if we support UAI at all, we should context switch
> it.  Yes, this will be slow, perhaps painfully slow.  AMD knows how to
> fix it by, for example, reading the Intel SDM.  By *not* context
> switching UAI, we force it on for all user code, including
> unsuspecting user code, as well as for kernel code.  Do we actually
> want it on for kernel code?  With LAM, in contrast, the semantics for
> kernel pointers vs user pointers actually make sense and can be set
> per mm, which will make things like io_uring (in theory) do the right
> thing.

Arm64 does not context switch the hardware TBI feature either (and it
was always on from the start). A reason is that it requires expensive
TLB maintenance. What we do context switch is the opt-in to the tagged
address ABI which allows tagged pointers into the kernel. That's purely
a software choice (TIF flag) and it only affects the access_ok() check.

With KASAN enabled, we enable the TBI feature for the kernel as well,
it is independently controlled from the user one.

> UAI and LAM are incompatible from a userspace perspective.  Since LAM
> is pretty clearly superior [0], it seems like a better long term
> outcome would be for programs that want tag bits to target LAM and for
> AMD to support LAM if there is demand.  For that matter, do we
> actually expect any userspace to want to support UAI?  (Are there
> existing too-clever sandboxes that would be broken by enabling UAI?)

From the arm64 experience, there were some thoughts in the early days
that JIT compilers may use the top byte in the generated code to store
pointer type etc. IIRC, this was never deployed. But subsequently we got
MTE (memory tagging extensions) that uses bits 56 to 59 as a tag checked
against the memory colour during access. This seems to be more useful.

There is ASan as well that can make use of the top bits but it requires
recompilation (in principle, for MTE you just need the libc to return
tagged pointers on malloc()).

Of course, there can be ABI surprises and we came across some, see
commit dcde237319e6 ("mm: Avoid creating virtual address aliases in
brk()/mmap()/mremap()").

> Given that UAI is not efficiently context switched, the implementation
> of uaccess is rather bizarre.  With the implementation in this series
> in particular, if the access_ok checks ever get out of sync with
> actual user access, a user access could be emitted with the high bits
> not masked despite the range check succeeding due to masking, which
> would, unless great care is taken, result in a "user" access hitting
> the kernel range.  That's no good.

Per my (mis)understanding of the x86 address space, I don't think it's
that bad if you are careful with bit 56 always being 0 for user
addresses. The user can't generate a valid kernel address that goes past
access_ok().