
[RFC,0/3] add support for mm-local memory allocations

Message ID 20240621201501.1059948-1-rkagan@amazon.de (mailing list archive)

Message

Kagan, Roman June 21, 2024, 8:14 p.m. UTC
In a series posted a few years ago [1], a proposal was put forward to allow the
kernel to allocate memory local to a mm and thus push it out of reach for
current and future speculation-based cross-process attacks.  We still believe
this is a nice thing to have.

However, since that post Linux mm has gained a number of new facilities, so
we'd like to explore implementing this functionality with less effort and churn
by building on what is now available.

Specifically, this is a proof-of-concept attempt to implement mm-local
allocations by piggy-backing on memfd_secret(): it uses regular user addresses
but pins the pages and flips the user/supervisor flag on the respective PTEs to
make them directly accessible from the kernel, and seals the VMA to prevent
userland from taking over the address range.  This approach let us delegate all
the heavy lifting -- address management, interactions with the direct map,
cleanup on mm teardown -- to the existing infrastructure, and required zero
architecture-specific code.
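
To illustrate the user/supervisor flip described above, here is a minimal
userspace toy model.  It is not code from the series: the toy_pte_* names are
hypothetical, the flag layout is the x86 one (present = bit 0, writable = bit 1,
user = bit 2) even though the series itself is architecture-independent, and
the access check ignores hardware features such as SMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86 page-table entry. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_RW      (UINT64_C(1) << 1)
#define PTE_USER    (UINT64_C(1) << 2)  /* set: user mode may access the page */

typedef uint64_t toy_pte_t;  /* hypothetical stand-in for the kernel's pte_t */

/* Build a present, writable, user-accessible entry, like a normal user mapping. */
static toy_pte_t toy_pte_mkuser(uint64_t pfn)
{
    return (pfn << 12) | PTE_PRESENT | PTE_RW | PTE_USER;
}

/* Flip the entry to supervisor-only: clear the user bit, keep everything else. */
static toy_pte_t toy_pte_mksupervisor(toy_pte_t pte)
{
    assert(pte & PTE_PRESENT);
    return pte & ~PTE_USER;
}

/* Simplified permission check: would an access at this privilege level succeed? */
static bool toy_pte_access_ok(toy_pte_t pte, bool from_user)
{
    if (!(pte & PTE_PRESENT))
        return false;
    return !from_user || (pte & PTE_USER);
}
```

In this model, an entry that has gone through toy_pte_mksupervisor() still
lives at a user virtual address and keeps its page frame number, but a
user-mode access to it is refused while a kernel-mode access is allowed.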

Compared to the approach used in the original series, where a dedicated kernel
address range and thus a dedicated PGD was used for mm-local allocations, the
one proposed here may have certain drawbacks, in particular

- using user addresses for kernel memory may violate assumptions in various
  parts of kernel code that our smoke tests did not uncover

- the allocated addresses are guessable by userland (ATM they are even visible
  in /proc/PID/maps, but that's fixable), which may weaken the security posture
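
As a rough illustration of the second point, the sketch below shows how
little userland needs in order to check whether an address falls inside one of
its own mappings.  The helper names are hypothetical and not part of the
series; the only assumption about the kernel interface is the usual
"start-end perms ..." line format of /proc/PID/maps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true if one maps line ("start-end perms ...") covers addr. */
static bool maps_line_covers(const char *line, unsigned long addr)
{
    unsigned long start, end;

    assert(line != NULL);
    if (sscanf(line, "%lx-%lx", &start, &end) != 2)
        return false;
    /* The range is half-open: end is the first address past the mapping. */
    return addr >= start && addr < end;
}

/* Scan a maps file (e.g. "/proc/self/maps") for a mapping that covers addr. */
static bool maps_file_covers(const char *path, unsigned long addr)
{
    char line[512];
    bool found = false;
    FILE *f = fopen(path, "r");

    if (!f)
        return false;
    while (!found && fgets(line, sizeof(line), f))
        found = maps_line_covers(line, addr);
    fclose(f);
    return found;
}
```

Calling maps_file_covers("/proc/self/maps", addr) from the owning process is
enough to confirm a guessed address as long as the range is listed there.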

Also included is a simple test driver and selftest to smoke test and showcase
the feature.

The code is a PoC RFC and lacks many checks and special-case handling, but it
demonstrates the idea.  We'd appreciate any feedback on whether this is a viable
approach or whether it should be abandoned in favor of the one with a dedicated
PGD / kernel address range, or something else entirely.

[1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/

Fares Mehanna (2):
  mseal: expose interface to seal / unseal user memory ranges
  mm/secretmem: implement mm-local kernel allocations

Roman Kagan (1):
  drivers/misc: add test driver and selftest for proclocal allocator

 drivers/misc/Makefile                         |   1 +
 tools/testing/selftests/proclocal/Makefile    |   6 +
 include/linux/secretmem.h                     |   8 +
 mm/internal.h                                 |   7 +
 drivers/misc/proclocal-test.c                 | 200 +++++++++++++++++
 mm/gup.c                                      |   4 +-
 mm/mseal.c                                    |  81 ++++---
 mm/secretmem.c                                | 208 ++++++++++++++++++
 .../selftests/proclocal/proclocal-test.c      | 150 +++++++++++++
 drivers/misc/Kconfig                          |  15 ++
 tools/testing/selftests/proclocal/.gitignore  |   1 +
 11 files changed, 649 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/proclocal/Makefile
 create mode 100644 drivers/misc/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/.gitignore

Comments

David Woodhouse July 4, 2024, 11:11 a.m. UTC | #1
On Fri, 2024-06-21 at 22:14 +0200, Roman Kagan wrote:
> 
> Compared to the approach used in the original series, where a dedicated kernel
> address range and thus a dedicated PGD was used for mm-local allocations, the
> one proposed here may have certain drawbacks, in particular
> 
> - using user addresses for kernel memory may violate assumptions in various
>   parts of kernel code which we may not have identified with smoke tests we did
> 
> - the allocated addresses are guessable by the userland (ATM they are even
>   visible in /proc/PID/maps but that's fixable) which may weaken the security
>   posture

I think this approach makes sense as it's generic and applies
immediately to all architectures. I'm slightly uncomfortable about
using userspace addresses though, and the special cases that it
introduces.

I'd like to see a per-arch ARCH_HAS_PROCLOCAL_PGD so that it *can* be
put back into a dedicated address range where possible.

Looking forward to the x86 KVM code from before being dusted off and
put on top of this, and also the Arm version of the same. A test driver
and test case are all very well, but they're less exciting than the real
use case :)

Alexander Graf Aug. 28, 2024, 9:50 a.m. UTC | #2
Hey Roman,

On 21.06.24 22:14, Roman Kagan wrote:
> In a series posted a few years ago [1], a proposal was put forward to allow the
> kernel to allocate memory local to a mm and thus push it out of reach for
> current and future speculation-based cross-process attacks.  We still believe
> this is a nice thing to have.
>
> However, in the time passed since that post Linux mm has grown quite a few new
> goodies, so we'd like to explore possibilities to implement this functionality
> with less effort and churn leveraging the now available facilities.
>
> Specifically, this is a proof-of-concept attempt to implement mm-local
> allocations piggy-backing on memfd_secret(), using regular user addresses but
> pinning the pages and flipping the user/supervisor flag on the respective PTEs
> to make them directly accessible from kernel, and sealing the VMA to prevent
> userland from taking over the address range.  The approach allowed to delegate
> all the heavy lifting -- address management, interactions with the direct map,
> cleanup on mm teardown -- to the existing infrastructure, and required zero
> architecture-specific code.
>
> Compared to the approach used in the original series, where a dedicated kernel
> address range and thus a dedicated PGD was used for mm-local allocations, the
> one proposed here may have certain drawbacks, in particular
>
> - using user addresses for kernel memory may violate assumptions in various
>    parts of kernel code which we may not have identified with smoke tests we did
>
> - the allocated addresses are guessable by the userland (ATM they are even
>    visible in /proc/PID/maps but that's fixable) which may weaken the security
>    posture
>
> Also included is a simple test driver and selftest to smoke test and showcase
> the feature.
>
> The code is PoC RFC and lacks a lot of checks and special case handling, but
> demonstrates the idea.  We'd appreciate any feedback on whether it's a viable
> approach or it should better be abandoned in favor of the one with dedicated
> PGD / kernel address range or yet something else.
>
> [1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/


I haven't seen any negative feedback on the RFC, so when can I expect a
v1 of this patch set that addresses the non-production-readiness that
you call out above? :)


Alex



