[v7,0/9] KFENCE: A low-overhead sampling-based memory safety error detector

Message ID 20201103175841.3495947-1-elver@google.com (mailing list archive)

Message

Marco Elver Nov. 3, 2020, 5:58 p.m. UTC
[ As of v7, we think this series is ready to be included in the mm
  tree. Where appropriate, we would welcome additional Acks / Reviews by
  MM, x86, and arm64 maintainers. Thank you! ]

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.  This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades precision
for performance. The main motivation behind KFENCE's design is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is to deploy the tool across a large fleet of
machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.
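
As a rough illustration of the layout described above, the following
userspace sketch computes where an object would sit inside its page and
whether a stray access lands in a neighbouring guard page. This is not
the code from the series: all names (sketch_place_object,
sketch_hits_guard, SKETCH_PAGE_SIZE) are hypothetical, 4 KiB pages are
assumed, and alignment requirements are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: address of an object of 'size' bytes placed in the
 * page starting at 'page_base', flush with either the left or the right
 * page boundary. Alignment is ignored in this sketch.
 */
static uintptr_t sketch_place_object(uintptr_t page_base, size_t size,
				     int at_right)
{
	assert(page_base % SKETCH_PAGE_SIZE == 0);
	assert(size > 0 && size <= SKETCH_PAGE_SIZE);

	if (at_right)
		return page_base + SKETCH_PAGE_SIZE - size;
	return page_base;
}

/*
 * True if 'addr' falls into the guard page directly to the left or to the
 * right of the object page starting at 'page_base'.
 */
static int sketch_hits_guard(uintptr_t page_base, uintptr_t addr)
{
	uintptr_t left_guard = page_base - SKETCH_PAGE_SIZE;
	uintptr_t right_guard = page_base + SKETCH_PAGE_SIZE;

	return (addr >= left_guard && addr < page_base) ||
	       (addr >= right_guard && addr < right_guard + SKETCH_PAGE_SIZE);
}
```

With a right-aligned object, an access one byte past its end already
falls into the right guard page, whereas an overflow past a left-aligned
object stays inside the object page and would not fault. The same holds
in mirror image for accesses before the start of the object.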

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.
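
The interplay of the sample interval and the allocation fast path can be
sketched in userspace as follows. This is only a model under stated
assumptions: a plain boolean stands in for the static branch, and the
names sketch_gate_open, sketch_interval_expired and
sketch_alloc_is_guarded are hypothetical and do not appear in the
series.

```c
#include <stdbool.h>

/* Stand-in for the static branch; the kernel would use a static key here. */
static bool sketch_gate_open;

/* Hypothetical timer callback, invoked when the sample interval expires. */
static void sketch_interval_expired(void)
{
	sketch_gate_open = true;
}

/*
 * Hypothetical allocator fast path: returns true if this allocation is
 * served from the guarded pool, false if it goes to the regular allocator.
 */
static bool sketch_alloc_is_guarded(void)
{
	if (!sketch_gate_open)
		return false;	/* common case: a single check */

	/* Close the gate again until the next interval expires. */
	sketch_gate_open = false;
	return true;
}
```

In this model only the first allocation after each expiry is guarded;
all other allocations take the early return and proceed to the regular
allocator.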

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
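
The 2 MiB figure can be checked with a small calculation. The sketch
below assumes two pages per object (the object page plus one guard page)
and one additional pair of pages at the start of the pool, which is
consistent with the numbers quoted above. The function name
sketch_pool_size is hypothetical.

```c
#include <stddef.h>

/*
 * Pool size in bytes, assuming two pages per object plus one extra pair
 * of pages at the start of the pool.
 */
static size_t sketch_pool_size(size_t num_objects, size_t page_size)
{
	return (num_objects + 1) * 2 * page_size;
}
```

For 255 objects and 4 KiB pages this gives (255 + 1) * 2 * 4096 =
2097152 bytes, i.e. 2 MiB.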

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

v7:
* Clean up print_diff_canary() boundary calculation.
* Cleaner CONFIG_KFENCE_STRESS_TEST_FAULTS, using "if EXPERT".
* Make __kfence_free() part of the public API.
* Only not-present faults should be handled by KFENCE.
* Remove arm64 dependency on 4K page size.
* Move kmemleak_free_recursive() before kfence_free() in SLAB.
* Revert unused orig_size in SLUB.
* For KASAN compatibility, also skip kasan_poison_*().
* Various smaller fixes (see details in patches).

v6: https://lkml.kernel.org/r/20201029131649.182037-1-elver@google.com
* Record allocation and free task pids, and show them in reports. This
  information helps more easily identify e.g. racy use-after-frees.

v5: https://lkml.kernel.org/r/20201027141606.426816-1-elver@google.com
* Lots of smaller fixes (see details in patches).
* Optimize is_kfence_address() by using better in-range check.
* Removal of HAVE_ARCH_KFENCE_STATIC_POOL and static pool
  support in favor of memblock_alloc'd pool only, as it avoids all
  issues with virt_to translations. With the new optimizations to
  is_kfence_address(), we measure no noticeable performance impact.
* Taint with TAINT_BAD_PAGE, to distinguish memory errors from regular
  warnings (also used by SL*B/KASAN/etc. for memory errors).
* Rework sample_interval parameter dynamic setting semantics.
* Rework kfence_shutdown_cache().
* Fix obj_to_index+objs_per_slab_page, which among other things is
  required when using memcg accounted allocations.
* Rebase to 5.10-rc1.

v4: https://lkml.kernel.org/r/20200929133814.2834621-1-elver@google.com
* MAINTAINERS: Split out from first patch.
* Make static memory pool's attrs entirely arch-dependent.
* Fix report generation if __slab_free tail-called.
* Clarify RCU test comment [reported by Paul E. McKenney].

v3: https://lkml.kernel.org/r/20200921132611.1700350-1-elver@google.com
* Rewrite SLAB/SLUB patch descriptions to clarify need for 'orig_size'.
* Various smaller fixes (see details in patches).

v2: https://lkml.kernel.org/r/20200915132046.3332537-1-elver@google.com
* Various comment/documentation changes (see details in patches).
* Various smaller fixes (see details in patches).
* Change all reports to reference the kfence object, "kfence-#nn".
* Skip allocation/free internals stack trace.
* Rework KMEMLEAK compatibility patch.

RFC/v1: https://lkml.kernel.org/r/20200907134055.2878499-1-elver@google.com

Alexander Potapenko (5):
  mm: add Kernel Electric-Fence infrastructure
  x86, kfence: enable KFENCE for x86
  mm, kfence: insert KFENCE hooks for SLAB
  mm, kfence: insert KFENCE hooks for SLUB
  kfence, kasan: make KFENCE compatible with KASAN

Marco Elver (4):
  arm64, kfence: enable KFENCE for ARM64
  kfence, Documentation: add KFENCE documentation
  kfence: add test suite
  MAINTAINERS: add entry for KFENCE

 Documentation/dev-tools/index.rst  |   1 +
 Documentation/dev-tools/kfence.rst | 297 +++++++++++
 MAINTAINERS                        |  12 +
 arch/arm64/Kconfig                 |   1 +
 arch/arm64/include/asm/kfence.h    |  19 +
 arch/arm64/mm/fault.c              |   4 +
 arch/arm64/mm/mmu.c                |   7 +-
 arch/x86/Kconfig                   |   1 +
 arch/x86/include/asm/kfence.h      |  65 +++
 arch/x86/mm/fault.c                |   5 +
 include/linux/kfence.h             | 201 +++++++
 include/linux/slab_def.h           |   3 +
 include/linux/slub_def.h           |   3 +
 init/main.c                        |   3 +
 lib/Kconfig.debug                  |   1 +
 lib/Kconfig.kfence                 |  72 +++
 mm/Makefile                        |   1 +
 mm/kasan/common.c                  |  19 +
 mm/kasan/generic.c                 |   3 +-
 mm/kfence/Makefile                 |   6 +
 mm/kfence/core.c                   | 826 +++++++++++++++++++++++++++++
 mm/kfence/kfence.h                 | 107 ++++
 mm/kfence/kfence_test.c            | 823 ++++++++++++++++++++++++++++
 mm/kfence/report.c                 | 235 ++++++++
 mm/slab.c                          |  38 +-
 mm/slab_common.c                   |   5 +-
 mm/slub.c                          |  60 ++-
 27 files changed, 2792 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/dev-tools/kfence.rst
 create mode 100644 arch/arm64/include/asm/kfence.h
 create mode 100644 arch/x86/include/asm/kfence.h
 create mode 100644 include/linux/kfence.h
 create mode 100644 lib/Kconfig.kfence
 create mode 100644 mm/kfence/Makefile
 create mode 100644 mm/kfence/core.c
 create mode 100644 mm/kfence/kfence.h
 create mode 100644 mm/kfence/kfence_test.c
 create mode 100644 mm/kfence/report.c

Comments

Andrew Morton Nov. 4, 2020, 12:31 a.m. UTC | #1
On Tue,  3 Nov 2020 18:58:32 +0100 Marco Elver <elver@google.com> wrote:

> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors.  This
> series enables KFENCE for the x86 and arm64 architectures, and adds
> KFENCE hooks to the SLAB and SLUB allocators.
> 
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design, is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.

Has kfence detected any kernel bugs yet?  What is its track record?

Will a kfence merge permit us to remove some other memory debugging
subsystem?  We seem to have rather a lot of them.

Marco Elver Nov. 4, 2020, 12:36 p.m. UTC | #2
On Wed, 4 Nov 2020 at 01:31, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue,  3 Nov 2020 18:58:32 +0100 Marco Elver <elver@google.com> wrote:
>
> > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > low-overhead sampling-based memory safety error detector of heap
> > use-after-free, invalid-free, and out-of-bounds access errors.  This
> > series enables KFENCE for the x86 and arm64 architectures, and adds
> > KFENCE hooks to the SLAB and SLUB allocators.
> >
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision. The main motivation behind KFENCE's design, is that with
> > enough total uptime KFENCE will detect bugs in code paths not typically
> > exercised by non-production test workloads. One way to quickly achieve a
> > large enough total uptime is when the tool is deployed across a large
> > fleet of machines.
>
> Has kfence detected any kernel bugs yet?  What is its track record?

Not yet, but once we deploy in various production kernels, we expect
to find new bugs (we'll report back with results once deployed).
Especially in drivers or subsystems that syzkaller+KASAN can't touch,
e.g. where real devices are required to get coverage. We expect to
have first results on this within 3 months, and can start backports
now that KFENCE for mainline is being finalized. This will likely also
make it into Android, but deployment there will take much longer.

The story is similar with the user space version of the tool
(GWP-ASan), where results started to materialize once it was deployed
across the fleet.

> Will a kfence merge permit us to remove some other memory debugging
> subsystem?  We seem to have rather a lot of them.

Nothing obvious I think. KFENCE is unique in that it is meant for
production fleets of machines (with ~zero overhead and no new HW
features), with the caveat that due to it being sampling based, it's
not so suitable for single machine testing. The other debugging tools
are suitable for the latter, but not the former.

Thanks,
-- Marco

Alexander Potapenko Nov. 4, 2020, 3:16 p.m. UTC | #3
On Wed, Nov 4, 2020 at 1:36 PM Marco Elver <elver@google.com> wrote:
>
> On Wed, 4 Nov 2020 at 01:31, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Tue,  3 Nov 2020 18:58:32 +0100 Marco Elver <elver@google.com> wrote:
> >
> > > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > > low-overhead sampling-based memory safety error detector of heap
> > > use-after-free, invalid-free, and out-of-bounds access errors.  This
> > > series enables KFENCE for the x86 and arm64 architectures, and adds
> > > KFENCE hooks to the SLAB and SLUB allocators.
> > >
> > > KFENCE is designed to be enabled in production kernels, and has near
> > > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > > for precision. The main motivation behind KFENCE's design, is that with
> > > enough total uptime KFENCE will detect bugs in code paths not typically
> > > exercised by non-production test workloads. One way to quickly achieve a
> > > large enough total uptime is when the tool is deployed across a large
> > > fleet of machines.
> >
> > Has kfence detected any kernel bugs yet?  What is its track record?
>
> Not yet, but once we deploy in various production kernels, we expect
> to find new bugs (we'll report back with results once deployed).
> Especially in drivers or subsystems that syzkaller+KASAN can't touch,
> e.g. where real devices are required to get coverage. We expect to
> have first results on this within 3 months, and can start backports
> now that KFENCE for mainline is being finalized. This will likely also
> make it into Android, but deployment there will take much longer.
>
> The story is similar with the user space version of the tool
> (GWP-ASan), where results started to materialize once it was deployed
> across the fleet.
>
> > Will a kfence merge permit us to remove some other memory debugging
> > subsystem?  We seem to have rather a lot of them.
>
> Nothing obvious I think. KFENCE is unique in that it is meant for
> production fleets of machines (with ~zero overhead and no new HW
> features), with the caveat that due to it being sampling based, it's
> not so suitable for single machine testing. The other debugging tools
> are suitable for the latter, but not former.

Agreeing with everything Marco said, I can only add that it would be
nice to have a separate discussion about the existing memory debugging
subsystems and the need to remove any of them.
Having many tools in a toolbox does not hurt, but we need to ensure
that all the tools in question are visible to the users (so that
people know when and how to use them), can find important bugs, and do
not duplicate each other.


> Thanks,
> -- Marco