Message ID | 20201103175841.3495947-1-elver@google.com |
---|---|
Series | KFENCE: A low-overhead sampling-based memory safety error detector |
On Tue, 3 Nov 2020 18:58:32 +0100 Marco Elver <elver@google.com> wrote:

> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors. This
> series enables KFENCE for the x86 and arm64 architectures, and adds
> KFENCE hooks to the SLAB and SLUB allocators.
>
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design, is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.

Has kfence detected any kernel bugs yet? What is its track record?

Will a kfence merge permit us to remove some other memory debugging
subsystem? We seem to have rather a lot of them.
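The tool's name alludes to the classic Electric Fence approach: place an allocation directly against an inaccessible guard page so that an out-of-bounds access faults immediately instead of silently corrupting memory. The block below is a minimal userspace sketch of that general idea only, assuming a POSIX system with `mmap`, `mprotect` and `sigsetjmp`. `fenced_alloc` and `write_faults` are hypothetical helpers for illustration; they are not KFENCE code or kernel APIs, and the sketch does not model KFENCE's sampling, its object pool, or use-after-free and invalid-free detection.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps two pages, turns the second into a PROT_NONE
 * guard page, and right-aligns a `size`-byte object against it, so the
 * first byte past the object lies inside the guard page. The whole
 * mapping is returned via region/region_len so the caller can munmap(). */
char *fenced_alloc(size_t size, void **region, size_t *region_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size == 0 || size > page)
        return NULL;

    char *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(base, 2 * page);
        return NULL;
    }
    *region = base;
    *region_len = 2 * page;
    return base + page - size;
}

static sigjmp_buf fault_env;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1); /* abandon the faulting write */
}

/* Hypothetical helper: returns 1 if writing buf[index] faults (SIGSEGV,
 * or SIGBUS on some platforms), 0 if the write succeeds. */
int write_faults(volatile char *buf, size_t index)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    assert(buf != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_env, 1) == 0)
        buf[index] = 'x'; /* hits the guard page if out of bounds */
    else
        faulted = 1;      /* the fault handler jumped back here */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

For a 32-byte object, `write_faults(obj, 31)` is expected to return 0 and `write_faults(obj, 32)` to return 1, because index 32 is the first byte of the guard page. Reserving a full page plus a guard page per object is expensive, which is one reason a production-oriented tool would apply this kind of protection only to a sampled subset of allocations.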
On Wed, 4 Nov 2020 at 01:31, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 3 Nov 2020 18:58:32 +0100 Marco Elver <elver@google.com> wrote:
>
> > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > low-overhead sampling-based memory safety error detector of heap
> > use-after-free, invalid-free, and out-of-bounds access errors. This
> > series enables KFENCE for the x86 and arm64 architectures, and adds
> > KFENCE hooks to the SLAB and SLUB allocators.
> >
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision. The main motivation behind KFENCE's design, is that with
> > enough total uptime KFENCE will detect bugs in code paths not typically
> > exercised by non-production test workloads. One way to quickly achieve a
> > large enough total uptime is when the tool is deployed across a large
> > fleet of machines.
>
> Has kfence detected any kernel bugs yet? What is its track record?

Not yet, but once we deploy in various production kernels, we expect
to find new bugs (we'll report back with results once deployed).
Especially in drivers or subsystems that syzkaller+KASAN can't touch,
e.g. where real devices are required to get coverage. We expect to
have first results on this within 3 months, and can start backports
now that KFENCE for mainline is being finalized. This will likely also
make it into Android, but deployment there will take much longer.

The story is similar with the user space version of the tool
(GWP-ASan), where results started to materialize once it was deployed
across the fleet.

> Will a kfence merge permit us to remove some other memory debugging
> subsystem? We seem to have rather a lot of them.

Nothing obvious I think. KFENCE is unique in that it is meant for
production fleets of machines (with ~zero overhead and no new HW
features), with the caveat that due to it being sampling based, it's
not so suitable for single machine testing. The other debugging tools
are suitable for the latter, but not the former.

Thanks,
-- Marco
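Marco's point about sampling can be made concrete with a deliberately simplified model, which is an assumption for illustration and not KFENCE's actual sampling mechanism: if each execution of a buggy allocation site is independently guarded with probability p, the chance that at least one of n executions is guarded is 1 - (1 - p)^n. The hypothetical `detection_probability` helper below computes that value.

```c
#include <assert.h>

/* Simplified independence model (an assumption, not how KFENCE actually
 * selects allocations): each execution of a buggy allocation site is
 * guarded with probability p. Returns the probability that at least one
 * of n executions is guarded, i.e. 1 - (1 - p)^n. */
double detection_probability(double p, unsigned long n)
{
    assert(p >= 0.0 && p <= 1.0);

    double miss = 1.0; /* probability that every execution so far was missed */
    for (unsigned long i = 0; i < n; i++)
        miss *= 1.0 - p;
    return 1.0 - miss;
}
```

With an illustrative p = 10^-4, about 10,000 executions of the buggy path give roughly a 63% chance of a catch, while 100,000 executions push it above 99.99%. Accumulating that many executions of a rarely exercised path is far easier across a fleet than on a single test machine, which is the trade-off described above.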
On Wed, Nov 4, 2020 at 1:36 PM Marco Elver <elver@google.com> wrote:
>
> On Wed, 4 Nov 2020 at 01:31, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Tue, 3 Nov 2020 18:58:32 +0100 Marco Elver <elver@google.com> wrote:
> >
> > > [...]
> >
> > Has kfence detected any kernel bugs yet? What is its track record?
>
> Not yet, but once we deploy in various production kernels, we expect
> to find new bugs (we'll report back with results once deployed).
> Especially in drivers or subsystems that syzkaller+KASAN can't touch,
> e.g. where real devices are required to get coverage. We expect to
> have first results on this within 3 months, and can start backports
> now that KFENCE for mainline is being finalized. This will likely also
> make it into Android, but deployment there will take much longer.
>
> The story is similar with the user space version of the tool
> (GWP-ASan), where results started to materialize once it was deployed
> across the fleet.
>
> > Will a kfence merge permit us to remove some other memory debugging
> > subsystem? We seem to have rather a lot of them.
>
> Nothing obvious I think. KFENCE is unique in that it is meant for
> production fleets of machines (with ~zero overhead and no new HW
> features), with the caveat that due to it being sampling based, it's
> not so suitable for single machine testing. The other debugging tools
> are suitable for the latter, but not the former.

Agreeing with everything Marco said, I can only add that it would be
nice to have a separate discussion about the existing memory debugging
subsystems and the need to remove any of them. Having many tools in a
toolbox does not hurt, but we need to ensure that all the tools in
question are visible to the users (so that people know when and how to
use them), can find important bugs, and do not duplicate each other.