Re: [PATCH v3] scripts: add leaking_addresses.pl

Message ID	CAGXu5j+o6cf0PrnxD+W+09F7UY6JL+h7Bn5_4T2C0Z_fTZAx-w@mail.gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kernel-hardening-return-10413-patchwork-kernel-hardening=patchwork.kernel.org@lists.openwall.com> Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk MIME-Version: 1.0 Sender: keescook@google.com In-Reply-To: <CA+55aFzPeCuvjscBNj6ANzfRUDsjT6Ntn9Grr915CrSpOfNSnA@mail.gmail.com> References: <1509945567-11801-1-git-send-email-me@tobin.cc> <CA+55aFzPeCuvjscBNj6ANzfRUDsjT6Ntn9Grr915CrSpOfNSnA@mail.gmail.com> From: Kees Cook <keescook@chromium.org> Date: Tue, 7 Nov 2017 13:22:13 -0800 Message-ID: <CAGXu5j+o6cf0PrnxD+W+09F7UY6JL+h7Bn5_4T2C0Z_fTZAx-w@mail.gmail.com> To: Linus Torvalds <torvalds@linux-foundation.org>, "Tobin C. Harding" <me@tobin.cc>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, Andy Lutomirski <luto@kernel.org>, Joe Perches <joe@perches.com> Cc: Network Development <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "kernel-hardening@lists.openwall.com" <kernel-hardening@lists.openwall.com>, "Jason A. Donenfeld" <Jason@zx2c4.com>, "Theodore Ts'o" <tytso@mit.edu>, Paolo Bonzini <pbonzini@redhat.com>, Tycho Andersen <tycho@docker.com>, "Roberts, William C" <william.c.roberts@intel.com>, Tejun Heo <tj@kernel.org>, Jordan Glover <Golden_Miller83@protonmail.ch>, Greg KH <gregkh@linuxfoundation.org>, Petr Mladek <pmladek@suse.com>, Ian Campbell <ijc@hellion.org.uk>, Sergey Senozhatsky <sergey.senozhatsky@gmail.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <wilal.deacon@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Chris Fries <cfries@google.com>, Dave Weinstein <olorin@google.com>, Daniel Micay <danielmicay@gmail.com>, Djalal Harouni <tixxdz@gmail.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Content-Type: text/plain; charset="UTF-8" Subject: [kernel-hardening] Re: [PATCH v3] scripts: add leaking_addresses.pl

Kees Cook Nov. 7, 2017, 9:22 p.m. UTC

On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Nov 5, 2017 at 9:19 PM, Tobin C. Harding <me@tobin.cc> wrote:
>> Currently we are leaking addresses from the kernel to user space. This
>> script is an attempt to find some of those leakages. Script parses
>> `dmesg` output and /proc and /sys files for hex strings that look like
>> kernel addresses.
>
> Lovely. This is great. It shows just how much totally pointless stuff
> we leak, and to normal users that really shouldn't need it.
>
> I had planned to wait for 4.15 to look at the printk hashing stuff
> etc, but this part I think I could/should merge early just because I
> think a lot of kernel developers will go "Why the f*ck would we expose
> that kernel address there?"
>
> The module sections stuff etc should likely be obviously root-only,
> although maybe I'm missing some tool that ends up using it and is
> useful to normal developers.
>
> And I'm thinking we could make kallsyms smarter too, and instead of
> depending on kptr_restrict that screws over things with much too big a
> hammer, we could make it take 'perf_event_paranoid' into account. I
> suspect that's the main user of kallsyms that would still be relevant
> to non-root.

Linus, what do you have in mind for the root-only "yes we really need
the actual address output" exceptions?

For example, right now /sys/kernel/debug/kernel_page_tables
(CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x.

Looking other places that stand out, it seems like
/proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
%p usage. It's unclear to me if a hash is sufficient for meaningful
debugging there?

Seems like these three from dmesg could be removed?

[    0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576
arch/x86/realmode/init.c

[    0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944
r8192 d30512 u524288
mm/percpu.c

[    0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB)
mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff]
lib/swiotlb.c

Tobin, some other feedback on v4...

I find the output hard to parse. Instead of:

[27527 lockdep_chains] [ffffffffb226c628] cgroup_mutex

Could we have:

27527 /proc/lockdep_chains: [ffffffffb226c628] cgroup_mutex

At the very least, getting the full file path is needed or might not
be clear where something lives.

And for my kernels, I needed to exclude usbmon or the script would
hang (perhaps add a read timeout to the script to detect stalling
files?)



-Kees

Linus Torvalds Nov. 7, 2017, 9:44 p.m. UTC | #1

On Tue, Nov 7, 2017 at 1:22 PM, Kees Cook <keescook@chromium.org> wrote:
>
> Linus, what do you have in mind for the root-only "yes we really need
> the actual address output" exceptions?

I am convinced that absolutely none of them should use '%pK'.

So far we have actually never seen a valid case wher %pK was really
the right thing to do.

> For example, right now /sys/kernel/debug/kernel_page_tables
> (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x.

So I think it could continue to use %x, and just make sure the whole
file is root-only.

And that is why %pK is so wrong. It's almost never really about root.

Look at /proc/kallsyms, for example. There it's mainly about kernel
profiles (although there certainly have been other uses historically,
and maybe some of them remain) - which we have another flag for
entirely that is very much specifically about kernel profiles.

> Looking other places that stand out, it seems like
> /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
> %p usage. It's unclear to me if a hash is sufficient for meaningful
> debugging there?

Maybe not, but that is also _so_ esoteric that I suspect the right fix
is to just make it root-only readable.

I've never used it, we should check with people who have. I get the
feeling that this is purely for PeterZ debugging.

The very first commit that introduced that code actually has a

    (FIXME: should go into debugfs)

so I suspect it never should have been user-readable to begin with. I
guess it makes some things easier, but it really is *very* different
from things like profiling.

Profiling you often *cannot* do as root - some things you profile
really shouldn't be run as root, and might even refuse to do so. So
requiring you to be root just to get a kernel profile is very bad.

But looking at lockdep stats? Yeah, 'sudo' isn't so big of a deal.

And I really suspect that's true of a _lot_ of these %p things that
really want a pointer. It's not that they really want %pK, it's that
they shouldn't have been visible to regular users in the first place.

Things that *do* want a pointer and should be visible to regular users
are things like oops messages etc, but even there we obviously
generally want to use %pS/%pF when possible (but generally %x when not
- things like register contents etc that *may* contain pointers).

And if they really are visible to users - because you want to
cross-correlate them for things like netstat - I think the hashing is
the right thing to do both for root and for regular users.

> Seems like these three from dmesg could be removed?
>
> [    0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576
> arch/x86/realmode/init.c
>
> [    0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944
> r8192 d30512 u524288
> mm/percpu.c
>
> [    0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB)
> mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff]
> lib/swiotlb.c

Yes, I think the solution for a lot of the random device discovery
messages etc is to just remove them. They were likely useful when the
code was new and untested, and just stayed around afterwards.

                      Linus

Kees Cook Nov. 7, 2017, 10:08 p.m. UTC | #2

On Tue, Nov 7, 2017 at 1:44 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 7, 2017 at 1:22 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> Linus, what do you have in mind for the root-only "yes we really need
>> the actual address output" exceptions?
>
> I am convinced that absolutely none of them should use '%pK'.
>
> So far we have actually never seen a valid case wher %pK was really
> the right thing to do.

One case to keep in mind (likely for the future, once we stabilize
this current plan), is to consider the situation of "untrusted root
users", in the case of locked down systems where modules are disabled,
etc, etc. In those cases, there's a brighter line between kernel
memory and uid 0. (And when we do start looking at that class of info
leaks, we'll be faced with a possible mess from %x-but-root-only
usage.)

>> For example, right now /sys/kernel/debug/kernel_page_tables
>> (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x.
>
> So I think it could continue to use %x, and just make sure the whole
> file is root-only.

Yup, sounds good.

Looks like the emerging rules for displaying virtual addresses are:

- don't use %p
- especially don't use %p in dmesg
- really, go use %pS/%pF
- if you absolutely need an address, show it with %x and have the file
be root-only

What should the guidance be for physical addresses? They're sensitive
too (especially since they're reachable via the physmap), but we've
traditionally been leaving them in dmesg, etc.

>> Looking other places that stand out, it seems like
>> /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
>> %p usage. It's unclear to me if a hash is sufficient for meaningful
>> debugging there?
>
> Maybe not, but that is also _so_ esoteric that I suspect the right fix
> is to just make it root-only readable.

Anyone running a "real" system with LOCKDEP is out of their mind. :)

> I've never used it, we should check with people who have. I get the
> feeling that this is purely for PeterZ debugging.
>
> The very first commit that introduced that code actually has a
>
>     (FIXME: should go into debugfs)

Peter, Paul, what do you think about relocating these files?
(/proc/lockdep* into debugfs)

> so I suspect it never should have been user-readable to begin with. I
> guess it makes some things easier, but it really is *very* different
> from things like profiling.

I didn't make that clear in the original report: I was running this as
root just to see what even the root leaks looked like (it was
surprisingly small, IMO). Those files are already root-only.

>> Seems like these three from dmesg could be removed?
>>
>> [    0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576
>> arch/x86/realmode/init.c
>>
>> [    0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944
>> r8192 d30512 u524288
>> mm/percpu.c
>>
>> [    0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB)
>> mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff]
>> lib/swiotlb.c
>
> Yes, I think the solution for a lot of the random device discovery
> messages etc is to just remove them. They were likely useful when the
> code was new and untested, and just stayed around afterwards.

Yup, sounds right. (Though my physical address question from above
remains: should these also drop their physical address reports?)

-Kees

Steven Rostedt Nov. 7, 2017, 10:44 p.m. UTC | #3

On Tue, 7 Nov 2017 13:44:01 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Looking other places that stand out, it seems like
> > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
> > %p usage. It's unclear to me if a hash is sufficient for meaningful
> > debugging there?  
> 
> Maybe not, but that is also _so_ esoteric that I suspect the right fix
> is to just make it root-only readable.

Also note, I don't believe anyone should be running a LOCKDEP
configured kernel in a production (secured) environment. As it adds
quite a bit of overhead. It's something you run on test environments to
make sure it doesn't detect any possible deadlocks.

> 
> I've never used it, we should check with people who have. I get the
> feeling that this is purely for PeterZ debugging.

I've used it. But then again, I also debug lockdep ;-)

> 
> The very first commit that introduced that code actually has a
> 
>     (FIXME: should go into debugfs)
> 
> so I suspect it never should have been user-readable to begin with. I
> guess it makes some things easier, but it really is *very* different
> from things like profiling.

Want me to whip up a patch to move the file?

-- Steve

> 
> Profiling you often *cannot* do as root - some things you profile
> really shouldn't be run as root, and might even refuse to do so. So
> requiring you to be root just to get a kernel profile is very bad.
> 
> But looking at lockdep stats? Yeah, 'sudo' isn't so big of a deal.
> 
>

Tobin Harding Nov. 8, 2017, 2:05 a.m. UTC | #4

Hi Kees,

It seems I over looked your suggestions when submitting v4. My mistake.

On Tue, Nov 07, 2017 at 01:22:13PM -0800, Kees Cook wrote:
> On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Sun, Nov 5, 2017 at 9:19 PM, Tobin C. Harding <me@tobin.cc> wrote:
> >> Currently we are leaking addresses from the kernel to user space. This
> >> script is an attempt to find some of those leakages. Script parses
> >> `dmesg` output and /proc and /sys files for hex strings that look like
> >> kernel addresses.
> >
> > Lovely. This is great. It shows just how much totally pointless stuff
> > we leak, and to normal users that really shouldn't need it.
> >
> > I had planned to wait for 4.15 to look at the printk hashing stuff
> > etc, but this part I think I could/should merge early just because I
> > think a lot of kernel developers will go "Why the f*ck would we expose
> > that kernel address there?"
> >
> > The module sections stuff etc should likely be obviously root-only,
> > although maybe I'm missing some tool that ends up using it and is
> > useful to normal developers.
> >
> > And I'm thinking we could make kallsyms smarter too, and instead of
> > depending on kptr_restrict that screws over things with much too big a
> > hammer, we could make it take 'perf_event_paranoid' into account. I
> > suspect that's the main user of kallsyms that would still be relevant
> > to non-root.
> 
> Linus, what do you have in mind for the root-only "yes we really need
> the actual address output" exceptions?
> 
> For example, right now /sys/kernel/debug/kernel_page_tables
> (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x.
> 
> Looking other places that stand out, it seems like
> /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
> %p usage. It's unclear to me if a hash is sufficient for meaningful
> debugging there?
> 
> Seems like these three from dmesg could be removed?
> 
> [    0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576
> arch/x86/realmode/init.c
> 
> [    0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944
> r8192 d30512 u524288
> mm/percpu.c
> 
> [    0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB)
> mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff]
> lib/swiotlb.c
> 
> Tobin, some other feedback on v4...
> 
> I find the output hard to parse. Instead of:
> 
> [27527 lockdep_chains] [ffffffffb226c628] cgroup_mutex
> 
> Could we have:
> 
> 27527 /proc/lockdep_chains: [ffffffffb226c628] cgroup_mutex

This is what I had during development, it becomes had to parse when the
message contains ':' and also if the address is not contained in braces
(I'm assuming '[ffffffffb226c628] cgroup_mutex' is the message).

We could use your suggested format but replace the ':' character?

> At the very least, getting the full file path is needed or might not
> be clear where something lives.

Current dev version includes this.

> And for my kernels, I needed to exclude usbmon or the script would
> hang (perhaps add a read timeout to the script to detect stalling
> files?)

Good idea, I'll add this.

> diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl
> index 282c0cc2bdea..a9b729c0a052 100644
> --- a/scripts/leaking_addresses.pl
> +++ b/scripts/leaking_addresses.pl
> @@ -70,6 +70,7 @@ my @skip_walk_dirs_any = ('self',
>                           'thread-self',
>                           'cwd',
>                           'fd',
> +                         'usbmon',
>                           'stderr',
>                           'stdin',
>                           'stdout');
> 

Added this.

thanks. Again, sorry for missing this before v4.

Tobin

Tobin Harding Nov. 8, 2017, 3:06 a.m. UTC | #5

On Tue, Nov 07, 2017 at 01:44:01PM -0800, Linus Torvalds wrote:
> On Tue, Nov 7, 2017 at 1:22 PM, Kees Cook <keescook@chromium.org> wrote:
> >
> > Linus, what do you have in mind for the root-only "yes we really need
> > the actual address output" exceptions?
> 
> I am convinced that absolutely none of them should use '%pK'.
> 
> So far we have actually never seen a valid case wher %pK was really
> the right thing to do.
> 
> > For example, right now /sys/kernel/debug/kernel_page_tables
> > (CONFIG_X86_PTDUMP=y) needs actual address and currently uses %x.
> 
> So I think it could continue to use %x, and just make sure the whole
> file is root-only.
> 
> And that is why %pK is so wrong. It's almost never really about root.
> 
> Look at /proc/kallsyms, for example. There it's mainly about kernel
> profiles (although there certainly have been other uses historically,
> and maybe some of them remain) - which we have another flag for
> entirely that is very much specifically about kernel profiles.
> 
> > Looking other places that stand out, it seems like
> > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
> > %p usage. It's unclear to me if a hash is sufficient for meaningful
> > debugging there?
> 
> Maybe not, but that is also _so_ esoteric that I suspect the right fix
> is to just make it root-only readable.
> 
> I've never used it, we should check with people who have. I get the
> feeling that this is purely for PeterZ debugging.
> 
> The very first commit that introduced that code actually has a
> 
>     (FIXME: should go into debugfs)
> 
> so I suspect it never should have been user-readable to begin with. I
> guess it makes some things easier, but it really is *very* different
> from things like profiling.
> 
> Profiling you often *cannot* do as root - some things you profile
> really shouldn't be run as root, and might even refuse to do so. So
> requiring you to be root just to get a kernel profile is very bad.
> 
> But looking at lockdep stats? Yeah, 'sudo' isn't so big of a deal.
> 
> And I really suspect that's true of a _lot_ of these %p things that
> really want a pointer. It's not that they really want %pK, it's that
> they shouldn't have been visible to regular users in the first place.
> 
> Things that *do* want a pointer and should be visible to regular users
> are things like oops messages etc, but even there we obviously
> generally want to use %pS/%pF when possible (but generally %x when not
> - things like register contents etc that *may* contain pointers).

This is opt-in, it means asking developers to do the right thing every time
they think they need a pointer. If we hash %p as soon as someone has
been bitten by it they will start using %x and sooner or later they will
use %x exclusively (suggesting using %x for _really_ necessary addresses
will only hasten the process).

If the only real benefit of hashing %p is to clean up kruft why don't we
add %pX and do tree wide substitution of all %p to %pX. People that get
broken will change it back to %p and we will have half a chance of
finding new abusers of %p down the track.

If there is not a 'one size fits all' solution, as we have seen with
kptr_restrict, then should we not be trying to make it easier for people
to do the right thing _and_ easier for us to catch it when we do the
wrong thing?

There is already almost 30 000 users of %{x,X}, surely we don't want to
promote more kernel address printing to be mixed in with all of that?

> And if they really are visible to users - because you want to
> cross-correlate them for things like netstat - I think the hashing is
> the right thing to do both for root and for regular users.
> 
> > Seems like these three from dmesg could be removed?
> >
> > [    0.000000] Base memory trampoline at [ffffa3fc40099000] 99000 size 24576
> > arch/x86/realmode/init.c
> >
> > [    0.000000] percpu: Embedded 38 pages/cpu @ffffa4007fc00000 s116944
> > r8192 d30512 u524288
> > mm/percpu.c
> >
> > [    0.456395] software IO TLB [mem 0xbbfdf000-0xbffdf000] (64MB)
> > mapped at [ffffa3fcfbfdf000-ffffa3fcfffdefff]
> > lib/swiotlb.c
> 
> Yes, I think the solution for a lot of the random device discovery
> messages etc is to just remove them. They were likely useful when the
> code was new and untested, and just stayed around afterwards.

Is anyone actually going to do this? If the hashing gets done then these
messages are not a risk, are _real_ kernel developers going to bother
cleaning this up? Surely newbies are not going to get much love either
if they submit patches that '[PATCH] remove unnecessary printk message'.

thanks,
Tobin.

Tobin Harding Nov. 8, 2017, 4:18 a.m. UTC | #6

On Tue, Nov 07, 2017 at 01:22:13PM -0800, Kees Cook wrote:
> On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
[snip]
> And for my kernels, I needed to exclude usbmon or the script would
> hang (perhaps add a read timeout to the script to detect stalling
> files?)

Bother. I submitted the patch set without adding this suggestion. Even
after apologizing for missing it.

Better context switching required.

thanks,
Tobin.

Peter Zijlstra Nov. 8, 2017, 8:20 a.m. UTC | #7

On Tue, Nov 07, 2017 at 05:44:13PM -0500, Steven Rostedt wrote:
> On Tue, 7 Nov 2017 13:44:01 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > > Looking other places that stand out, it seems like
> > > /proc/lockdep_chains and /proc/lockdep (CONFIG_LOCKDEP=y) has a ton of
> > > %p usage. It's unclear to me if a hash is sufficient for meaningful
> > > debugging there?  
> > 
> > Maybe not, but that is also _so_ esoteric that I suspect the right fix
> > is to just make it root-only readable.
> 
> Also note, I don't believe anyone should be running a LOCKDEP
> configured kernel in a production (secured) environment. As it adds
> quite a bit of overhead. It's something you run on test environments to
> make sure it doesn't detect any possible deadlocks.
> 
> > 
> > I've never used it, we should check with people who have. I get the
> > feeling that this is purely for PeterZ debugging.
> 
> I've used it. But then again, I also debug lockdep ;-)
> 
> > 
> > The very first commit that introduced that code actually has a
> > 
> >     (FIXME: should go into debugfs)
> > 
> > so I suspect it never should have been user-readable to begin with. I
> > guess it makes some things easier, but it really is *very* different
> > from things like profiling.
> 
> Want me to whip up a patch to move the file?

Fine by me; create /debug/lockdep/ for the 3 files or something like
that.

As to the actual addresses, they can be used to double check things are
in fact the same object (in case of name collisions), are in static
memory (as these things ought to be) etc.. But mostly they're not too
important.

And yes, as everybody says, LOCKDEP is debug tool; you run this on your
(local) dev kernels, anything else it out of spec.

Re: [PATCH v3] scripts: add leaking_addresses.pl

Commit Message

Comments

Patch